Key Topics in Surgical Research and Methodology
Thanos Athanasiou, Haile Debas, Ara Darzi (Eds.)
Thanos Athanasiou, MD, PhD, FETCS, Imperial College London, St. Mary's Hospital London, Dept. Biosurgery & Surgical Technology, 10th floor QEQM Bldg., Praed Street, W2 1NY London, United Kingdom
Ara Darzi, KBE, PC, FMedSci, HonFREng, Imperial College London, St. Mary's Hospital London, Dept. Biosurgery & Surgical Technology, 10th floor QEQM Bldg., Praed Street, W2 1NY London, United Kingdom
Haile T. Debas, MD, UCSF Global Health Sciences, 3333 California Street, Suite 285, San Francisco, CA 94143-0443, USA
ISBN: 978-3-540-71914-4
e-ISBN: 978-3-540-71915-1
DOI: 10.1007/978-3-540-71915-1
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2009933270
© Springer-Verlag Berlin Heidelberg 2010

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Product liability: The publishers cannot guarantee the accuracy of any information about dosage and application contained in this book. In every individual case the user must check such information by consulting the relevant literature.

Cover design: eStudio Calamar, Figueres/Berlin
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
This is the first book to distil the tried experiences and reflective thoughts of three world leaders in academic surgery into a comprehensive account of how to build a department of surgery of stature. It is an ambitious undertaking. The contents cover all areas of activity in which a modern and successful Professor of Surgery and Chief of Surgery of an academic department must be fully engaged. These include the traditional areas of teaching, research, clinical service and administration, the demands of which have increased enormously in recent years. In addition, however, there are other important aspects particularly relevant to today's surgical practice, such as healthcare delivery and leadership. The vast amount of new knowledge currently available in these areas of responsibility is impossible, even for the most conscientious, to assimilate in a timely manner. This book covers all these and other subjects with sufficient information to arm the enquirer adequately. Bringing together the many divergent threads that represent the required core skills and weaving them into a complex interlocking fabric has been excellently achieved. The chapters are contributed by world leaders and embody the definitive current record.

It is a text for anyone who aspires to pursue a career in academic surgery; it is also essential reading for those who wish to engage in the critical and rigorous intellectual exercises of the thoughtful surgeon. In the title and Preface the authors have placed emphasis on surgical research and its methods. This is to be interpreted in the broadest sense, as many chapters of this book focus on the clinical care of patients. As such, it is also for those who wish to provide top-quality service to their patients, whether in the university teaching hospital environment or in the rural setting.

The authors combine the best of cross-Atlantic thought on developing a surgical department of excellence. It is not a cookbook that will ensure success, but it will shorten the learning curve and help to minimise mistakes along the way. Consulting this work gives the contextual background that aids learning on the job and contributing to world-class research. It is an academic manual of the highest quality that communicates the most refined skills of leading academic units and surgeons.
The concept of this book is to go beyond the restrictive nature of traditional surgical texts and to prepare the future academic leaders in surgery. Given the pace of change in so many fields, it is predictable that a book such as this will need to be updated regularly.

Professor John Wong
Chair of Surgery & Head, Department of Surgery
University of Hong Kong Medical Centre
Queen Mary Hospital, Hong Kong
Preface
Academic surgery has gained considerable importance over the last century, and it continues to benefit from significant advances in science and technology. Its role in the continually evolving world of modern healthcare is becoming increasingly influential. Many of the recent innovations in our surgical practice, such as minimally invasive surgery, telerobotic surgery and metabolic surgery, have been spearheaded by academic surgeons. This has only been possible through significant efforts in the implementation of cutting-edge research and the adoption of evidence-based practice. Much of this has been realised through judicious surgical leadership and academic departmental organisations that foster an environment in which the best candidates can be selected. Individuals central to this approach are surgeons who are not only technically proficient but also academically productive.

A dynamic exchange between research and clinical expertise has not always existed in the surgical profession. There are numerous operative practices that have few standards or that rest on paradigms not wholly based on the best available science or evidence. The solution is self-evident and requires the adoption of educational excellence, technical proficiency and continual innovative research. Academic surgeons are key to implementing many of these strategic goals and will require an understanding of many disciplines, ranging from basic laboratory research to statistical awareness of complex analytical methods. These proficiencies need to be accompanied by academic leadership, expertise in communication and non-technical skills.

The aim of this book is to equip surgeons across all disciplines and specialities to enhance their academic know-how, in order to work successfully within a surgical academic unit and to maximise their academic potential. The goals are to convey the fundamental scientific tenets of surgical science, and also to increase awareness of the equally important areas of departmental collaboration, the adoption of business acumen, engineering knowledge and industrial know-how. It addresses a whole range of topics, from how to incorporate the best surgical evidence, apply for grants, perform a research study and apply ethics to research, to setting up a surgical education programme and running an academic department. It also communicates many of the surgical technological highlights that are considered important in modern surgical practice and presents some of the most significant biomolecular concepts of the present and future.

Surgical research has improved in quality over the past few decades, and we present this book to advocate further the use of high-quality research in the form of clinical research trials. We strongly emphasise the importance of randomised studies with clearly defined, clinically relevant endpoints. Many of the chapters also focus on the increasing developments of biomedical technology in modern surgical practice.
They clarify the increasing need to understand and adopt these developments to augment surgical practice and patient outcomes. The role of evidence-based surgery is also given particular focus. Although reading, interpreting and applying the best knowledge from the literature is one aspect of this field, it does not represent "all the evidence" available. This book considers a broader concept of evidence and, by doing so, specifies the central role of patients themselves within evidence-based practice. This is best understood as an equilibrium between the surgeon, the patient and the healthcare institution.

The concepts presented will require application within the context of healthcare organisations and institutions worldwide. Many of these are already large or are in the process of significant growth, requiring visionary leadership strategies. An example is the Academic Health Science Centre model, where collaboration, research networking and global cooperation are imperative. The scope of this book has been targeted to allow academic surgeons to exploit their local advantages whilst bridging the gap between surgical practice, patient safety and laboratory research. It will give an overview of the importance of surgical research both locally and internationally. Many of the topics covered also highlight the importance of surgical research to governmental departments and policy makers. It will enable surgeons to clarify and prioritise the continuous influx of knowledge within the international literature. It strives to define the characteristics of talented individuals whilst also specifying the importance of market forces and administrative management. As such, we present it as a dedicated guide to modern academic surgery.

The future of the surgical profession lies in the development of our knowledge, our treatment resources and our most prized asset, surgeons themselves. We must not only enhance our current strengths but also ensure the continual advancement of the next generation of our trainees. A roadmap for the development of our future surgeons can be achieved through academic curricula. We therefore envisage this book as a foundation guide for the training of academic surgeons.

This project would not have been possible without the significant knowledge contributed by the chapter authors, many of whom are world leaders in their field. We thank our many colleagues and friends who helped us in this endeavour. The units where we work, namely the Department of Biosurgery and Surgical Technology at Imperial College London and the School of Medicine at the University of California, San Francisco, are sites of great inspiration and rewarding academic crosstalk that motivated us to write and prepare this book.

Thanos Athanasiou, London, UK
Haile T. Debas, San Francisco, USA
Ara Darzi, London, UK
Acknowledgements
The editors wish to express their particular appreciation to a number of individuals without whom this book would not have been possible. Beth Janz tirelessly managed the book from its inception, devoting long hours to communication with contributors and editors to bring it to completion. Specific thanks also go to Hutan Ashrafian, who worked with energy and skill to co-ordinate many of the authors and keep this project on track. We also recognise Christopher Rao for his dedicated graphical support on many of the chapter figures, and Erik Mayer for his continued assistance with this endeavour.
About the Editors
Mr. Thanos Athanasiou, MD, PhD, FETCS Reader in Cardiac Surgery and Consultant Cardiac Surgeon
Mr. Thanos Athanasiou is a consultant cardiothoracic surgeon at St. Mary's Hospital, Imperial College Healthcare NHS Trust, and a Reader in Cardiac Surgery in the Department of Biosurgery and Surgical Technology at Imperial College London. He specialises in complex aortic surgery, coronary artery bypass grafting (CABG), minimally invasive cardiac surgery and robotic-assisted cardiothoracic surgery. His institutional responsibility is to lead academic cardiac surgery and complex aortic surgery. He is currently supervising eight MD/PhD students and has published more than 200 peer-reviewed journal papers. He has given several invited lectures in national and international forums in the fields of technology in cardiac surgery, healthcare delivery and quality in surgery. His specialty research interests include bio-inspired robotic systems and their application in cardiothoracic surgery, outcomes research in cardiac surgery, metabolic surgery and regenerative cardiovascular strategies.
His general research interests include quality metrics in healthcare and evidence synthesis including meta-analysis, decision and economic analysis. His statistical interests include longitudinal outcomes from cardiac surgical interventions. He has recently developed and published a novel methodology for analysing longitudinal and psychometric data. Link to the personal web page: http://www.thanosathanasiou.co.uk
Professor Haile T. Debas, MD Executive Director, UCSF Global Health Sciences, Maurice Galante Distinguished Professor of Surgery & Dean Emeritus, School of Medicine
Haile T. Debas, MD, the Executive Director of UCSF Global Health Sciences, is recognized internationally for his contributions to academic medicine and is currently widely consulted on issues associated with global health. At UCSF, he served as Dean (Medicine), Vice Chancellor (Medical Affairs), and Chancellor. Dr. Debas is also the Maurice Galante Distinguished Professor of Surgery and chaired the UCSF Department of Surgery. A native of Eritrea, he received his MD from McGill University and completed his surgical training at the University of British Columbia. Under Dr. Debas’s stewardship, the UCSF School of Medicine became a national model for medical education, an achievement for which he was recognized with the 2004 Abraham Flexner Award of the AAMC. His prescient grasp of the implications of fundamental changes in science led him to create several interdisciplinary research centres that have been instrumental in reorganising the scientific community at UCSF. He played a key role in developing UCSF’s new campus at Mission Bay. He has held
leadership positions with numerous membership organisations and professional associations, including serving as President of the American Surgical Association and Chair of the Council of Deans of the AAMC. He served for two terms as a member of the Committee on Science, Engineering, and Public Policy of the National Academy of Sciences. He is a member of the Institute of Medicine and has served as Chair of the Membership Committee. He is a fellow of the American Academy of Arts and Sciences. He currently serves on the United Nations' Commission on HIV/AIDS and Governance in Africa, and is a member of the Board of Regents of the Uniformed Services University of the Health Sciences.
Professor the Lord Darzi of Denham, KBE, PC, FMedSci, HonFREng Paul Hamlyn Chair of Surgery at Imperial College London. Honorary Consultant Surgeon at The Royal Marsden NHS Foundation Trust and Chairman of the Section of Surgery at The Institute of Cancer Research
Professor Lord Darzi holds the Paul Hamlyn Chair of Surgery at Imperial College London, where he is Head of the Department of Biosurgery and Surgical Technology. He is an honorary consultant surgeon at Imperial College Healthcare NHS Trust and the Royal Marsden Hospital. He also holds the Chair of Surgery at the Institute of Cancer Research. Professor Lord Darzi and his team are internationally respected for their innovative work in the advancement of minimally invasive surgery, robotics and allied technologies. His research is directed towards achieving best surgical practice through both innovation in surgery and enhancing the safety and quality of healthcare. This includes the evaluation of new technologies, studies of the safety and quality of care, the development of methods for enhancing healthcare delivery and new approaches to education and training. His contribution within these research fields has been outstanding, with over 500 peer-reviewed research papers published to date. In recognition of his outstanding achievements in research and development of surgical technologies,
Professor Lord Darzi was elected an Honorary Fellow of the Royal Academy of Engineering and a Fellow of the Academy of Medical Sciences. Following a knighthood in 2002 for his services to medicine and surgery, Professor Lord Darzi was introduced to the House of Lords in 2007 and appointed Parliamentary Under Secretary of State at the Department of Health (2007–2009). At the Prime Minister's request, he led a review of the United Kingdom's National Health Service, with the aim of achieving high-quality care for all national healthcare patients. He was appointed to Her Majesty's Most Honourable Privy Council in 2009. Professor Lord Darzi is currently the Global Ambassador for Health and Life Sciences and Chair of NHS Global for the Cabinet Office.
Contents
1 The Role of Surgical Research
    Omer Aziz and John G. Hunter
2 Evidence-Based Surgery
    Hutan Ashrafian, Nick Sevdalis, and Thanos Athanasiou
3 The Role of the Academic Surgeon in the Evaluation of Healthcare Assessment
    Roger M. Greenhalgh
4 Study Design, Statistical Inference and Literature Search in Surgical Research
    Petros Skapinakis and Thanos Athanasiou
5 Randomised Controlled Trials: What the Surgeon Needs to Know
    Marcus Flather, Belinda Lees, and John Pepper
6 Monitoring Trial Effects
    Hutan Ashrafian, Erik Mayer, and Thanos Athanasiou
7 How to Recruit Patients in Surgical Studies
    Hutan Ashrafian, Simon Rowland, and Thanos Athanasiou
8 Diagnostic Tests and Diagnostic Accuracy in Surgery
    Catherine M. Jones, Lord Ara Darzi, and Thanos Athanasiou
9 Research in Surgical Education: A Primer
    Adam Dubrowski, Heather Carnahan, and Richard Reznick
10 Measurement of Surgical Performance for Delivery of a Competency-Based Training Curriculum
    Raj Aggarwal and Lord Ara Darzi
11 Health-Related Quality of Life and its Measurement in Surgery – Concepts and Methods
    Jane M. Blazeby
12 Surgical Performance Under Stress: Conceptual and Methodological Issues
    Sonal Arora and Nick Sevdalis
13 How can we Assess Quality of Care in Surgery?
    Erik Mayer, Andre Chow, Lord Ara Darzi, and Thanos Athanasiou
14 Patient Satisfaction in Surgery
    Andre Chow, Erik Mayer, Lord Ara Darzi, and Thanos Athanasiou
15 How to Measure Inequality in Health Care Delivery
    Erik Mayer and Julian Flowers
16 The Role of Volume–Outcome Relationship in Surgery
    Erik Mayer, Lord Ara Darzi, and Thanos Athanasiou
17 An Introduction to Animal Research
    James Kinross and Lord Ara Darzi
18 The Ethics of Animal Research
    Hutan Ashrafian, Kamran Ahmed, and Thanos Athanasiou
19 Ethical Issues in Surgical Research
    Amy G. Lehman and Peter Angelos
20 Principles and Methods in Qualitative Research
    Roger Kneebone and Heather Fry
21 Safety in Surgery
    Charles Vincent and Krishna Moorthy
22 Safety and Hazards in Surgical Research
    Shirish Prabhudesai and Gretta Roberts
23 Fraud in Surgical Research – A Framework of Action Is Required
    Conor J. Shields, Desmond C. Winter, and Patrick Broe
24 A Framework Is Required to Reduce Publication Bias: The Academic Surgeon's View
    Ronnie Tung-Ping Poon and John Wong
25 Data Collection, Database Development and Quality Control: Guidance for Clinical Research Studies
    Daniel R. Leff, Richard E. Lovegrove, Lord Ara Darzi, and Thanos Athanasiou
26 The Role of Computers and the Type of Computing Skills Required in Surgery
    Julian J. H. Leong
27 Computational and Statistical Methodologies for Data Mining in Bioinformatics
    Lee Lancashire and Graham Ball
28 The Use of Bayesian Networks in Decision-Making
    Zhifang Ni, Lawrence D. Phillips, and George B. Hanna
29 A Bayesian Framework for Assessing New Surgical Health Technologies
    Elisabeth Fenwick
30 Systematic Reviews and Meta-Analyses in Surgery
    Sukhmeet S. Panesar, Weiming Siow, and Thanos Athanasiou
31 Decision Analysis
    Christopher Rao and Thanos Athanasiou
32 Cost-Effectiveness Analysis
    Christopher Rao and Thanos Athanasiou
33 Value of Information Analysis
    Christopher Rao and Thanos Athanasiou
34 Methodological Framework for Evaluation and Prevention of Publication Bias in Surgical Studies
    Danny Yakoub, Sukhmeet S. Panesar, and Thanos Athanasiou
35 Graphs in Statistical Analysis
    Akram R.G. Hanna, Christopher Rao, and Thanos Athanasiou
36 Questionnaires, Surveys, Scales in Surgical Research: Concepts and Methodology
    Mohammed Shamim Rahman, Sana Usman, Oliver Warren, and Thanos Athanasiou
37 How to Perform Analysis of Survival Data in Surgery
    Fotios Sianis
38 Risk Stratification and Prediction Modelling in Surgery
    Vassilis G. Hadjianastassiou, Thanos Athanasiou, and Linda J. Hands
39 The Principles and Role of Medical Imaging in Surgery
    Daniel Elson and Guang-Zhong Yang
40 How to Read a Paper
    Hutan Ashrafian and Thanos Athanasiou
41 How to Evaluate the Quality of the Published Literature
    Andre Chow, Sanjay Purkayastha, and Thanos Athanasiou
42 How to Write a Surgical Paper
    Sanjay Purkayastha
43 A Primer for Grant Applications
    Hutan Ashrafian, Alison Mortlock, and Thanos Athanasiou
44 Key Aspects of Grant Applications: The Surgical Viewpoint
    Bari Murtuza and Thanos Athanasiou
45 How to Organise an Educational Research Programme Within an Academic Surgical Unit
    Kamran Ahmed, Hutan Ashrafian, and Paraskevas Paraskeva
46 How to Structure an Academic Lecture
    Bari Murtuza and Thanos Athanasiou
47 How to Write a Book Proposal
    Christopher Rao and Thanos Athanasiou
48 How to Organise a Surgical Meeting: National and International
    Bari Murtuza and Thanos Athanasiou
49 Presentation Skills in Surgery
    Sanjay Purkayastha
50 Internet Research Resources for Surgeons
    Santhini Jeyarajah and Sanjay Purkayastha
51 Clinical Practice Guidelines in Surgery
    Shawn Forbes, Cagla Eskicioglu, and Robin McLeod
52 From Idea to Bedside: The Process of Surgical Invention and Innovation
    James Wall, Geoffrey C. Gurtner, and Michael T. Longaker
53 Research Governance and Research Funding in the USA: What the Academic Surgeon Needs to Know
    Michael W. Mulholland and James A. Bell
54 Research Governance in the UK: What the Academic Surgeon Needs to Know
    Gary C. Roper
55 Research Funding, Applying for Grants and Research Budgeting in the UK: What the Academic Surgeon Needs to Know
    Karen M. Sergiou
56 How to Enhance Development and Collaboration in Surgical Research
    Peter Ellis
57 Mentoring in Academic Surgery
    Oliver Warren and Penny Humphris
58 Leadership in Academic Surgery
    Oliver Warren and Penny Humphris
59 Using Skills from Art in Surgical Practice and Research – Surgery and Art
    Donna Winderbank-Scott
60 Administration of the Academic Department of Surgery
    Carlos A. Pellegrini, Avalon R. Lance, and Haile T. Debas
61 Information Transfer and Communication in Surgery: A Need for Improvement
    Kamal Nagpal and Krishna Moorthy
62 General Surgery: Current Trends and Recent Innovations
    John P. Cullen and Mark A. Talamini
63 Upper Gastrointestinal Surgery: Current Trends and Recent Innovations
    Danny Yakoub, Oliver Priest, Akram R. George, and George B. Hanna
64 Colorectal Cancer Surgery: Current Trends and Recent Innovations
    Oliver Priest, Paul Ziprin, and Peter W. Marcello
65 Urology: Current Trends and Recent Innovations
    Erik Mayer and Justin Vale
66 Cardiothoracic Surgery: Current Trends and Recent Innovations
    Joanna Chikwe, Thanos Athanasiou, and Adanna Akujuo
67 Vascular Surgery: Current Trends and Recent Innovations
    Mark A. Farber, William A. Marston, and Nicholas Cheshire
68 Breast Surgery: Current Trends and Recent Innovations
    Dimitri J. Hadjiminas
69 Thyroid Surgery: Current Trends and Recent Innovations
    Charlie Huins and Neil Samuel Tolley
70 Orthopaedic Surgery: Current Trends and Recent Innovations
    Andrew Carr and Stephen Gwilym
71 Plastic, Reconstructive and Aesthetic Surgery: Current Trends and Recent Innovations
    Marios Nicolaou, Matthew D. Gardiner, and Jagdeep Nanchahal
72 Neurosurgery: Current Trends and Recent Innovations
    David G.T. Thomas and Laurence Watkins
73 Molecular Techniques in Surgical Research
    Athanassios Kotsinas, Michalis Liontos, Ioannis S. Pateras, and Vassilis G. Gorgoulis
74 Molecular Carcinogenesis
    Michael Zachariadis, Konstantinos Evangelou, Nikolaos G. Kastrinakis, Panagiota Papanagnou, and Vassilis G. Gorgoulis
Subject Index
Contributors
Raj Aggarwal Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St. Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Kamran Ahmed, MBBS, MRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, at St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Adanna Akujuo, MD Department of Cardiothoracic Surgery, Mount Sinai Medical Centre, 1190 Fifth Avenue, New York, NY 10029, USA
[email protected] Peter Angelos, MD, PhD, FACS Department of Surgery,The University of Chicago, University of Chicago Medical Center, 5841 South Maryland Avenue, MC 5031, Chicago, IL 60637, USA
[email protected] Sonal Arora, BSc, MBBS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th floor, QEQM, St. Mary’s Hospital, South Wharf Road, London W2 1NY, UK
[email protected] Hutan Ashrafian, MBBS, BSc(Hons), MRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Thanos Athanasiou, MD, PhD, FETCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected]
Omer Aziz, MBBS, BSc, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor QEQM Building, St. Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Graham Ball, BSc, PhD The John Van Geest Cancer Research Centre, School of Science and Technology, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK
[email protected] James A. Bell, CPA, JD University of Michigan Health Systems, 2101 Taubman Center/SPC 5346, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
[email protected] Jane M. Blazeby, MSc, MD, FRCS University Hospitals Bristol, NHS Foundation Trust, Level 7, Bristol Royal Infirmary, Marlborough Street, Bristol BS2 8HW, UK
[email protected] Patrick Broe, MCh, FRCSI Royal College of Surgeons in Ireland, Beaumont Hospital, Dublin, Ireland
[email protected] Heather Carnahan, PhD Department of Occupational Science and Occupational Therapy, University of Toronto, The Wilson Centre, 200 Elizabeth Street, Toronto, ON, M5G 2C4, Canada
[email protected] Andrew Carr, FRCS Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Windmill Road, Oxford OX3 7LD, UK
[email protected] Nicholas Cheshire, MD, FRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Wing, St Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Joanna Chikwe, MD, FRCS Department of Cardiothoracic Surgery, Mount Sinai Medical Centre, 1190 Fifth Avenue, New York, NY 10029, USA
[email protected] Andre Chow, BSc (Hons), MBBS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, QEQM Building, St Mary’s Hospital Campus, 10th Floor, Praed Street, London, W2 1NY, UK
[email protected] John P. Cullen, MD Department of Surgery, University of California at San Diego, 200 West Arbor Drive, 8400, San Diego, CA, USA
Lord Ara Darzi, MD, FRCS, KBE The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Haile T. Debas University of California, 3333 California Street, Suite 285, San Francisco, CA 94143-0443, USA
[email protected] Adam Dubrowski, PhD Centre for Nursing Education Research, University of Toronto, 155 College Street, Toronto, ON, M5T 1P8, Canada
[email protected] Peter Ellis People in Health, Ability House, 7 Portland Place, London W1B 1PP, UK
[email protected] Daniel Elson, PhD Department of Biosurgery and Surgical Technology, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, UK
[email protected] Cagla Eskicioglu, MD Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, and Zane Cohen Digestive Diseases Clinical Research Centre, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, Canada Konstantinos Evangelou, BSc, MD, PhD Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Mark A. Farber, MD, FACS University of North Carolina, 3025 Burnett Womack, Chapel Hill, NC 27599, USA
[email protected] Elisabeth Fenwick Community Based Sciences, University of Glasgow, 1 Lilybank Gardens, Glasgow G12 8RZ, UK
[email protected] Marcus Flather, FRCP Clinical Trials and Evaluation Unit, Royal Brompton and Hospital and Imperial College, London SW3 6NP, UK
[email protected] Julian Flowers Eastern Region Public Health Observatory, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, UK
[email protected]
Shawn Forbes, BSc, MD, FRCSC Department of Surgery, University of Toronto, Toronto, ON, and Zane Cohen Digestive Diseases Clinical Research Centre, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, Canada Heather Fry, BA, Dip Ed, Dip ARM, MPhil Higher Education Funding Council for England, Northavon House, Coldharbour Lane, Bristol BS16 1QD, UK
[email protected] Matthew D. Gardiner, MA, BM, BCh, MRCS Kennedy Institute of Rheumatology Division, Imperial College London, 65 Aspenlea Road, London W6 8LH, UK
[email protected] Akram R.H. George, MBBS, MRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Roger M. Greenhalgh, MA, M.D, MChir, FRCS Division of Surgery, Oncology, Reproductive Biology & Anaesthetics, Imperial College, Charing Cross Hospital, Fulham Palace Road, London W6 8RF, UK
[email protected] Vassilis G. Gorgoulis, MD, PhD Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Geoffrey C. Gurtner, MD The Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford University, 257 Campus Drive, Stanford, CA 94305-5148, USA
[email protected] Stephen Gwilym, MRCS Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Windmill Road, Oxford OX3 7LD, UK
[email protected] Vassilis G. Hadjianastassiou, DM (Oxon), FEBVS, FRCS (Gen), BMBCh (Oxon), BSc Department of Transplantation, Directorate of Nephrology Transplantation and Urology, Guy’s & St. Thomas’ NHS Foundation Trust, Guy’s Hospital, St. Thomas’ Street, London SE1 9RT, UK
[email protected] Dimitri J. Hadjiminas, MD, FRCS Department of Breast and Endocrine Surgery, St Mary’s Hospital NHS Trust, Praed Street, London W2 1NY, UK
[email protected]
George B. Hanna, PhD, FRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Linda J. Hands, MA, BSc, MBBS, FRCS, MS Nuffield Department of Surgery, University of Oxford, 6th Floor, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DU, UK
[email protected] Charlie Huins, BSc (Hons), MRCS, DOHNS, MSc Department of Ear, Nose and Throat Surgery, St Mary’s Hospital Trust, Praed Street, London W2 1NY, UK
[email protected] Penny Humphris, MSc (Econ), CBE The Department of BioSurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK John G. Hunter, MD, FACS Department of Surgery, Oregon Health & Science University, 3181 SW Sam Jackson Park Road – L223, Portland, OR 97239-3098, USA
[email protected] Santhini Jeyarajah Department of General Surgery, Royal London Hospital, Whitechapel, London E1 1BB, UK
[email protected] Catherine M. Jones, BSc, FRCR The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Nikolaos G. Kastrinakis, BSc, MSc, PhD Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Roger Kneebone, PhD, FRCS, FRCGP Department of Biosurgery and Surgical Technology, Chancellor’s Teaching Centre, 2nd Floor QEQM Wing, Imperial College London, St Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] James Kinross, MBBS, BSc, MRCS Department of Biosurgery and Surgical Technology, Imperial College, 10th floor, QEQM, St. Mary’s Hospital, Praed Street, London, W2 1NY, UK
[email protected]
Athanassios Kotsinas, BSc, PhD Department of Histology–Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Lee Lancashire, BSc, MSc, PhD Paterson Institute for Cancer Research, University of Manchester, Manchester M20 4BX, UK
[email protected] Avalon R. Lance, BSN, MHA Department of Surgery, University of Washington, Box 356410, Seattle, WA 98195-6410, USA
[email protected] Belinda Lees, PhD Clinical Trials and Evaluation Unit, Royal Brompton and Harefield NHS Trust, Sydney Street, London, and National Heart and Lung Institute, Imperial College London, London, UK Daniel R. Leff, MBBS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London, W2 1NY, UK
[email protected] Amy G. Lehman, MD, MBA Department of Surgery, The University of Chicago, University of Chicago Medical Center, 5841 South Maryland Avenue, MC 5031, Chicago, IL 60637, USA Julian J. H. Leong, MA, MBBS, MRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Michalis Liontos, MD Department of Histology–Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Michael T. Longaker, MD, MBA The Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford University, 257 Campus Drive, Stanford, CA 94305-5148, USA
[email protected] Richard E. Lovegrove, MBBS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Peter W. Marcello, FACS Department of Colon & Rectal Surgery, Lahey Clinic, 41 Mall Road, Burlington, MA 01805, USA
[email protected]
William A. Marston, FACS University of North Carolina, 3025 Burnett Womack, Chapel Hill, NC 27599, USA Erik Mayer, BSc (Hons), MBBS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London, W2 1NY, UK
[email protected] Robin McLeod, MD, FRCSC, FACS Department of Surgery, University of Toronto, Toronto, ON, Canada
[email protected] Krishna Moorthy, MS, MD, FRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Alison Mortlock, BSc, PhD The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College London, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Michael W. Mulholland, MD, PhD University of Michigan Health Systems, 2101 Taubman Center/SPC 5346, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
[email protected] Bari Murtuza, MA, PhD, FRCS (Eng) The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Kamal Nagpal, MBBS, MS, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor QEQM Building, St. Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Jagdeep Nanchahal, BSc, PhD, MBBS, FRCS (Plast), FRACS Kennedy Institute of Rheumatology Division, Imperial College London, 1 Aspenlea Road, London W6 8RF, UK
[email protected] Zhifang Ni, MSc The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected]
Marios Nicolaou, BMedSci, BM, BS, MRCS, PhD Imperial College London, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Sukhmeet S. Panesar, MBBS, BSc (Hons), AICSM National Patient Safety Agency, 4–8 Maple Street, London W1T 5HD, UK
[email protected] Panagiota Papanagnou, BSc Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Paraskevas Paraskeva, PhD, FRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Ioannis S. Pateras, MD Department of Histology-Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece Carlos A. Pellegrini, MD, FACS Department of Surgery, University of Washington, Box 356410, Seattle, WA 98195-6410, USA
[email protected] John Pepper, FRCS National Heart and Lung Institute, Imperial College, London and Clinical Trials and Evaluation Unit, Royal Brompton and Harefield NHS Trust, Sydney Street, London, UK Lawrence D. Phillips, PhD The Department of Management, London School of Economics and Political Science, Houghton Street, London WC2A 2AE, UK
[email protected] Ronnie Tung-Ping Poon, MBBS, MS, PhD, FRCS (Edin), FACS Department of Surgery, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
[email protected] Shirish Prabhudesai, MS, MRCS Bart’s and the London Hospital NHS Trust, The Royal London Hospital, Whitechapel, London E1 1BB, UK
[email protected] Oliver Priest, MBChB, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected]
Sanjay Purkayastha, MD, MRCS Department of Biosurgery and Surgical Technology, Imperial College London, QEQM Building, St. Mary’s Hospital, 10th Floor, Praed Street, London W2 1NY, UK
[email protected] Mohammed Shamim Rahman, MBBS, MRCP The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Christopher Rao, MBBS, BSc (Hons) Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Richard Reznick, MD Department of Surgery, University of Toronto, 100 College Street, 311, Toronto, ON, M5G 1L5, Canada
[email protected] Gretta Roberts, BSc, PhD The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, St Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Gary C. Roper Imperial College London, Imperial College Healthcare NHS Trust, AHSC Joint Research Office, G02 Sir Alexander Fleming Building, Exhibition Road, London SW7 2AZ, UK
[email protected] Simon Rowland The Departments of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College London, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Karen M. Sergiou Research Office, Imperial College London, Exhibition Road, London SW7 2AZ, UK k.sergiou @imperial.ac.uk Nick Sevdalis, PhD National Institute for Health Research, Imperial Centre for Patient Safety and Service Quality, Imperial College London, London, and Clinical Safety Research Unit, The Department of Biosurgery and Surgical Technology, Imperial College London, London, UK
[email protected] Conor J. Shields, BSc, MD, FRCSI Department of Surgery, Mater Misericordiae University Hospital, Eccles Street, Dublin, Ireland
[email protected]
Fotios Sianis, PhD Department of Mathematics, University of Athens, Panepistemiopolis, Athens 15784, Greece
[email protected] Weiming Siow, MBBS, BSc (Hons) North Middlesex University NHS Hospital, Sterling Way, London N18 1QX, UK
[email protected] Petros Skapinakis, MD, MPH, PhD University of Ioannina, School of Medicine, Ioannina 45110, Greece [email protected] Mark A. Talamini, MD, FACS Department of Surgery, University of California at San Diego, 200 West Arbor Drive, 8400, San Diego, CA 92103, USA
[email protected] David G.T. Thomas, FRCS The National Hospital for Neurology and Neurosurgery, Institute of Neurology, Queen Square, London WC1N 3BG, UK
[email protected] Neil Samuel Tolley, MD, FRCS, DLO Department of Ear, Nose and Throat Surgery, St Mary’s Hospital, Imperial Hospital NHS Healthcare Trust, Praed Street, London W2 1NY, UK
[email protected] Sana Usman, BSc, MBBS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Justin Vale, MS, FRCS (Urol) Imperial College Healthcare NHS Trust, St Mary’s Hospital, Praed Street, London W2 1NY, UK
[email protected] Charles Vincent, BA, MPhil, PhD The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK James Wall, MD The Department of Surgery, Stanford University School of Medicine, Stanford University, 257 Campus Drive, Stanford, CA 94305-5148, USA Oliver Warren, BSc (Hons), MRCS (Eng) The Department of BioSurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected]
Laurence Watkins, FRCS Victor Horsley Department of Neurosurgery, The National Hospital for Neurology and Neurosurgery, Queen Square, London WC1N 3BG, UK
[email protected] Donna Winderbank-Scott, MBBS, BSc, AICSM The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected] Desmond C. Winter, MD, FRCSI Department of Surgery, St. Vincent’s University Hospital, Dublin, Ireland
[email protected] John Wong, MBBS, PhD, FRACS, FACS Department of Surgery, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
[email protected] Danny Yakoub, MBBCh, MSc, MRCSEd Department of Surgery, Staten Island University Hospital, 475 Seaview Avenue, Staten Island, New York, NY 10305, USA
[email protected] Guang-Zhong Yang Institute of Biomedical Engineering, Imperial College London, London, and Royal Society/Wolfson MIC Laboratory, 305/306 Huxley Building, Department of Computing, 180 Queens Gate, Imperial College of Science, Technology, and Medicine, London SW7 2BZ, UK
[email protected] Michael Zachariadis, BSc, PhD Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, and Department of Anatomy, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece
[email protected] Paul Ziprin, MBBS, MD, FRCS The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK
[email protected]
1 The Role of Surgical Research
Omer Aziz and John G. Hunter
Contents
1.1 Introduction
1.2 The Aims of Surgical Research
1.3 Translating Surgical Research into Practice
1.4 Challenges Faced by the Twenty-First Century Academic Surgeon
1.5 The Role of the Academic Surgeon in Teaching
1.6 The Future of Surgical Research
References
O. Aziz () Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor QEQM Building, St. Mary's Hospital, Praed Street, London W2 1NY, UK. e-mail: [email protected]
Abstract This chapter outlines the role of surgical research in advancing clinical knowledge, achieving better clinical outcomes and ultimately improving the quality of patient care. It reviews the origins of surgical research and the challenges that need to be overcome if it is to survive, describing the importance of translation of research into clinical practice through better trial design, information dissemination and teaching. Finally, this chapter looks to the future of academic surgery and the shape that this may take.
1.1 Introduction

Historically, research has played a crucial role in the advancement of medicine, our understanding of disease processes and the way that we study them. Clinicians and health care professionals across specialties and disciplines now use research in almost every aspect of their working lives in order to guide an evidence-based practice, evaluate the effectiveness of new therapies or demonstrate the efficacy of new health care technologies. The ultimate aim of clinical research is to improve the management that patients receive in order to achieve the best possible outcome for them. Financial support through government-funded grants, charities and the commercial sector has been a key driver for this and has led to the establishment of institutional clinical research units that employ academic clinicians across a range of disciplines. These academics are judged by both the quality and originality of the research their units produce, and are sustained by their fund-raising ability. While this has certainly raised the standard of research through improved trial design, execution and reporting, there remain areas within medicine where both the nature of the disease processes involved and the ethical dilemmas associated
with studying them require a special appreciation of clinical investigative tools. The study of surgical disease is one such challenging area and has historically led to research that has been largely observational, reflective and retrospective in nature. In modern times, the surgical research community has responded to this by developing novel and effective investigative tools to become, perhaps, one of the most rapidly growing and exciting areas of investigative clinical activity. This chapter highlights the role of surgical research in the advancement of clinical science to date, and outlines the challenges that lie ahead. It sets the scene for the rest of this text, which aims to cover many of the crucial advances in surgical research methodology, thereby providing a platform for the latest breed of academic surgeons to launch their investigative career.
1.2 The Aims of Surgical Research

When considering the challenges faced by academic surgeons and their aims to overcome these, it is important to appreciate the origins of surgical research. Historically, the vast majority of "great surgeons" have been individuals with an undeterred drive to be leaders in their field. They have been stereotyped as inquisitive, challenging and bold individuals with the confidence to succeed as well as a passion for perfectionism through knowledge. The high profile that these individuals have enjoyed is maintained through the disease processes, procedures and instruments named after them, making them the forefathers of surgery. John Hunter's (1728–1793) work on human anatomical research through the application of rigorous scientific experimentation [6], Theodor Billroth's (1829–1894) landmark gastro-intestinal surgical procedures [5] and Alfred Blalock's (1899–1964) study of shock and the Tetralogy of Fallot [2] serve as examples of the immortality of these surgeons' contributions. It is in this inquisitive environment that "academic surgery" saw its birth, with observations stimulating investigative study, publication of practical findings and resulting clinical applications. The nature of investigative research itself has evolved dramatically, with much of what was acceptable before now becoming ethically unjustifiable. A great example of this is the French surgeon Ambroise Paré's (1510–1590) experiment on Bezoars [9]. It was
at the time a commonly held belief that the antidote to any poison was the Bezoar stone. Sceptical of this hypothesis, Paré designed an opportunistic experiment on a servant on his staff who had been caught stealing and was to be hanged. He convinced the servant to be poisoned instead, with the understanding that immediately after taking the poison he would also ingest a piece of Bezoar. If he lived, he would be free. The servant died from the poison, and Paré's observation disproved the hypothesis in a simple yet dramatic fashion. In modern surgical research, we can thankfully no longer ethically justify such studies, yet the underlying inquisitiveness and problem-solving approach of the academic surgeon remain key ingredients of investigative surgical research to this day.

The sophistication of the diagnostic, study design and statistical tools now available to the academic surgeon has improved dramatically. Studies are now designed to appreciate and minimise bias, to have clear outcome measures, and to be of adequate sample size and power to demonstrate the statistical significance of their findings. The stereotypical attitude that modern surgical research remains largely reflective and observational is, as a result, unjustified and should rapidly be dispelled. Academic surgeons in the twenty-first century have a strong moral and ethical responsibility that balances the rate at which their advances can be tested, yet the research they produce reaches a wider audience than ever before through electronic websites and research forums on the world wide web. This has an enabling effect not only within the profession, but also on patients, who are better informed and more sophisticated in the choices they make than at any other point in history. Our challenge now is to treat a medically educated patient population that demands a more "personalised" health care service.

The ultimate aim of research in surgery is to improve the health care of patients through the advancement of knowledge about surgical conditions, interventions, training and new technologies. Academic clinical institutions, research infrastructure, faculty and patients have allied aims (Fig. 1.1). Surgical journals, societies and websites fulfil a much wider role of providing the forum for surgeons to collaborate, learn from each other and disseminate their knowledge and experience. The ability of the academic surgeon to balance the activities of teaching, investigative research and patient care is key to a successful career in surgical research.
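The sample size and power considerations mentioned above can be made concrete with a small worked example. The following Python sketch is illustrative only: the two complication rates and the conventional 5% two-sided significance level and 80% power are assumed values, and the calculation uses the standard normal-approximation formula for comparing two proportions rather than any method specific to this book.

```python
import math
from scipy.stats import norm

def per_arm_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    two proportions (two-sided test, normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the type I error rate
    z_beta = norm.ppf(power)            # critical value for the type II error rate
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)                 # round up to whole patients

# Hypothetical example: detecting a fall in complication rate from 20% to 10%
print(per_arm_sample_size(0.20, 0.10))  # about 197 patients per arm
```

Halving the detectable difference roughly quadruples the required sample, which is one reason underpowered surgical studies have historically been common.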
Fig. 1.1 Close links between surgical research and clinical practice (the figure links faculty: investigators and senior investigators, trainees and associates; infrastructure: universities, hospitals, clinical research networks, and clinical research facilities and centres; patients and the public; and research systems: research projects and programmes, research units and schools, research governance systems and research information systems)
1.3 Translating Surgical Research into Practice
The practice of evidence-based medicine promotes the translation of high quality research into clinical practice. As a result, a very important responsibility is now placed on all types of research, with surgery being no exception. The ease with which research output can now be accessed by health care professionals and patients over the internet means that the influence research has on clinical practice is greater than at any other point in time. The importance that the surgical community itself places on research is well demonstrated by the fact that research is now actively encouraged at a very early stage in surgical training, and is a key part of most residency programmes, their equivalents and fellowships across the world. Examples of this can be seen across the Western world over the last two decades in the intercalation of surgical residency programmes with full-time research degrees. The United States was the first country to adopt the MD/PhD dual degree, and a large number of medical schools across America now offer such combined programmes. The majority of these programmes, however, have historically focussed on medical as opposed to surgical research, largely because the length of surgical training deters many candidates from a research degree, and also because the quality of medical research is deemed to be higher [8]. Despite this, there is now a
very important place in academic surgery for the dual degree in establishing research credibility and launching an investigative surgical career. In the UK, we have seen the emergence of the PhD as the most credible research degree to be undertaken in surgical training. While this has been less integrated than the American dual degree programmes, a large number of motivated candidates have been able to undertake this full-time research degree outside their standard training. Other academic degrees pursued by surgical trainees include the MD and MSc, which vary in time intensity. With time, it is likely that the UK will adopt a more integrated academic and clinical surgical training programme through the proposed changes to the British system brought about by the “Modernising Medical Careers” (MMC) initiative [3]. Ultimately, all of this translates into more resources available for research, and more encouragement to take part in it, than in previous times; the days are numbered when the academic surgeon practised little clinically, and the clinical surgeon practised little academically.
It is encouraging that the number of investigative surgeons looks likely to increase, yet it is still important to consider how their work is likely to translate into clinical practice. A key aspect of surgical research has always been the publication of individual experience and the use of new techniques in the form of technical operative notes as well as case series. While the validity of the surgical case report or observational case series and the
true value this adds to clinical practice has been questioned by some [7], the nature of surgery as an extremely practical specialty means that there is clearly a role for this type of research in modern times. Historical case series, such as Joseph Lister’s reports showing that carbolic acid solution swabbed on wounds markedly reduced the incidence of gangrene, are remembered for their importance in advancing antiseptic principles [13], but modern times have seen equally important developments in surgical technology. This is perhaps best illustrated by how laparoscopy originated and was taken up. Initially reported in 1901 by the German surgeon Georg Kelling, who used a cystoscope to perform “koelioscopie” by peering into the abdomen of a dog after insufflating it with air, this was soon followed by Jacobaeus, who published his first report of what he called “Laparothorakoskopie” [14]. From this point, a number of reports of laparoscopic procedures began to emerge, mainly for diagnostic purposes. The advent of rod-lens optical systems and cold light fibre-glass illumination meant that laparoscopy could become more widespread, mainly in diagnostic gynaecological procedures. In 1987, the French gynaecologist Mouret performed the first acknowledged laparoscopic cholecystectomy by means of four trocars. The rest is history: over the last decade laparoscopy has advanced tremendously, to the point where, in the Western world, laparoscopic cholecystectomy has almost eradicated its open counterpart from clinical practice [10]. An enormous amount of randomised and non-randomised comparative research comparing laparoscopy to open surgery for a range of procedures has since been published, with meta-analyses of these trials showing the vast improvement in postoperative recovery that laparoscopy provides the patient. Research on laparoscopy and its benefits is perhaps the best example in modern times of how surgical technology and techniques can be identified through publication of individual experience, then trialled in the context of high quality randomised clinical research, with meta-analysis finally being used to determine the pooled effect of a number of clinical trials. Looking to the future, this trend is set to continue, and we are seeing new technologies such as robotics [1], surgical microdevices [4] and natural orifice transluminal endoscopic surgery [16] emerging with the potential to transform surgery in a similar way.
Randomised controlled trials have long been the pillar of clinical research, representing the highest quality of study in medical practice. The aim of these trials is to randomise patients in an unbiased manner in an attempt to assess the effects of an intervention. In
surgical research, however, only a minority of studies can achieve a valid randomisation scheme, because of both the nature of surgical interventions and ethical dilemmas. Randomised controlled trials in surgical patients have as a result been much more difficult to perform than in the rest of medicine, especially compared with pharmacological interventions, wherein the placebo has enabled much of the required blinding to take place. There are, however, other reasons why randomised controlled trials are difficult to undertake in surgery. First, surgical disease often presents in patients who are themselves a very heterogeneous group, and often older. For example, assessing the effect of a new drug on generally healthy young adults with essential hypertension is much more straightforward with regard to patient matching than evaluating a surgical intervention such as renal transplantation in older patients with end-stage dialysis-dependent renal failure. Second, the surgical intervention can itself be heterogeneous, varying not only with the experience of the surgeon, but also with the experience of the institution. As a result, surgical multi-centre trials are often unable to account for differences in the skill levels of different surgeons, either between centres or across a country, which limits the applicability of randomised controlled trials to many surgical interventions [15]. These difficulties have had an important impact on the support surgical research receives from funding agencies: it is often easier to see how a trial addressing a basic science or pharmacological question may arrive at a solution than it is for a surgical trial. Ultimately, however, it is the responsibility of the research community to face and try to resolve the uncertainty of clinical surgical research by understanding the nature of disease and using new statistical tools to overcome the challenges of trial design. The use of meta-analytic techniques is perhaps one of the best examples of how these limitations can be overcome.
Finally, to the practising surgeon, the way in which all this knowledge is disseminated is of great importance. Surgical journals, societies and websites are excellent examples of academic collaboration, acting as places where people are able to exchange their ideas and compare outcomes. The impact of this type of activity can be almost immediate, resulting in a change in a surgeon’s daily practice. The use of surgical websites to disseminate information and experience is a newer phenomenon that continues to develop; for the latest generation of surgeons, it is the most direct method of obtaining information when and where they need it. Examples of existing resources include anatomy
websites, operative guides, reference texts and formularies, which are covered later on in this text.
1.4 Challenges Faced by the Twenty-First Century Academic Surgeon
The first consideration for the surgical research community in the early twenty-first century is deciding what type of research surgeons should be undertaking, and where. Surgical research has traditionally been divided into clinical and non-clinical, the latter predominated by the study of basic science. Clinical surgical research can be thought of as patient-orientated, epidemiological, behavioural, outcomes-related and health services research. Patient-orientated research can itself be divided into research on mechanisms of human disease, therapeutic interventions, clinical trials and the development of new technologies. Whilst this classification aims to be as broad as possible, the emergence of new fields and advances in the biological sciences are blurring the boundaries between research types. Recent times have seen the emergence of “biomedical science”, in which a multi-disciplinary approach to research is adopted, combining both clinical and non-clinical themes. A prime example is biomedical engineering, a fusion of medical science, engineering, computer science and biochemistry. When combined, these specialties have led to a more free-thinking approach to research, with transparency in the way that surgical ideas are shared. There is also an important synergy in the opportunities for
competitive grant funding, which are significantly greater with a combined approach. This trend has seen academic institutions across the world setting up biomedical institutes where medical and non-medical researchers work together. At Imperial College London, for example, a newly established Institute of Biomedical Engineering promotes this inter-disciplinary potential in biomedical research, aiming to be an international centre of excellence in biomedical engineering research. It encourages collaboration between engineers, scientists, surgeons and medical researchers to tackle major challenges in modern health care, using enabling technologies such as bionics, tissue engineering, image analysis and bio-nanotechnology. The institution is organised into “Research Themes” designed to attract major funding. These themes are managed by Technology Networks, each headed by a committee drawn from the key researchers in the field from across the university. This trend is growing rapidly across the world, with similar institutes emerging at Oxford University, the University of Zurich and the University of Minnesota, to name but a few. The establishment of a multidisciplinary approach to research has increased not only the depth and quality of surgical research, but also its appeal to a wider audience. Biomedical research is able to access a significantly larger amount of funding than clinical research alone. What is clear is that research tools are widely disseminated among the surgical community, and the research itself is probably best performed in focused institutions such as Academic Health Science Centres with proven academic pedigree and worldwide credentials (Fig. 1.2).
Fig. 1.2 Characteristics of Academic Health Science Centres (world-class academic medical centres deliver against the three missions of patient care, teaching and research, through strength in core capabilities such as talent, external partnerships, financial strength, brand, infrastructure and operating processes, founded on an aligned academic–hospital partnership)
The second important consideration for the surgical research community is how it funds itself. When compared with other forms of medical research, surgery has to date not been able to attract similar quantities of funds, partly because of a lack of influence exerted by academic surgical leaders in making the case for funding surgical over medical research. It has also been the case that randomised trials receive funding preferentially over non-randomised research, and the former are more challenging in surgery than in medicine [15]. The under-representation of surgeons as research leaders on selection and grant-awarding committees is a problem that needs to be addressed. It is known that the National Institutes of Health (NIH), the major source of biomedical funding in the United States, conveys a much less welcoming attitude towards surgical research than towards other types of clinical or basic science research. There is evidence to suggest that since 1975, surgeons have been disproportionately less successful than researchers in other clinical disciplines in obtaining funding [11]. At the NIH, the principal decisions for peer review of research and selection for grant funding are made by groups of about 10–20 individuals with expertise in their given field. To date, there have been few study sections devoted to surgically oriented clinical research, with only two out of a hundred study sections in which surgeons form even a reasonable minority of the committee members. It is not surprising, therefore, that in comparison with medical research departments, grant proposals from surgical departments are less likely to be funded, and where they do receive funding, it is likely to be a smaller amount. The most likely cause of this problem is that the surgical profession has failed to address the challenge of developing and sustaining an adequate research workforce [15].
The third consideration for the surgical research community is training surgeons in the skills required to undertake and interpret research. While we have discussed the role of formal research programmes in surgical training, there is a more general need to provide structure, organisation and oversight to the research training that all surgeons receive. It is important to instil the scientific disciplines that form the cornerstone of all basic, translational and clinical surgical research into all surgical residents, registrars, fellows and consultants.
All surgical residency programmes should offer research education to their trainees in the form of biostatistics, bioethics, experimental design and principles of clinical research, including clinical trials and database management. Education of the surgical workforce as a whole is key to reaching a situation in which all involved in clinical surgical practice can read, listen and think critically with scientific discipline. A number of possible organisational routes have been described for training surgeon scientists:
1. A period of full-time clinical experience (usually 2 years of a residency or its equivalent), followed by 1–3 years of research, followed by another period of full-time clinical experience to complete the residency. This has historically been the most common way to undertake a full research degree.
2. A period of full-time clinical experience, followed by integrated research and clinical training to complete the residency.
3. A full clinical residency followed by several years of full-time research.
In reality, the more integrated a research programme the better, as the aspiring surgeon scientist may then have the opportunity to obtain a faculty position in the research department in which they are based, and to initiate a research programme without a potentially disruptive 2–3 year lapse. Through this process, it is also desirable that the aspiring surgeon scientist be supported during their time of vulnerability to intrusions from clinical practice, teaching and administrative responsibilities.
Finally, the academic surgeon faces an interesting dilemma. In devoting their time to academic research, academic surgeons sacrifice time they would otherwise spend in active clinical practice, which is important in gaining experience and credibility amongst the surgical community. This is especially challenging because, at the same time, the academic surgeon is also judged as a researcher by the research institution that employs them, namely through their research output and the grants they generate. It is simply a case of “publish or perish”, requiring a fine balancing act to maintain excellence in both trades [12]. The best respected academic surgeons are often known to be both excellent surgeons and successful academics.
1.5 The Role of the Academic Surgeon in Teaching
Surgical education, at both undergraduate and postgraduate levels, has traditionally been left to the academic surgeon to deliver, usually through their employing university. At all levels, delivering surgical education is an extremely important part of academic surgery, and one that is extremely poorly rewarded at present. Academic surgeons are often not remunerated proportionately for the time they spend teaching, and the time they are allocated for this purpose often clashes with both their clinical and academic commitments. As a result, the standard of education delivered by academics varies dramatically from one individual to the next. In addition, most academics are judged by their research output (high impact journal papers, conference presentations) and the amount of grant funding they are able to attract. Teaching is very poorly rewarded, and as a result academic surgeons are inclined to spend as little time as possible delivering it. In the UK, all universities undergo a Research Assessment Exercise (RAE) that rates the quality of their research against international standards. This information is then used to determine the research grant that the institution will receive from one of four government funding agencies. The process is a very arduous and thorough one, but it does not reward teaching in the way that it rewards research, contributing to the above-mentioned effect.
1.6 The Future of Surgical Research
At the dawn of the twenty-first century, academic surgery finds itself at a very important crossroads. This is a time when academic surgeons are expected to be clinicians, surgeons, researchers, statisticians, educators and entrepreneurs, generating the funds required to undertake their research with excellence. It is also a technological age in which new advances are being made at a frightfully rapid pace, with the arrival of new surgical instrumentation and tools almost daily. The “technology push” we face from the medical device sector is immense and largely financially driven, yet it should be seen as a real opportunity. Surgeons of today have the chance to turn into inventors in a fashion that only the great forefathers of surgery were able to, and should play an active role in technology development, patenting and commercialisation. It is an environment where innovation is set to thrive, but it will only do so if it is encouraged.
The future of academic surgery ultimately lies in the hands of its leaders, namely the professors of surgery, heads of academic surgical departments and surgical department chairpersons, who have a responsibility to protect and support their young investigators so that they may set up productive programmes funded with peer-reviewed grant support. The most vulnerable time for these investigators is within the first 5 years of completing their training, when they require guidance, high-quality training, discipline and critical scientific rigour. Integrating an academic surgical career with a clinical one is also key, because one cannot exist without the other, and academic surgeons should be protected as they learn to juggle the responsibilities of a joint clinical and academic career. Finally, academic surgical programmes need substantial expansion into clinical trials through the development of fellowships in clinical research, lead clinical trial programmes, outcome studies, database research and quality improvement programmes. The future of academic surgery looks bright, but lies firmly in our own hands.
References
1. Berryhill R, Jhaveri JJ, Yadav R et al (2008) Robotic prostatectomy: a review of outcomes compared with laparoscopic and open approaches. Urology 72:15–23
2. Blalock A, Taussig HB (1984) Landmark article May 19, 1945: the surgical treatment of malformations of the heart in which there is pulmonary stenosis or pulmonary atresia. By Alfred Blalock and Helen B. Taussig. JAMA 251:2123–2138
3. Brennan PA, McCaul JA (2007) The future of academic surgery – a consensus conference held at the Royal College of Surgeons of England, 2 September 2005. Br J Oral Maxillofac Surg 45:488–489
4. Chang WC, Sretavan DW (2007) Microtechnology in medicine: the emergence of surgical microdevices. Clin Neurosurg 54:137–147
5. Ellis H (2008) The first successful gastrectomy. J Perioper Pract 18:34
6. Evans CH (2007) John Hunter and the origins of modern orthopaedic research. J Orthop Res 25:556–560
7. Horton R (1996) Surgical research or comic opera: questions, but few answers. Lancet 347:984–985
8. Jones RS, Debas HT (2004) Research: a vital component of optimal patient care in the United States. Ann Surg 240:573–577
9. Nau JY (2007) [A great humanitarian and surgeon: Ambroise Paré]. Rev Med Suisse 3:2923
10. Polychronidis A, Laftsidis P, Bounovas A et al (2008) Twenty years of laparoscopic cholecystectomy: Philippe Mouret – March 17, 1987. JSLS 12:109–111
11. Rangel SJ, Efron B, Moss RL (2002) Recent trends in National Institutes of Health funding of surgical research. Ann Surg 236:277–286
12. Souba WW, Wilmore DW (2000) Judging surgical research: how should we evaluate performance and measure value? Ann Surg 232:32–41
13. Tan SY, Tasaki A (2007) Joseph Lister (1827–1912): father of antisepsis. Singapore Med J 48:605–606
14. Vecchio R, MacFayden BV, Palazzo F (2000) History of laparoscopic surgery. Panminerva Med 42:87–90
15. Weil RJ (2004) The future of surgical research. PLoS Med 1:e13
16. Zehetner J, Wayand WU (2008) NOTES – a new era? Hepatogastroenterology 55:8–12
2 Evidence-Based Surgery
Hutan Ashrafian, Nick Sevdalis, and Thanos Athanasiou
Contents
2.1 Introduction
2.2 What Is Evidence?
2.3 Hierarchy of Evidence
2.4 Definition and Values
2.5 Benefits of Evidence-Based Surgery
2.6 History and the So-Called “Discord” Between Surgery and Evidence-Based Practice
2.7 Principles of Identifying the Evidence
2.8 Sources of Evidence
2.9 Managing the Increasing Volume of Evidence
2.10 Practising and Delivering the Evidence
2.11 Surgical Modelling and Treatment Networks
2.11.1 Surgical Thinking and Evidence Synthesis
2.11.2 Bayesian Techniques
2.11.3 Qualitative Comparative Analysis (QCA)
2.12 Surgical Decision-Making and Clinical Judgement Analysis
2.13 Cost Effectiveness in Evidence-Based Surgery
2.14 Surgical Training in Evidence-Based Techniques
2.15 Ethics
2.16 Conclusion
References
H. Ashrafian (✉) The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK. e-mail: [email protected]
Abstract Evidence-based surgery (EBS) involves the integration of the best clinical and scientific evidence to treat patients. Best evidence is derived from the research literature and can be categorised into a hierarchy of levels. Application of the knowledge derived from “best evidence” results in enhanced care for patients and also improved standards for surgeons and health care institutions. The sources of surgical evidence are discussed, and we review surgical training in evidence-based practice. Techniques for answering a surgical question using an evidence-based method and for practising surgery in an evidence-based environment are described. Furthermore, we examine the role of treatment networks, cost-effectiveness, evidence synthesis, surgical decision making and the ethics of EBS. For the current and future surgeon, evidence-based practice is now an inevitable and fundamental component of the surgical profession. Its universal adoption will play an important role in the advancement of patient care and surgery worldwide.
2.1 Introduction
Surgery has traditionally been considered a craft, wherein individuals adopted techniques didactically from their teachers and performed each operation in a particular way because “that is how it was taught” to them. Throughout history, surgical practice has depended on learning through one’s own mistakes or those of others. Although a handful of exceptions did exist, such as the testing of medical efficacy by the eleventh-century physician Avicenna [1], it was not until the late twentieth century that the concept of evidence-based medical practice came to fruition [2]. The concept of evidence-based practice involves the integration of the best available evidence to treat patients.
It has become an inevitability in surgery and is now a requirement of all modern health care institutions, surgeons and patients alike [3]. This chapter aims to clarify the role of evidence-based surgery (EBS) and to contextualise its place in current and future surgical practice. As Birch and colleagues [4] stipulated, “It is no longer acceptable for a surgeon to be estranged from the current literature – the demands of colleagues, licensing bodies, and patients necessitate satisfactory knowledge of the best available evidence for surgical care”.
2.2 What Is Evidence?
The Oxford English Dictionary defines evidence as “information or signs indicating whether a belief or proposition is true or valid” [3]. As surgeons, we already apply this definition in our daily practice: when we assess patients clinically, we rely on our ability to draw clinical evidence from clinical signs and investigations, much in the same manner as Hippocrates did two and a half thousand years ago.
The word “evidence” has a more specialised designation in the context of evidence-based practice. EBS involves “the systematic, scientific and explicit use of current best evidence in making clinical decisions” [5].
2.3 Hierarchy of Evidence
The evidence used in EBS is derived from published scientific research. In order to allow a comparative evaluation of research data, hierarchies of evidence have been developed, which rank evidence according to its validity. Initially, randomised controlled trials (RCTs) were considered to provide the highest level of evidence, as these were deemed to have more validity, decreasing research bias and analysis error when compared with single case reports and retrospective cohort reviews. More recently, however, the Centre for Evidence-Based Medicine at Oxford produced a hierarchy of evidence types (Fig. 2.1). Here, non-experimental information is classified at the lowest levels of evidence and randomised controlled trials at a much higher level. The highest level of evidence corresponds to evidence sources wherein data from multiple randomised trials are integrated and appraised in the form of meta-analyses and systematic reviews.
Fig. 2.1 Hierarchy of evidence pyramid based on the levels advocated by the Oxford Centre for Evidence-Based Medicine (May 2001) [6]:
I. Systematic reviews of multiple RCTs
II. Properly designed RCTs of appropriate size
III. Well-designed trials such as non-randomised trials, cohort studies, time series or matched case-controlled studies
IV. Well-designed non-experimental studies from more than one centre or research group
V. Opinions of respected authorities, based on clinical evidence, descriptive studies or reports of expert committees
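As a toy illustration, not part of the original text, the levels in Fig. 2.1 can be treated as a simple lookup table so that two study designs can be compared programmatically; the design names used here are informal shorthand, not formal definitions.

```python
# Levels from Fig. 2.1; "I" is the highest level of evidence.
EVIDENCE_LEVELS = {
    "systematic review of RCTs": "I",
    "randomised controlled trial": "II",
    "cohort study": "III",
    "non-randomised trial": "III",
    "multicentre non-experimental study": "IV",
    "expert opinion": "V",
}

def higher_evidence(design_a: str, design_b: str) -> str:
    """Return the design offering the higher level of evidence.
    Lexicographic order happens to rank I < II < III < IV < V correctly."""
    return min(design_a, design_b, key=lambda d: EVIDENCE_LEVELS[d])

print(higher_evidence("cohort study", "randomised controlled trial"))
# randomised controlled trial
```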
2.4 Definition and Values
The broad definition of EBS has been stipulated as “the integration of:
• Best research evidence (clinically relevant research, basic science, relating to diagnosis, treatment and prognosis)
with
• Clinical expertise (skills and experience adapted to a particular patient) and patient values (patient preference and attitudes to the clinical entity and its overall management)” [7].
This broad definition can then be divided into two subcategories [8]:
1. Evidence-based surgical decision making – in which best evidence is applied to an individual or a finite group of surgical patients.
2. Evidence-based surgical guidelines – in which best evidence is applied to an institutional, national or international group of surgical patients.
In order to apply best evidence (Fig. 2.2), raw data need to be processed by a number of different types of knowledge. These include the following:
• Knowledge from research (refined data or evidence)
• Knowledge of measurement (statistical methodology)
• Knowledge from experience (judgements and decisions)
• Knowledge of practice (leadership and management)
Thus, in order to carry out an evidence-based action, one needs to process raw clinical information through all four knowledge types. An example would be a patient presenting with right iliac fossa pain and leucocytosis. As a clinician, one would use knowledge from research and experience to perform further investigations and make a provisional diagnosis. These data would be contextualised with data from the published literature as to the accuracy of the diagnosis and the best course of action, based on the knowledge of measurement. If a surgical indication such as the need for appendicectomy were deduced, the knowledge of leadership and management would need to be applied to ensure progression to operative management.
Fig. 2.2 Application of knowledge in the process of transforming raw data into evidence-based action (raw data are refined into evidence and pass through statistical analysis, decision making and judgement, and leadership and management, corresponding to the four knowledge types, to yield evidence-based action)
2.5 Benefits of Evidence-Based Surgery
It has been predicted that “applying what we know already will have a bigger impact on health and disease than any drug or technology likely to be introduced in the next decade” [9]. The correct application of this knowledge is, therefore, critical to the development of future health care strategies and treatments. This has led to the concept that “knowledge is the enemy of disease”, a metaphor that is both intuitive and increasingly applied by the National Knowledge Service of the United Kingdom’s National Health Service [10]. Adopting a similar aim, the American College of Surgeons has introduced the Continuous Quality Improvement (CQI) committee [11] (initially known as the Office of Evidence-Based Surgery) [12] to promote the highest standards of surgical care through the evaluation of surgical outcomes in clinical practice.
There is, however, a large discrepancy between what we know and what we currently apply [9]; EBS is a method by which this discrepancy can not only be bridged, but also built upon to fundamentally improve surgical health care outcomes. Many so-called surgical standards and customs are based on little if any evidence. For example, the application of post-operative wound drains [8, 13] and nasogastric tubes [14] is largely determined by habit and surgical apprenticeship as opposed to best evidence.
Furthermore, part of the reason why such a gulf exists between knowledge and surgical application lies in the traditional philosophy of local and national health care policy. Here, the traditional modus operandi was defined by a practice that aimed to “minimise costs”, rather than actually addressing why we practise surgery in the first place, namely to benefit the patient. The future of health care, therefore, lies in a system built on the fundamental concept of value to patients (the so-called “value-based” system). This model of health care can improve patient outcomes, quality of life and satisfaction, and also reduce financial cost [15]. At the heart of this value-based system is the practice of evidence-based surgery, which will enable care to be targeted at surgical diseases and patients, as opposed to the “old fashioned” concept of treating patients according to the “speciality” of their surgeons. Here, there will be a focus on risk-adjusted outcomes to improve patient care and satisfaction (Fig. 2.3). Furthermore, such a system will allow the enhancement of surgical training (particularly in an era of decreased training hours), improve surgeons’ satisfaction and empower surgeons through improved leadership, management and decision making (Fig. 2.3).
Fig. 2.3 Evidence-based surgery at the centre of a value-based health care system (benefits shown for the patient, the institution, the wider health care system and the surgeon include customised care, improved safety and care, increased patient awareness, higher quality research, increased efficiency of care, increased transparency of results, accountability, improved teamwork, cooperation and shared responsibility, decision-making based on applied evidence, consistent national operating standards, improved national guidelines, identification of local healthcare strengths, improved education and training, increased satisfaction and enhanced surgical research)
2.6 History and the So-Called “Discord” Between Surgery and Evidence-Based Practice
The historical perception that surgeons were unable to successfully apply evidence-based practice is not completely true. Many of the developments that led to its introduction in the twentieth century were spearheaded by surgeons. Working on the ideas of the British physician
Sir Thomas Percival (1740–1804), the American surgeon Ernest Codman (1869–1940) established the first tumour registry in the United States in order to follow up patient outcomes, identifying the most successful aspects of tumour treatment. He was a notable health care reformer, and is credited with introducing outcomes management in patient care through an “end results system” of following up patients and their outcomes, essentially creating the first patient health care database. He introduced the first mortality and morbidity meetings and contributed to the founding of the American College of Surgeons and its Hospital Standardization Program [16]. James Alison Glover (1930s, UK) and later Jack Wennberg (1970s, USA) revealed that the mass adoption of tonsillectomy was unrelated to the occurrence of tonsillar disease, which led to a change in surgical practice in keeping with actual disease rates [17]. The famed cardiovascular surgeon Michael DeBakey reported on the overall surgical experience in World War II while working for the Army Surgeon General’s Office; his work described injury incidence, surgical management and analysis of outcomes [18]. Although our modern concept of evidence-based medicine was first described by the Scottish physician Archie Cochrane in 1972 [19], the formal adoption of “best evidence” occurred approximately 20 years later, through Gordon Guyatt and David Sackett in the 1990s [2]. In the interim, however, some surgeons attempted to introduce evidence-based practice by publishing and acting on surgical outcomes through the Health Care Financing Administration (HCFA) [20].
Despite these significant contributions, surgeons have been criticised with statements such as “a large proportion of the surgical literature is of questionable value”, and that surgeons perform in a “comic opera” in
which “they suppose are statistics of operations” [21]. These indictments of the surgical fraternity have come about because surgical research has historically relied upon publishing data deemed to be of the “weakest evidence”, based mainly on case series as opposed to randomised trials and meta-analyses. The root causes have been assessed in the literature [22]:
• Historical reasons (many operations progressed by small steps without RCTs).
• Patients do not want their operations to be selected by randomisation (patient equipoise).
• Operative variation makes standardisation of a surgical arm in an RCT difficult.
• Urgent and emergency surgery makes inclusion in an RCT difficult.
• The learning curve effect creates difficulty in analysing RCTs.
• Many surgeons do not have an adequate grounding in medical statistics.
• Surgical RCTs have traditionally been poorly funded, whereas drug and technology companies have been more forthcoming in funding trials of their devices or procedures.
• Surgeons have traditionally been poor at adopting RCTs, lacking equipoise through self-assuredness – “my operation is better”.
The role of EBS is not to encourage wider use of RCTs alone, but more importantly to arrive at surgical treatments that are considered “best evidence”. This does not always require an RCT, and can draw on a wide corpus of other data that can be interpreted to select the best treatment for each patient.
2.7 Principles of Identifying the Evidence
In order to carry out EBS, four components need to be achieved, the so-called “Four Steps of EBS” [7]:
1. Formulate a question based on a clinical situation encountered in daily practice.
2. Do a focused search of the relevant literature.
3. Critically appraise the literature obtained to find the best evidence.
4. Integrate the information and act in accordance with the best available evidence.
Finding the best evidence is an essential skill for all modern surgeons. One well-applied method of performing a search is the “PICO” technique [5, 23] developed at McMaster University, where evidence-based methods have been taught for almost 30 years. Empirical studies demonstrate that PICO use improves the specificity and conceptual clarity of clinical problems, allows for more complex search strategies and results in more precise search results [5, 12]. A PICO question has four elements:
P: Patient population – the group for which you need evidence
I: Intervention – the operation or treatment whose effect you need to study
C: Comparison – what is the evidence that the proposed intervention produces better or worse results than no intervention or a different type of intervention?
O: Outcomes – what are the effects and end-points of the intervention?
When asking a PICO question, it is important to:
• Be specific in your question(s)
• Prioritise your questions
• Ask answerable questions
A poor question would be: “Is coronary artery bypass grafting better than angioplasty?” A better question is: “In diabetic Asian women aged over 75 with chronic stable angina and three-vessel coronary artery disease, does off-pump coronary artery bypass grafting compare more favourably than percutaneous coronary intervention in terms of long-term mortality and physical quality of life?” This latter question can be derived from our PICO table:
P – patient: Asian women aged over 75 years with diabetes
I – intervention: Off-pump coronary artery bypass
C – comparison: Percutaneous coronary intervention
O – outcomes: Long-term mortality and quality of life
The next step is to work through the finding-evidence sequence (Fig. 2.4).
Fig. 2.4 Sequence of finding evidence in EBS based on McCulloch and Badenoch [24] (the flowchart runs: formulate a PICO question; do guidelines already exist in your own hospital or at local, national or international level?; do systematic reviews exist in relevant databases such as the Cochrane Library, Health Technology Reviews or PubMed?; do studies exist that answer the PICO question? Examples include: treatment comparison – randomised controlled trial(s); disease diagnosis – prospective study; treatment prognosis – cohort study; treatment harm – case series or case report; cost benefit – economic study; quality of life – questionnaire or qualitative study)
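A PICO question also maps naturally onto a structured literature search. The sketch below is an illustrative helper, not a validated MeSH strategy: synonyms within each PICO element are joined with OR and the element groups are then joined with AND, a common starting point for a PubMed query. The search terms are examples only.

```python
def pico_to_query(*groups: list) -> str:
    """Combine PICO term groups into a Boolean search string:
    OR within a group, AND between groups."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in groups if terms
    )

query = pico_to_query(
    ["diabetes mellitus", "aged"],             # P: population
    ["off-pump coronary artery bypass"],       # I: intervention
    ["percutaneous coronary intervention"],    # C: comparison
    ["mortality", "quality of life"],          # O: outcomes
)
print(query)
# (diabetes mellitus OR aged) AND (off-pump coronary artery bypass)
#   AND (percutaneous coronary intervention) AND (mortality OR quality of life)
```

In practice each group would be expanded with controlled vocabulary and field tags before being run against the databases in Fig. 2.4.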
2.8 Sources of Evidence
Sources of evidence can be selected from any of the categories of the “hierarchy of evidence” mentioned earlier. While traditionally these would have been discerned only from printed journal papers, many sources are now invariably available online through local intranets and the wider internet. These sources (Fig. 2.5) can be categorised as primary sources, i.e. journal articles (which exist in large numbers), or secondary sources (of which there is a large variety). Primary sources provide direct evidence or data concerning a topic under investigation, whereas secondary sources summarise or add commentary to primary sources. There is a hierarchy of secondary sources to match the hierarchy of evidence
(Fig. 2.1). Thus, for example, textbooks are on the first rung of the secondary sources pyramid, whereas the Cochrane Database of Systematic Reviews is at the peak, as it reveals information deemed to come from the highest level of evidence. By far the most frequently used database amongst surgeons is PubMed [26], a free internet-based search engine for accessing the MEDLINE database of citations and abstracts of biomedical research articles. The vast majority of its citations are from 1966 onwards, although they reach back as far as 1951. PubMed is not the only search engine used: many surgeons may start a search using Google Inc. search engines such as standard Google [27] or Google Scholar [28]. Alternatively, they may go straight to a government guideline website or even a systematic review database.
Fig. 2.5 Hierarchy of the sources of evidence in EBS based on McCulloch and Badenoch [24] and the University of Virginia, Health Sciences Library [25] (primary sources: journal articles via Medline and EMBASE, accessed through PubMed, Ovid, SilverPlatter or Dialog/DataStar; secondary sources, from base to peak: textbooks; critically appraised articles: ACP Journal Club, Trip Database; critically appraised topics (CATs): CAT Crawler, Clinical Evidence Database, institution-specific CATs, national/international guidelines; systematic reviews: Cochrane Library, DARE)
When reading the evidence, it is important not to lose sight of scientific rationality, so as to acquire best evidence for best patient outcomes. The concepts of information validity need to be rigorously assessed. These include:
• Bias
• Sample and study sizes
• Methodology
• Relevance
2.9 Managing the Increasing Volume of Evidence
In 2000, the non-profit medical research organisation OpenClinical produced a white paper entitled “The medical knowledge crisis and its solution through knowledge management” [29]. Here they recognised that “few busy doctors have the time to do the reading that is necessary and, even if they are assiduous in their reading, the imperfect human capacity to remember and apply knowledge in the right way at the right time and under all circumstances is clearly a critical limitation”. As a result, they specified a number of fundamental points:
• It is now humanly impossible for unaided health care professionals to possess all the knowledge needed to deliver medical care with the efficacy and safety made possible by current scientific knowledge.
• This can only get worse in the post-genomic era.
• A potential solution to the knowledge crisis is the adoption of rigorous methods and technologies for knowledge management.
In order to combat this “information overload”, a number of governmental bodies and medical groups propose knowledge management methods to enable clinicians to cope with this large volume of data [10, 22, 27]:
• Clinicians may need to become more super-specialised, with an intimate knowledge of their own field.
• Multi-disciplinary teams, conference attendance and postgraduate teaching can help in spreading the most “important” evidence.
• Local, national and international guidelines can be an easy-to-reach source of best evidence.
• Increased use of free-to-use governmental and institutional internet “web-portals” will make best evidence easily accessible.
• Encouraged communication with interactive medical media and medical library facilities will facilitate evidence-based searching and practice.
To achieve universal application of these evidence-based knowledge management principles, there is a general requirement for the whole surgical fraternity to adopt an “evidence-based culture”. This requires the support and acceptance of evidence-based teaching and practice at all levels of surgery, whether in medical schools, national hospitals, private practice or academic surgical units. As a result, individual clinicians can be constantly informed with the most up-to-date knowledge, whilst also ensuring that future generations of surgeons will be satisfactorily educated to achieve successful outcomes using the best evidence.
2.10 Practising and Delivering the Evidence
The traditional practice of EBS centred on individual clinicians taking responsibility for their own evidence-based practice, whether through incentives to complete optional medical education accreditation for professional appraisals or through awareness of local or national guidelines. This optional adherence to evidence-based practice has led to wide variability in its application [5]. It has recently been proposed that “Apart from health care professionals, the health care system itself and its influence on the delivery of care need to be considered” [5]. Measuring performance intrinsically allows the setting of a standard, which can then be compared against another standard, permitting a so-called system of “benchmarking” to be introduced. If the benchmarking measures are widely accepted, improved techniques for evaluating volume–outcome relationships and health inequality data can reveal important clinical outcome factors that can be improved upon. Implementing these changes requires the involvement of senior leadership, health care management and hospital boards to advance and promote the evidence-based culture. This will facilitate innovative organisational design and structure whilst benefiting from information management and technology [30].
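One widely used benchmarking statistic is the observed-to-expected (O/E) ratio, in which a unit’s observed event count is divided by the sum of risk-model predictions for its patients. The sketch below is a hypothetical illustration of the arithmetic only; the patient numbers and risks are invented, and no particular risk model is implied.

```python
def observed_expected_ratio(outcomes, predicted_risks):
    """O/E ratio for risk-adjusted benchmarking: values above 1 suggest
    more events than the case-mix-adjusted model predicts."""
    observed = sum(outcomes)           # 1 = event (e.g. a complication)
    expected = sum(predicted_risks)    # per-patient predicted probability
    return observed / expected

# Hypothetical unit: 100 patients, 9 complications, mean predicted risk 8%
outcomes = [1] * 9 + [0] * 91
risks = [0.08] * 100
print(round(observed_expected_ratio(outcomes, risks), 2))  # 1.12
```

Comparing such ratios across units, with appropriate allowance for chance variation, is the essence of the benchmarking the text describes.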
2.11 Surgical Modelling and Treatment Networks
When studying the surgical evidence, most systematic reviews and meta-analyses concentrate on studying one procedure in isolation, or on comparing one treatment to another. This methodology yields an incomplete view of the treatments available for each condition, and does not deliver a holistic comparison of all the treatments available for each specific surgical disease [31]. The corpus of different treatments available for each condition can be considered to comprise a “treatment network”, where each treatment can be represented in a common frame of reference against all the others. Recently, mathematical techniques such as “multiple treatment” comparisons and “mixed treatment meta-analysis” have been introduced to compare the data for medical treatments and interventions within such networks [16, 29]. Applying these techniques to study networks has resulted in two conjectures:
1. Any representation of the evidence network needs to account for the fact that all the treatments have not been studied equally.
2. Any representation of the evidence network needs to account for the fact that there are varying amounts of published data for different treatment comparisons.
In order to create a visually representative model of comparing treatments that accounts for these two conjectures, Salanti et al. [32] have devised a “geometry of the network”. Here, they specify that the “network’s geometry may reflect the wider clinical context of the evidence and may be shaped by rational choices for treatment comparators or by specific biases”. In order to create a visual representation of all studies for a specific disease, two factors need to be considered:
Diversity – the number of treatments for a specific condition.
Co-occurrence – the relative frequency with which two specific treatments have been compared against each other.
Applying these factors, it is possible to represent all of the evidence for a specific condition (Fig. 2.6). Here, a line between two dots represents a comparison of two treatments: the thicker the line, the larger the number of studies comparing the two treatments (co-occurrence), while a larger number of lines in a diagram implies increased diversity.
Fig. 2.6 (a) Treatment network geometry for the pharmacological treatment of LUTS – lower urinary tract symptoms. a: α-adrenoceptor antagonists, b: 5-α-reductase inhibitors, c: luteinizing hormone releasing hormone antagonists, d: β3-adrenoceptor agonists, e: vitamin D3 agonists, f: antimuscarinics, g: PDE5 inhibitors, h: placebo. (b) Treatment network geometry for the laparoscopic surgical treatment of morbid obesity. a: vertical banded gastroplasty, b: gastric banding, c: sleeve gastrectomy, d: Roux-en-Y gastric bypass, e: duodeno-jejunal bypass, f: no surgery
An example would be the pharmacological treatment of LUTS (Fig. 2.6a). Here the centre of the image, or “star”, is placebo (h), to which all the treatments a–g are compared. The lines for a (a-adrenoceptor antagonists) and b (5-a-reductase inhibitors) are thicker than those for g (antimuscarinics) and f (PDE5 inhibitors), as the former two are the subject of more studies, being older drugs for this indication. Accordingly, lines g and f are in turn thicker than c, d and e, as these latter
three are much newer drugs, with only a few studies considering their use. Another example is the laparoscopic surgical treatment of morbid obesity (Fig. 2.6b). Here, one can see that there are many studies comparing a (vertical banded gastroplasty), b (gastric banding) and d (Roux-en-Y gastric bypass) to no surgery (f). However, there are also many studies comparing b (gastric banding) and d (Roux-en-Y gastric bypass) directly, in the absence of a comparison to f (no surgery), leading to a non-star shape. As c (sleeve gastrectomy) is a newer procedure, it is more commonly compared to established treatments such as b (gastric banding) and d (Roux-en-Y gastric bypass) and less so with f (no surgery), hence the thinner line c–f. Being a much older procedure, e (duodeno-jejunal bypass) is not commonly performed laparoscopically, and hence has not been extensively compared to the other laparoscopic procedures, explaining the thin line e–f.
The application of these geometric networks to surgery allows individuals to visualise in one diagram the overall treatment evidence for a particular condition. This empowers each individual surgeon to assess specific studies within the context of all known treatments for a condition. Furthermore, it can also allow mathematical applications to place values on the strength of the levels of evidence for each treatment.
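The geometry described above reduces to a weighted graph: treatments are nodes, and each pairwise comparison is an edge whose weight is the number of studies. The sketch below uses entirely hypothetical study counts for a bariatric network in the style of Fig. 2.6b, to show how diversity and co-occurrence fall out of such a structure.

```python
# Edges: unordered treatment pairs -> number of comparative studies
# (counts are hypothetical, for illustration only).
edges = {
    frozenset({"gastric banding", "no surgery"}): 14,
    frozenset({"gastric bypass", "no surgery"}): 18,
    frozenset({"gastric banding", "gastric bypass"}): 9,
    frozenset({"sleeve gastrectomy", "gastric bypass"}): 3,
}

treatments = set().union(*edges)   # nodes of the network
diversity = len(treatments)        # number of distinct treatments compared

def co_occurrence(a: str, b: str) -> int:
    """Line thickness in the diagram: studies directly comparing a and b."""
    return edges.get(frozenset({a, b}), 0)

print(diversity)                                          # 4
print(co_occurrence("gastric banding", "no surgery"))     # 14
print(co_occurrence("sleeve gastrectomy", "no surgery"))  # 0 (no direct trials)
```

A zero-weight edge flags exactly the gap the network geometry is designed to expose: two treatments that have never been compared head to head.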
2.11.1 Surgical Thinking and Evidence Synthesis
As previously described, a good method of attaining good quality quantitative evidence for EBS is the application of meta-analysis and systematic reviews to integrate the results of high quality clinical trials. However, not all surgical evidence is in the format of high quality randomised trials, and for some surgical questions it may never be practical or ethically appropriate to obtain randomised controlled data. As a result, applying traditional methods of comparing trials by meta-analysis may not always be possible; furthermore, publication of data based on these techniques alone can lead to a bias in the literature (focusing only on data that can be accrued from RCTs). In order to accommodate this lack of data whilst still applying the concept of best evidence, a technique known as “evidence synthesis” has been introduced. Here, statistical models are employed to combine multiple sources of quantitative data from trials of heterogeneous quality and design (Fig. 2.7). Furthermore, these techniques can also allow qualitative data to be included in the mathematical analyses. This represents a paradigm shift, as traditional psychosocial and economic research can be added to traditional trial evidence to allow a holistic data analysis. Evidence synthesis is increasingly applied in the assessment of health care outcomes and technology, and now has an expanding role in the National Health and Medical Research Council of Australia [33] and the United Kingdom’s National Institute for Health and Clinical Excellence [34]. Many of the recent advances of evidence synthesis in health care assessment have resulted from the application of Bayesian statistical theory, where the mathematical techniques allow the supplementation and enhancement of conventional meta-analysis. Other techniques include qualitative comparative analysis, wherein qualitative data can be modelled into mathematical values to facilitate knowledge comparison and analysis. These methods account for study quality in evidence synthesis and can incorporate a broader body of evidence to support decision-making, and therefore successfully address many analytical problems in EBS, including baseline risk effects, study heterogeneity and indirect comparisons.
Fig. 2.7 Evidence synthesis and the numerous sources that it can integrate and analyse (randomised controlled trials, prospective and retrospective cohort or case-control studies, case series or case reports, anecdotal evidence, unpublished data, surveys and policy data, and qualitative data all feed into evidence synthesis, which informs a decision)
2.11.2 Bayesian Techniques
The perceived current “gold standard” of evidence-based research is information in the form of randomised controlled trials, or systematic reviews appraising them. These study designs, however, are based on a “frequentist” school, in which results are expressed as a probability value (p value) extracted from a model whereby an infinite number of probabilities can occur on a distribution, with an unknown estimated effect value that can only be expressed with confidence intervals [35]. Another approach is the Bayesian one, in which the analysis works on known data and a prior belief, yielding a credible value near the sample mean. This approach is now increasingly used in medicine and is defined as the “explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting” of research.
In order to calculate a probability using Bayesian statistics (Fig. 2.8), the following parameters are used [36]:
1. Prior distribution – the probability of a parameter based on previous experience and trial data.
2. Likelihood – the probability of a parameter based on data from the current research study or trial.
3. Posterior distribution – the updated probability of a parameter based on our observation and treatment of the data.
Fig. 2.8 Bayesian statistics: by Bayes’ theorem, posterior probability = prior × likelihood × constant; the prior distribution (reasonable opinion excluding trial evidence) is combined with the likelihood (trial evidence) to give the posterior distribution (final opinion)
These techniques have proven to be particularly useful in [37]:
• Grouped meta-analysis (cross design synthesis)
• Cost-effectiveness studies
• Comprehensive decision modelling
• Decision analysis
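As a minimal numerical sketch of the prior–likelihood–posterior sequence (all figures are hypothetical), a surgical success rate can be given a Beta prior, which data from a current series then update in closed form:

```python
# Prior: Beta(8, 2) encodes roughly 80% success from earlier experience.
prior_a, prior_b = 8, 2

# Likelihood: the current series observes 45 successes and 15 failures.
successes, failures = 45, 15

# Posterior: conjugacy gives Beta(prior_a + successes, prior_b + failures).
post_a, post_b = prior_a + successes, prior_b + failures
posterior_mean = post_a / (post_a + post_b)

print(f"posterior mean success rate = {posterior_mean:.3f}")  # 0.757
```

The posterior sits between the prior mean (0.80) and the raw trial rate (0.75), weighted by their relative amounts of information, which is exactly the prior-times-likelihood combination shown in Fig. 2.8.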
2.11.3 Qualitative Comparative Analysis (QCA)
This method employs truth tables to categorise qualitative data into mathematical values. This requires the setting of thresholds to classify a finding as either positive or negative (binary 1 or 0). Based on these thresholds, studies can be classified into scores and assessed by Boolean algebra [37]. An example is given in Table 2.1. In this hypothetical set of manuscripts, the Boolean equation follows that D = A + B + C. Thus, it can be seen that surgical errors can result from any or all of: poor communication, distractions in theatres and inexperienced operators. A short sketch of this truth-table logic is given below.
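The sketch below reproduces the Boolean logic of the hypothetical Table 2.1, where D = A + B + C means that a surgical error (D) occurs when any one of the three conditions is present (logical OR). The row counts mirror the table and are illustrative only.

```python
# A small sketch of the truth-table logic described above, using the
# hypothetical conditions of Table 2.1: A = poor communication,
# B = distractions in theatres, C = inexperienced operators.

rows = [
    # (A, B, C, number of reports)
    (0, 0, 0, 37), (1, 0, 0, 12), (0, 1, 0, 24), (0, 0, 1, 19),
    (1, 1, 0, 8), (1, 0, 1, 26), (0, 1, 1, 14), (1, 1, 1, 21),
]

for a, b, c, n in rows:
    # Boolean equation D = A + B + C: an error occurs when any
    # one of the three conditions is present (logical OR).
    d = int(a or b or c)
    print(f"A={a} B={b} C={c} -> D={d} ({n} reports)")
```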
2.12 Surgical Decision-Making and Clinical Judgement Analysis
The main reason to perform EBS is to improve the quality of care and the outcomes of our patients by modifying our surgical judgements and decisions. According to a well-known surgical aphorism, "a good surgeon knows how to operate; a better surgeon knows when to operate; the best surgeon knows when not to operate" [39]. Appropriate clinical judgement and decision-making skills are considered to be of paramount importance in surgery, and hence, in the UK, they have
Fig. 2.8 Bayesian statistics. Bayes' theorem: Posterior Probability = Prior × Likelihood × Constant. The prior distribution (a reasonable opinion excluding trial evidence) is combined with the likelihood (the trial evidence) to give the posterior distribution (the final opinion)
Table 2.1 Hypothetical truth table showing causes of surgical error

A   B   C   Surgical error D   Number of reports
0   0   0   0                  37
1   0   0   1                  12
0   1   0   1                  24
0   0   1   1                  19
1   1   0   1                  8
1   0   1   1                  26
0   1   1   1                  14
1   1   1   1                  21

A, B and C are the conditions (explanatory variables); D is the dependent variable. A = poor communication; B = distractions in theatres; C = inexperienced operators. Table format based on [38]
recently been explicitly included in the Intercollegiate Surgical Curriculum Project [26]. Surgical judgement and decision-making can range from very well-defined situations with a relatively narrow range of options (e.g. whether to use one surgical forceps over another during an operation), to less well-defined situations in which surgeons need to consult their patients before reaching a decision (e.g. whether to offer surgery to a patient given the stage of his/her disease and lifestyle factors) [28, 36]. Regardless of the level of
complexity of the situation, from a psychological perspective, optimal surgical judgment encompasses three components: Experience, Evidence and Inference (Fig. 2.9). Surgeons make judgements and decisions on the basis of available information and their interpretation of it. The gathering and processing of the information are cognitive processes, which are open to influences from both our “cognitive architecture” and also the external environment (Fig. 2.10). Simply put, we examine our patients and gather relevant diagnostic information (e.g. blood tests, laboratory findings, etc.). Each one of these pieces of information is a “cue”, which we use to form a clinical judgement; this judgement will then lead us to make a decision (e.g. to treat or not, whether to offer a laparoscopic or open procedure). In the process of forming a judgement, these cues are weighted – although we are usually not consciously aware of this weighting process (unless we are dealing with a very difficult decision). Our final decision is driven by the weightings of the individual cues that we have considered and also the influences of internal cognitive factors (e.g. our inherent limitations in processing large quantities of information) and external environmental influences (e.g. the time we have to see a patient in clinic or in a ward round) [28]. Psychologists have developed models that explain how this integration of the various cues works. A model of particular relevance to surgery is that
Fig. 2.9 Components of judgement, based on Marshall [27]. Surgical judgement combines three components: Inference (Bayesian; pros – facilitates decision-making in the absence of rigorous data; cons – depends on assumptions and is subjective), Experience (anecdotal; pros – can answer undefined elements; cons – affected by selective memory and ego) and Evidence (frequentist; pros – applies rigorous probability data from large patient cohorts; cons – may not represent the specific individual case)
Fig. 2.10 Factors influencing judgements leading to decisions. Information cues (e.g. cue A, based on a blood test, given an importance weighting of x; cue B, a clinical finding, given an importance weighting of y) feed into a surgical judgement leading to a decision, under the influence of environmental factors (time, resources, physical discomfort, personal risk) and cognitive factors (fatigue, anger, competitiveness, guilt)
developed by Egon Brunswik and known as Social Judgement Theory – with its quantitative application, Clinical Judgement Analysis [40–43]. Social Judgement Theory treats surgical (or any other) judgement as a linear multiple regression model. Different cues are considered and assigned weights by the surgeon – thus, the relative importance for each cue can be algebraically estimated and surgeons can be classified into different subgroups, depending on the importance they assign to different cues: this reveals how different surgeons approach a clinical decision. This is important in EBS as it can allow judgements to be assessed before and after teaching and training in the adoption and application of best evidence. Clinical Judgement Analysis has been used in a number of surgical studies. It has been used to clarify how demographic and lifestyle factors impact the prioritisation decision for patients due to have elective general surgery [44] or cardiac surgery [45], what clinical factors urological surgeons consider when deciding the treatment of prostate cancer [46] and how expert
nephrologists diagnose non-end stage renal disease [47]. Importantly, this approach allows quantitative feedback to be provided to individual surgeons as a training intervention to improve their clinical decision-making [5, 23, 48]. This is done by assessing personal decisions and breaking down the individual weights (i.e. importance) given to each information cue. These results can then be fed back to each clinician and refined so that decisions are based on the best evidence. A brief sketch of this regression approach follows.
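Since Social Judgement Theory treats judgement as a linear multiple regression of judgements on cues, the cue weights can be estimated by ordinary least squares. The sketch below illustrates this; all cue values and judged risks are invented for illustration, and the two-cue setup follows the example of Fig. 2.10.

```python
# A minimal sketch of Clinical Judgement Analysis as a linear multiple
# regression: a surgeon's risk judgements are regressed on the
# information cues to estimate the weight given to each cue.
# All cue values and judgements below are hypothetical.

import numpy as np

# Each row is one hypothetical case: [cue A (blood test), cue B (clinical finding)]
cues = np.array([
    [0.2, 0.1], [0.8, 0.3], [0.5, 0.9], [0.9, 0.7],
    [0.1, 0.4], [0.6, 0.6], [0.3, 0.8], [0.7, 0.2],
])

# The surgeon's judged risk for each case (hypothetical).
judgements = np.array([0.15, 0.55, 0.65, 0.85, 0.25, 0.60, 0.55, 0.45])

# Fit judgement = w_A * cueA + w_B * cueB + intercept by least squares.
X = np.column_stack([cues, np.ones(len(cues))])
weights, *_ = np.linalg.lstsq(X, judgements, rcond=None)

print(f"Estimated weight on cue A: {weights[0]:.2f}")
print(f"Estimated weight on cue B: {weights[1]:.2f}")
```

Comparing the estimated weights before and after training is one way the feedback intervention described above can be quantified.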
2.13 Cost Effectiveness in Evidence-Based Surgery
It is becoming increasingly evident that although EBS can reveal the best treatments for a specific disease, the provision of these treatments may not always be possible, particularly from a financial standpoint. In order to supply patients with the best care, both the "best treatment" and the "available funds" need to be considered. Achieving a
balance between these two factors can be complex and involves not only clinicians, but also health care management, economists and politicians. Economic considerations have traditionally been poorly represented in the evidence-based literature; for example, a systematic review examining the cost effectiveness of using prognostic information to select women with breast cancer for adjuvant chemotherapy revealed only five published papers in the field. Health care costs are now of utmost importance in today's complex financial markets. In the United States alone, medical care consumes more than 14% of the gross domestic product [49], a figure which could increase to 17.7% by 2012 [50]. Despite the constant rise in new medical treatments, it is now widely recognised that "health interventions are not free, people are not infinitely rich, and the budgets of health care programmes are limited. For every dollar's worth of health care that is consumed, a dollar will be paid. While these payments can be laundered, disguised or hidden, they will not go away" [13].
In order to incorporate economic considerations in evidence-based guidelines, cost-effectiveness analyses (CEAs) are now being utilised and are gaining increased importance in medical guidelines at all levels (local, national, international). CEAs can reveal the expected benefits, harms and costs of adopting and translating a clinical recommendation into practice [51]. A CEA is a tool that allows decisions to be made within the realistic constraints of health care budgets. This takes place through the expression of a cost-effectiveness ratio, in which the difference in cost between two interventions is divided by the difference in their health effects or outcomes [52, 53]:
Cost-effectiveness ratio = (C1 − C2) / (O1 − O2), where C = cost and O = outcome
The cost C is measured by: C = cost of the intervention + costs induced by the intervention − costs averted by the intervention.
Outcomes O can be measured by life-years saved (LYS), the amount by which an intervention reduces mortality, or by quality-adjusted life years (QALYs), the effect of an intervention on both length and quality of life.
Cost-effectiveness analyses that use QALYs are termed cost-utility analyses (CUAs) and have become increasingly important, as the incorporation of quality
of life in the assessment better reflects clinical reality and clinical decision-making. Other economic analyses applied to health services include cost-minimisation analyses and cost–benefit analyses (CBAs), although in contrast to CEAs, they do not have the capacity to compare costs against clinical outcomes [53]. CEAs provide valuable information for developing and modifying health service interventions and preventative measures to obtain the best care at the best value. They can be used to compare the costs and benefits of various interventions for the same pathology or disease (for example, colorectal screening by occult blood tests, barium enemas or colonoscopies). Furthermore, they can clarify which intervention is most appropriate for:
• Specific population subgroups (e.g. off-pump vs. on-pump coronary artery bypass grafting in patients with renal dysfunction)
• Specific population ages (e.g. breast screening by mammography between the ages of 50 and 70)
• Various treatment frequencies and timings (e.g. PAP testing for cervical neoplasia every 3 years)
The use of QALYs as an outcome measure in CUAs has shown particular benefit in accounting for patient preferences for some health conditions over others. For example, although numerous trials report the effectiveness of tamoxifen in improving morbidity and mortality in breast and endometrial cancer patients, its effects on perceived health status in these different conditions vary. CUAs allow for such variation and provide policymakers with data that reflect financial suitability and, importantly, population needs and preferences [53]. Arguments against the use of CEAs include:
• A historical lack of standardised CEAs, making comparisons difficult
• A paucity of studies
• A lack of transparency in the complex models applied
• QALYs being non-intuitive
• Ethical concerns (for example, is a year of life saved, or a QALY, for a 70 year old equivalent to that for a 1 year old? There is also the perception that CEAs can be used as tools for "rationing" in health care)
Financial considerations are nevertheless inevitable, and there are a number of considerations that can allow
best use of CEAs in evidence-based practice. These include the following [54]:
• Consideration of resource use and not monetary values alone
• Consideration of the specific context of an intervention and the resources needed
• Applying a broad perspective, particularly at national and international levels
• Consideration of the quality of evidence and the quantity of resource expenditure
• Applying up-to-date economic models
CEAs, therefore, are a powerful tool for selecting evidence-based interventions and protocols that are best suited to the budget constraints of health care institutions, while also accommodating the preferences of both clinicians and patients. As a result, CEA scores can be ranked in a league table, permitting the selection and prioritisation of treatments either locally within a health care institution, or at a broader national or international level. A worked sketch of the cost-effectiveness ratio follows.
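The sketch below computes the incremental cost-effectiveness ratio defined earlier, (C1 − C2) / (O1 − O2). The costs and QALY figures are hypothetical, chosen only to show the arithmetic.

```python
# A minimal sketch of the cost-effectiveness ratio defined above:
# (C1 - C2) / (O1 - O2). The costs and QALY figures are hypothetical.

def icer(cost_new: float, cost_old: float,
         outcome_new: float, outcome_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra
    unit of outcome (e.g. per QALY gained)."""
    return (cost_new - cost_old) / (outcome_new - outcome_old)

# Hypothetical comparison: a new procedure vs. standard care.
ratio = icer(cost_new=12_000, cost_old=8_000,
             outcome_new=6.2, outcome_old=5.8)  # outcomes in QALYs
print(f"ICER: {ratio:,.0f} per QALY gained")
```

Ranking interventions by such ratios is what produces the league tables mentioned above.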
2.14 Surgical Training in Evidence-Based Techniques
In order to adhere to an evidence-based culture, training programmes in EBS are essential. When teaching this topic, it is important not only to teach the techniques of evidence searching (as above), but also to contextualise the whole process so that it makes sense to the individual user at an individual institution. Principles of teaching these processes include the following [28]:
• Compiling a list of sources of evidence
• Identifying the influences on decision-making and the role of evidence
• Applying appropriate levels of evidence base for decisions in context
• Discussion of tactics for acquiring evidence at an appropriate level
• Discussion of implementation strategies
To successfully fulfil all these steps, the teaching needs to be an interactive process. Ideally, it requires pairing junior and senior clinicians together to allow mutual insight to be discerned from each other's clinical experience. Furthermore, it is vital that a
variety of "open-minded" teaching methods is applied to facilitate the learning process. These include brainstorming, role-playing and the adoption of a variety of multimedia tools. In these situations, some individuals should be chosen to help lead the brainstorming and to minute the conclusions of each pair or group, so as to spread the knowledge to the wider teaching group [28]. Toedter et al. [55] have designed and implemented an EBS teaching schedule to enable all the residents in their programme to develop and refine their EBS skills in a context as close as possible to that in which they will use EBS in their clinical practice. To achieve this, the residents are given a clinical question (something they might very well be asked by an attending surgeon during rounds) and are asked to demonstrate their competence in finding the best available evidence to answer it. They apply a multi-disciplinary collaborative approach to address "the four steps of EBS", and each EBS group includes a resident or registrar (junior clinician), an attending surgeon or consultant (senior clinician) and a medical school or university librarian. In this context, the senior clinician leads the formulation of an evidence-based question and integrates the information in accordance with the best available evidence; the resident or registrar acts as a research coordinator, critically appraising the literature to find the best evidence; and the medical librarian leads the focused search of the relevant literature. It was demonstrated that the evidence-based performance of a resident was related to his or her ability to gather the best evidence in answer to a clinical question (P = 0.011). It was also shown that, with additional training, the residents improved their evidence-based skills. It can be concluded that these skills can no longer be limited to academic surgeons, but must encompass all surgeons universally. Evidence-based concepts will necessarily be required at all levels of surgical education, beginning in the formative years of medical school and continuing to the end of surgical practice.
2.15 Ethics
At a cursory level, EBS seems very straightforward from an ethical perspective; using the "best evidence" is literally the optimum strategy for our patients
Fig. 2.11 Ethics in evidence-based surgery and research, based on Stirrat [56] and Burger et al. [57]. Issues surrounding both patient and surgeon include informed consent, the challenges of distributive justice, equipoise, the most appropriate research design, and when a procedure should be formally evaluated
and applying anything less could be considered suboptimal. However, on more in-depth analysis, a number of ethical questions arise when studying the processes involved in EBS (Fig. 2.11). Two broad concepts need to be addressed:
• What qualifies a surgical procedure or technique as having "sufficient" evidence for use?
• At what point in introducing a new procedure or technique are we protecting our patients, and at what point exposing them to an unknown risk?
To answer these questions, the fundamental principles of medical ethics [58, 59] need to be considered. These include beneficence, non-maleficence, autonomy, justice, dignity and truthfulness. Topics arising specifically in EBS [60, 61] (Fig. 2.11) include informed consent: patients need to be clearly aware of whether an operation is for research, based on evidence or based purely on tradition. Furthermore, the responsible surgeon's reasons for using the operation need to be clearly identified. Whether the operation is experimental, new or well-practised, the risk–benefit to the patient needs to be specified. If the operation is selected on evidence-based grounds, then the level of evidence needs to be communicated to the patient in a way that he or she understands. Surgeons need to specify the research design of the evidence and discuss its appropriateness. Both surgeons and patients have their own equipoise, and surgical choices based on personal biases should be negated in favour of the best evidence and objectivity. There also remains the issue of ethics in a health care world of limited finances and the challenges of distributive justice. What is the cost-effectiveness of
each procedure? Should there be rationing in health care on evidence-based grounds? And how does one address the situation in which the best evidence points to an expensive treatment that cannot be afforded by some communities? These considerations should be made by individual surgeons, and also by surgical institutions at national and international levels. The concept of EBS is to ensure that each patient is treated on the grounds of best knowledge. Applying ethics to this evidence adds compassionate morals to evidence-based decision-making, which in turn leads to the best possible humane care for patients.
2.16 Conclusion
EBS is no longer only about performing randomised controlled trials, nor is it only for senior academic surgeons. It is for all surgeons, their colleagues and their patients. It works on the principle that best surgical practice is achieved through best surgical evidence. It is now inevitable, and it has the potential to address all the primary needs of our patient-oriented surgical practice, namely:
• Patient management
• Patient care
• Patient safety
• Patient outcomes
• Patient satisfaction
The steps required, from deciding what evidence to find through to implementing changes that best reflect this evidence, are illustrated in Fig. 2.12. Many of these steps include clear and logical questioning, dedication in
Fig. 2.12 Evidence-based surgery algorithm. A surgical question leads to problem clarification and the collection of information and evidence, supported by goal-orientated evidence synthesis and surgical research (reflecting on similar questions and the rational use of technology). Treatment guidelines are then developed at local, national and international levels, shaped by ethics, budget, values, local context and preferences. Judgement analysis yields an optimal decision; change is implemented with adequate training and education; performance is compared with standards; and the effect of the decision is monitored, identifying further need for change
pursuit of excellence and ultimately a culture in which best-evidence is not a bonus, but rather a fundamental requirement. This cannot be done simply by individuals, but requires teamwork at all levels of health care, from local to national and international. Furthermore, EBS is not a one-way process, but requires reassessment, revision, repeated searches and constant updating to reflect the new advances in surgical evidence. As surgeons, it is not only our duty to contribute to the momentum of evidence-based practice, but actually a
necessity for us to lead in many of these strategies. This requires universal training and re-training in evidence-based methods to reach a level of understanding that would place best-evidence at the heart of our surgical careers. For the vast majority of surgeons worldwide, the traditional concept of hand-hygiene before operating is now intuitive and “second nature” to them. For the next generation of surgeons, it would also be ideal to consider “best evidence” instinctively before coming to make any surgical decision.
References
1. Brater DC, Daly WJ (2000) Clinical pharmacology in the Middle Ages: principles that presage the 21st century. Clin Pharmacol Ther 67:447–450
2. Evidence Based Medicine Working Group (1992) Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA 268:2420–2425
3. Darzi A (2008) High quality care for all: NHS next stage review final report. Department of Health, London
4. Birch DW, Eady A, Robertson D et al (2003) Users' guide to the surgical literature: how to perform a literature search. Can J Surg 46:136–141
5. Jacklin R, Sevdalis N, Darzi A et al (2008) Efficacy of cognitive feedback in improving operative risk estimation. Am J Surg 197:76–81
6. Oxford Centre for Evidence-based Medicine (2001) Levels of evidence. Available at: http://www.cebm.net/index.aspx?o=1025
7. Sackett DL, Straus SE, Richardson WS et al (2000) Evidence-based medicine: how to practice and teach EBM. Churchill Livingstone, London
8. Eddy DM (2005) Evidence-based medicine: a unified approach. Health Aff (Millwood) 24:9–17
9. Pang T, Gray M, Evans T (2006) A 15th grand challenge for global public health. Lancet 367:284–286
10. NHS (2008) National Knowledge Service (of the National Health Service, United Kingdom). Available at: http://www.nks.nhs.uk/
11. American College of Surgeons (2008) Continuous quality improvement. Available at: http://www.facs.org/cqi/index.html
12. Jones RS, Richards K (2003) Office of Evidence-Based Surgery: charts course for improved system of care. Bull Am Coll Surg 88:11–21
13. Eddy DM (1992) A manual for assessing health practices and designing practice policies: the explicit approach. American College of Physicians, Philadelphia
14. Nelson R, Edwards S, Tse B (2007) Prophylactic nasogastric decompression after abdominal surgery. Cochrane Database Syst Rev:CD004929
15. Porter ME, Teisberg EO (2007) How physicians can change the future of health care. JAMA 297:1103–1111
16. Neuhauser D (1990) Ernest Amory Codman, M.D., and end results of medical care. Int J Technol Assess Health Care 6:307–325
17. Wennberg J (2008) Commentary: a debt of gratitude to J. Alison Glover. Int J Epidemiol 37:26–29
18. DeBakey ME (1947) Military surgery in World War II: a backward glance and a forward look. N Engl J Med 236:341–350
19. Cochrane AL (1972) Effectiveness and efficiency: random reflections on health services. Nuffield Provincial Hospitals Trust, London
20. Kouchoukos NT, Ebert PA, Grover FL et al (1988) Report of the Ad Hoc committee on risk factors for coronary artery bypass surgery. Ann Thorac Surg 45:348–349
21. Horton R (2004) A statement by the editors of the Lancet. Lancet 363:820–821
22. McCulloch P, Taylor I, Sasako M et al (2002) Randomised trials in surgery: problems and possible solutions. BMJ 324:1448–1451
23. Jacklin R, Sevdalis N, Harries C et al (2008) Judgment analysis: a method for quantitative evaluation of trainee surgeons' judgments of surgical risk. Am J Surg 195:183–188
24. McCulloch P, Badenoch D (2006) Finding and appraising evidence. Surg Clin North Am 86:41–57; viii
25. University of Virginia Health System (2009) Navigating the maze: obtaining evidence-based medical information. Available at: http://www.hsl.virginia.edu/collections/ebm/overview.cfm
26. ISCP (2005) The New Intercollegiate Curriculum for Surgical Education. Intercollegiate Surgical Curriculum Project, London (http://www.iscp.ac.uk/)
27. Marshall JC (2006) Surgical decision-making: integrating evidence, inference, and experience. Surg Clin North Am 86:201–215; xii
28. Sevdalis N, McCulloch P (2006) Teaching evidence-based decision-making. Surg Clin North Am 86:59–70; viii
29. OpenClinical (2000) The medical knowledge crisis and its solution through knowledge management (White Paper), London
30. Glickman SW, Baggett KA, Krubert CG et al (2007) Promoting quality: the health-care organization from a management perspective. Int J Qual Health Care 19:341–348
31. Ioannidis JP (2006) Indirect comparisons: the mesh and mess of clinical trials. Lancet 368:1470–1472
32. Salanti G, Kavvoura FK, Ioannidis JP (2008) Exploring the geometry of treatment networks. Ann Intern Med 148:544–553
33. National Health and Medical Research Council (1999) A guide to the development, evaluation and implementation of clinical practice guidelines. Available at: http://www.nhmrc.gov.au/publications/synopses/_files/cp30.pdf
34. National Institute for Health and Clinical Excellence (2008) Moving beyond effectiveness in evidence synthesis – methodological issues in the synthesis of diverse sources of evidence. Available at: http://www.nice.org.uk/niceMedia/docs/Moving_beyond_effectiveness_in_evidence_synthesis2.pdf
35. Greenland S (2006) Bayesian perspectives for epidemiological research: I. Foundations and basic methods. Int J Epidemiol 35:765–775
36. Spiegelhalter DJ, Myles JP, Jones DR et al (2000) Bayesian methods in health technology assessment: a review. Health Technol Assess 4:1–130
37. Pope C, Mays N, Popay J (2007) Synthesizing qualitative and quantitative health research: a guide to methods. Open University Press, Maidenhead
38. Ragin CC (1992) The comparative method: moving beyond qualitative and quantitative strategies. University of California Press, Berkeley
39. Kirk RM, Mansfield AO, Cochrane JPS (1999) Preface. In: Kirk RM, Mansfield AO, Cochrane JPS (eds) Clinical surgery in general. Churchill Livingstone, London
40. Brunswik E (1952) The conceptual framework of psychology. University of Chicago Press, Chicago
41. Cooksey RW (1996) Judgment analysis: theory, methods, and applications. Academic Press, San Diego
42. Cooksey RW (1996) The methodology of social judgement theory. Think Reason 2:141–173
43. Sevdalis N, Jacklin R (2008) Opening the "black box" of surgeons' risk estimation: from intuition to quantitative modeling. World J Surg 32:324–325
44. MacCormick AD, Parry BR (2006) Judgment analysis of surgeons' prioritization of patients for elective general surgery. Med Decis Making 26:255–264
45. Kee F, McDonald P, Kirwan JR et al (1997) The stated and tacit impact of demographic and lifestyle factors on prioritization decisions for cardiac surgery. QJM 90:117–123
46. Clarke MG, Wilson JR, Kennedy KP et al (2007) Clinical judgment analysis of the parameters used by consultant urologists in the management of prostate cancer. J Urol 178:98–102
47. Pfister M, Jakob S, Frey FJ et al (1999) Judgment analysis in clinical nephrology. Am J Kidney Dis 34:569–575
48. Denig P, Wahlstrom R, de Saintonge MC et al (2002) The value of clinical judgement analysis for improving the quality of doctors' prescribing decisions. Med Educ 36:770–780
49. Levit K, Smith C, Cowan C et al (2004) Health spending rebound continues in 2002. Health Aff (Millwood) 23:147–159
50. Heffler S, Smith S, Keehan S et al (2003) Health spending projections for 2002–2012. Health Aff (Millwood) Suppl (Web Exclusives):W354–W365
51. Gold MR, Siegel JE, Russell LB et al (1996) Cost-effectiveness in health and medicine. Oxford University Press, New York
52. Gazelle GS, McMahon PM, Siebert U et al (2005) Cost-effectiveness analysis in the assessment of diagnostic imaging technologies. Radiology 235:361–370
53. Saha S, Hoerger TJ, Pignone MP et al (2001) The art and science of incorporating cost effectiveness into evidence-based recommendations for clinical preventive services. Am J Prev Med 20:36–43
54. Guyatt GH, Oxman AD, Kunz R et al (2008) Incorporating considerations of resources use into grading recommendations. BMJ 336:1170–1173
55. Toedter LJ, Thompson LL, Rohatgi C (2004) Training surgeons to do evidence-based surgery: a collaborative approach. J Am Coll Surg 199:293–299
56. Stirrat GM (2004) Ethics and evidence based surgery. J Med Ethics 30:160–165
57. Burger I, Sugarman J, Goodman SN (2006) Ethical issues in evidence-based surgery. Surg Clin North Am 86:151–168; x
58. Coughlin SS, Beauchamp TL (1992) Ethics, scientific validity, and the design of epidemiologic studies. Epidemiology 3:343–347
59. Weijer C, Dickens B, Meslin EM (1997) Bioethics for clinicians: 10. Research ethics. CMAJ 156:1153–1157
60. Stirrat GM (2004) Ethics and evidence based surgery. J Med Ethics 30:160–165
61. Burger I, Sugarman J, Goodman SN (2006) Ethical issues in evidence-based surgery. Surg Clin North Am 86:151–168
3
The Role of the Academic Surgeon in the Evaluation of Healthcare Assessment Roger M. Greenhalgh
Contents
3.1 Introduction
3.2 Clinical Practice
3.3 Training Programme
3.4 Advance of Subject and Historical Perspective
3.5 Health Technology
3.5.1 Clinical Trial Expertise
3.5.2 Statistical Knowledge
3.5.3 Health Economics
3.5.4 Cost Effectiveness Modelling
3.5.5 Health-Related Quality of Life and Patient Preference
References
Abstract The academic surgeon needs to have much energy and be intent on moving the subject forwards. It is first necessary to set up a fine regional facility and integrate a regional training programme. This is merely the beginning as the academic surgeon must have historical perspective and know where the subject is in historical terms and where it is likely to go next. There are clues as to how this can be anticipated, as it is explained in this chapter. The surgeon then needs to integrate with a multidisciplinary team to bring the subject forward. The team will have clinical trial expertise, statistical knowledge and cost effectiveness opportunities. The opinion of the patient is always paramount and must be measured. These issues bring more benefit to more patients the world over than a single well-performed operation. Thus, the academic surgeon must be a great surgeon and a humble coordinator of disciplines in the patient interest.
3.1 Introduction
R. M. Greenhalgh Division of Surgery, Oncology, Reproductive Biology & Anaesthetics, Imperial College, Charing Cross Hospital, Fulham Palace Road, London W6 8RF, UK e-mail:
[email protected]
Should I go into an academic career in surgery? What can I expect, and am I suited for it? For the right person, it is the most exhilarating experience. For others, the mere expectation of "research" is anathema. "Let me get on with the practice of surgery!" This is what professors hear from some. A trainee spends years satisfying academic requirements before being able to consider becoming a doctor, and then years more training in surgery. Is that not enough of a studying load? The problem is that the subject is not finite, in the sense that there are no agreed boundaries of minimal knowledge required to practise as a surgeon. When we face the uphill climb of the knowledge base needed to reach the standard to practise surgery, we reasonably ask which books have what is
Fig. 3.1 The life of the academic surgeon is summarised in four interlinked orbits of activity: clinical practice and service, health technology, training the next generation, and subject advance
required for the purpose. The inference is that, once read and understood, that is enough! It often gets candidates through assessment hurdles, but then comes the "rude awakening" that the subject moves on. Fortunately, there are some who are not put off by subject advance. Such surgeons in training actually like their research attachment. What is it that they like? Undoubtedly, some "pretend" to enjoy research, believing it will hasten their promotion! They "bite the bullet", get the research over with, and move on or undergo a career change. There are some who love the experience of research, not so much the hard grind as the analysis and the delivery of results on the podium. Some young surgeons are born to be "on the podium". This seems to be a stimulus for some, and these are to be the good undergraduate and postgraduate teachers. They are a constant joy for an academic group. There are, then, some who are inclined towards a career in academic surgery; what is involved long term? I recognise a life of four orbits of activity, as outlined in Fig. 3.1.
3.2 Clinical Practice
The chief of an academic group must first be certain that there is a high throughput of patients, and the best way to achieve this is to perform a speciality well with a good team, and referrals will follow. It is vital at the beginning of the setting up of the group to define the clinical area carefully and to work industriously to achieve very good documentation of patients from the very start. At the beginning of my professorial life, I found that the hospital notes were in an appalling state, and I determined to design specific documentation sheets that would be required for the speciality. This was vital so that colleagues worked almost by protocol, in that they answered set questions of history and recorded specific clinical findings. This is no substitute for prospective data, but it helps enormously to reduce data gaps, and most importantly, it points the trainee in the best direction for optimal patient management and documentation. Every link with the regional hospitals is worthwhile, and it can be an advantage to set up rotations of junior staff to a variety of regional hospitals. This achieves a number of objectives. Firstly, it provides an easy route for referral of a patient to a specialist service. Secondly, it improves relationships between colleagues at the hospitals, to patient benefit, by aiding transfer when needed. Thirdly, it is good for training programmes for trainees to have a wide variety of environments in which to work.
3.3 Training Programme
The academic surgeon has a responsibility towards the trainees of the region. The academic head is not necessarily the person who arranges the rotations, but must be very aware of the training issues and the conditions of work of the trainees. Each trainee must have a clear aim. Each must have a goal and a good idea about what type of surgeon they wish to be in the end. In what type of place do they wish to work? Will they teach? Will they have a postgraduate responsibility? Will they practise research? A good matching process is vital.
3.4 Advance of Subject and Historical Perspective
Whichever branch of academic surgery is chosen, it is vital to aim to advance the subject, and to do that, it is necessary to understand the basic science, which is the basis of the subject. For example, it would be
Fig. 3.2 Surgical advance passes through defined phases – the pattern applies in many instances: fresh air and sanatoria; the surgical pioneering era; lesser interventions; medical non-operative management; and prevention
incomprehensible for a vascular surgeon to fail to understand the basic vascular biology of the subject. It is similar for every branch of academic surgery, and a thorough grasp of the basic subject is essential. Why is this? It is so because an academic surgeon must have historical perspective. A moment of contemplation will indicate that subject advance frequently passes through distinct phases (Fig. 3.2). I will give examples to show what I mean as this is a vital message for the academic surgeon to perceive the relentless advance and not be taken by surprise. Rather, he must drive advance. In the 1950s, as a child, I was aware of a dreadful condition known in lay terms as “galloping consumption”, which I later recognised as tuberculosis, so frequently of the lung but also the bone. I was musical and hated that Mozart died so young. Children with tuberculosis from Mozart’s time till my childhood were taken to sanatoria where the air was fresh. There the young child would be hospitalised for years in a so-called “sanatorium”, one for an ankle, two for a knee, three for a hip and five for a spine. Years in hospital – for what? “Fresh air and rest”! The temples to Aesclepius, the healing god in the ancient Greek world, were always built where there is raised ground and a gentle wind for the healing process. Sanatoria were, thus, always found on a hill where the air is good.
This habit continued until recent years. Thus, the Empress, Sissi of Austria, wife of Kaiser Franz-Josef was sent, by her doctors, to Corfu to recover, and “fresh air” was all that the doctors could offer with rest, rest and more rest. This state persisted till well after the Second World War. Then, suddenly, the surgical era came and dramatic surgery was prescribed. For tuberculosis of the lung, thoracoplasty was introduced and crushing of the phrenic nerve to paralyse the diaphragm. Thus, by the old principle of localised rest, the body had its only chance of healing. In around 1956, Streptomycin [1] which was the first antibiotic found to be active against tuberculosis was introduced, and in no time, the sanatoria started to empty, and the surgical procedures were abandoned. At the time, there were trained thoracic surgeons available, and suddenly, there was less thoracic surgery required and depression set in for the speciality. It is a very sad sight to see a trained and disgruntled surgeon who has been sidelined as a result of change in the advancement of treatment. This naturally incites some older surgeons to be “conservative” by nature, but it is better to predict change, to drive inevitable change, rather than resist what is obviously better for the patient. For tuberculosis, this was not the end, and very shortly, the physicians became relatively superfluous because a programme of tuberculin testing of cows was commenced in the community, and those young people who had never had the disease were inoculated; so, by prevention in the community, the condition was virtually eradicated at that time. Which phase does the patient like best? I will leave that to the reader but suffice it to say that surgical dominance and patient needs do not necessarily go hand in hand! Another more recent example is the management of peptic ulceration. After years of admission to a “lying in bed” for rest only, the surgical era brought great relief with such mammoth operations as Polya and Billroth gastrectomy [2]. This carried a significant mortality and was associated with the post-operative complications of “dumping”. However, patients still queued up for gastrectomy as they were promised relief of symptoms. Lesser operations followed, in particular, vagotomy [3], which was commonplace in the 1970s. The extent of the procedure was reduced in the version known as “highly selective vagotomy” in which the nerve of Latarget was preserved and with it, the function of the pylorus. So we had “surgery without dumping and surgery without diarrhoea” [4, 5]. Here was the expected
Fig. 3.3 Some of the health technology areas of vital importance: clinical trial expertise, statistical knowledge, health economics, cost effectiveness modelling, health-related quality of life and patient preference
refinement of major surgery which no patient ever wanted to face, and less invasive surgery, more patient friendly, was inevitably introduced and was popular compared with the larger procedures. It was not to end here. Very soon, came in the drugs, which switched off the acid pump, and how were these designed? By knowledge of an understanding of the digestive process, many academic surgeons, who had a Scottish background, brought about this advance. Again, prevention was the final stage once it was better understood which patients get peptic ulceration and so how a preventative approach could be used. An example of this is to be found in the intensive care situation when peptic bowel perforation is common if unprevented. Today, it is prevented by the timely use of drugs to turn off the acid pump. The historical perspective is relevant to every disease situation and every branch of surgery. It helps to be able to “step back” and to see where the subject is now and so where it will go next. It will not stay as it is. You drive it or be left behind!
3.5 Health Technology We have thus far considered the role of the academic surgeon and what skills he needs to be in a position to start the evaluation of healthcare assessment. He now is ready to commence this demanding need. How to set about it?
As is clear from the above, there is much more than the pure subject of surgery in the evolution of all managed conditions, and advancement in current management needs a team of skilled experts to work together. The surgeon is but a cog in the machine and many parts are required. Sometimes it falls to the surgeon to convene such a group and at least, he needs to be part of one. Gone are the days when an academic surgeon would have his laboratory for trainee surgeons to “do a bit of lab work for a thesis”. It is an era past and rightly so. Why is this? It is because subject advance can only be achieved with multi-disciplinary skills, and the group will need the full range of “health technology assessment”. This is summarised in Fig. 3.3.
3.5.1 Clinical Trial Expertise When I became an academic surgeon, this skill was not defined and so not available. I realised the need to retrain and work with clinical trial experts when I saw what the results of large trials did for clinical practice. I will give an example but there are many. In the 1980s, in the United States, it was alleged by vociferous neurologists that surgeons were slashing open necks willy-nilly to operate on the carotid artery and there was not a shred of evidence to support the intervention. The Society for Vascular Surgery took serious umbrage, but Dr Henry Barnett, neurologist in London, Ontario, Canada, was right. He implied that physicians need a better justification even before being allowed to prescribe drugs, let alone to operate on a neck as surgeons did! This provoked neurologists and surgeons to put the operation to the ultimate test. A multi-centre trial was organised and symptomatic patients were entered into the North American Symptomatic Carotid Endarterectomy Trial [1, 6] and a similar European Carotid Surgical Trial lead by Charles Warlow started almost simultaneously [2, 7]. Critically, both trials were set up with surgeons and neurologists working together. A clinical alert from NASCET was released after 18 months to the effect that surgical patients did vastly better than non-surgical patients in a randomised trial in which both groups were as near as possible, identical but treated differently. There was a 17% benefit for surgery with best medical treatment over best medical treatment alone. At 18 months, the trial was stopped. The European trial showed exactly similar findings, and the power of the
two taken together was to inform the whole world what to do in the circumstances of the so-called “transient ischaemic attacks” and “mini strokes”. The operation had been with us since 1953, but it took so long to prove it worked! I had performed carotid surgery for years and remember being sad that it was challenged by Barnett, but he was right. I then witnessed a massive increase in referrals for the operation as doctors felt it was the right course of action to take in the patient interest. I had seen the power of clinical trials and needed to learn how to do them. There are times when a randomised controlled trial is not possible and other techniques may then be used such as case–control studies and longitudinal cohort studies. These will be described elsewhere in this book. It is important for the academic surgeon to have a grasp of the “levels of evidence” quoted elsewhere and the technique of meta-analysis, a most useful function. The concept of the Cochrane review is also vital to understand.
3.5.2 Statistical Knowledge To begin a randomised controlled trial, a statistician is required very soon. At first, we made the mistake of turning to a statistician at the end of some documentation of procedures. We might say, “Make some sense of this”, and the answer should not be printed! It is crucial to work with a statistical group in the design and setting up of the trial. There are a number of ways to approach a problem. Some statisticians favour large groups and preferably only two groups. Others favour “propensity analysis” and many deplore subgroup analysis. In general, it is a good rule to discuss and agree the statistical plan before any data are collected and certainly before data are analysed for fear of introducing bias. The statistician will also perform “power calculations” which are aimed at calculating the numbers required for statistical significance, given certain assumptions. Very occasionally, a surgeon becomes involved in the development of a new method, for example, the “tracker trial” concept [8]. This is a randomised controlled trial of a comparison of “generic treatments” and subgroups are expressed as a percentage of the alternative generic method and the proportions are compared one with another.
The so-called endovascular aneurysm repair (EVAR) trials were cited in this methodology. In this, all endovascular devices were compared with open repair as two generic treatments. Then, different types of EVAR devices were compared with the whole open repair group and this is repeated for each EVAR type. Finally, the results of one EVAR as related to open repair are compared with the performance of another specific EVAR type compared with open repair [9–11]. It is also possible to use a “propensity analysis” and we actually presented the results that way [12].
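To make the earlier mention of "power calculations" concrete, the sketch below estimates the sample size per arm needed to detect a difference between two event proportions in a randomised trial, using the standard normal-approximation formula. The event rates, significance level and power are hypothetical inputs of the kind agreed with the statistician in advance.

```python
# A minimal sketch of a power calculation: the sample size per arm
# needed to detect a difference between two proportions in a
# randomised trial. Standard normal-approximation formula; the
# event rates below are hypothetical.

from math import ceil
from statistics import NormalDist

def n_per_arm(p1: float, p2: float, alpha: float = 0.05,
              power: float = 0.80) -> int:
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_b = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(num / (p1 - p2) ** 2)

# e.g. event rates of 25% vs. 15% (hypothetical)
print(n_per_arm(0.25, 0.15))  # patients needed in each arm
```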
3.5.3 Health Economics It soon became clear that the purchasers of health care must have a grasp of the costs of various methods. Thus, it is absolutely vital to have a health economic group working in the multi-disciplinary team. Wherever there is more than one way to treat, cost comes into it. Many of the costs are hidden costs. Some are obvious, others less so. For example, it was shown in Vienna that the cost of aortic aneurysm repair is determined by three factors, which account for 80% of the costs [13]. These are costs in the operating room, use of intensive or critical care and length of stay. Re-interventions and follow-up scans in hospital are another big cost. Clinicians think they can guess at costs but they cannot. It takes special expertise. I would go so far as to say that every procedure comparison today requires a cost analysis. Journals do not always want the results. I find it best to include economic details with clinical results and not to separate them. This avoids the rejection of cost details by some editors.
3.5.4 Cost Effectiveness Modelling This is an attempt to gaze into the future. It is needed by health care purchasers and they want to have an early prediction of what it costs to adopt a new treatment over the years ahead. Of course, to know the actual answer, time must elapse and so “assumptions” are made to “model” the result. If a treatment is very effective clinically, the cost could be good “value for money”. The poorer the clinical benefit, the less likely the procedure to prove cost effective. These skills are relatively recently described and not as widely available as they
should be, but no multi-disciplinary group is complete without these skills. The so-called “Markov” model is commonly used.
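The following is a toy two-state Markov cohort model of the kind alluded to above, projecting discounted costs and life-years for a treated cohort over ten annual cycles. All transition probabilities, costs and the discount rate are hypothetical; a real model would have more states and calibrated inputs.

```python
# A toy two-state (Alive/Dead) Markov cohort model, projecting costs
# and life-years over 10 annual cycles. All inputs are hypothetical.

annual_death_prob = 0.05      # probability of moving Alive -> Dead each year
annual_cost_alive = 1_500.0   # yearly follow-up cost while alive
discount_rate = 0.035         # annual discounting of future costs/benefits

alive = 1.0                   # cohort starts fully in the Alive state
total_cost = 0.0
life_years = 0.0

for year in range(1, 11):
    discount = 1.0 / (1.0 + discount_rate) ** year
    alive *= (1.0 - annual_death_prob)        # apply the transition
    life_years += alive * discount            # discounted life-years accrued
    total_cost += alive * annual_cost_alive * discount

print(f"Discounted life-years per patient: {life_years:.2f}")
print(f"Discounted cost per patient: {total_cost:,.0f}")
```

Running such a model for each competing treatment and comparing the discounted cost and outcome totals is what feeds the cost-effectiveness ratios used by health care purchasers.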
3.5.5 Health-Related Quality of Life and Patient Preference It is not all about cost. It is crucial to be sensitive as to what is best for the patient and at what cost. Patient satisfaction is difficult to measure, but it is vital to assess it. There is a tendency to be obsessed with cost as this determines whether organisations decide to buy the method, but what the patient thinks must be the key issue. There are established methods in commerce, which deal with quality of life (QoL), and the “health related” (HRQoL) is a mere extension of this. Opinion polls are well known to politicians, and these techniques can be applied to patients with suitable modification. Thus, patient feedback is assessed formally, but the methodology of this is a new skill, which must be learned and applied.
References
1. Anon (1991) North American Symptomatic Carotid Endarterectomy Trial. Methods, patient characteristics, and progress. Stroke 22:711–720
2. Anon (1991) MRC European Carotid Surgery Trial: interim results for symptomatic patients with severe (70–99%) or with mild (0–29%) carotid stenosis. European Carotid Surgery Trialists' Collaborative Group. Lancet 337:1235–1243
3. Billroth T (1881) [Reports on Billroth's operation]. Wien Med Wochenschr 31:595–634
4. Brown LC, Greenhalgh RM, Kwong GP et al (2007) Secondary interventions and mortality following endovascular aortic aneurysm repair: device-specific results from the UK EVAR trials. Eur J Vasc Endovasc Surg 34:281–290
5. Dragstedt LR, Owens FMJ (1943) Supradiaphragmatic section of the vagus nerves in treatment of duodenal ulcer. Proc Soc Exp Biol Med 53:152–154
6. Greenhalgh RM, Brown LC, Kwong GP et al (2004) Comparison of endovascular aneurysm repair with open repair in patients with abdominal aortic aneurysm (EVAR trial 1), 30-day operative mortality results: randomised controlled trial. Lancet 364:843–848
7. Greenhalgh RM, Brown LC, Kwong GP et al (2005) Endovascular aneurysm repair versus open repair in patients with abdominal aortic aneurysm (EVAR trial 1): randomised controlled trial. Lancet 365:2179–2186
8. Greenhalgh RM, Brown LC, Kwong GP et al (2005) Endovascular aneurysm repair and outcome in patients unfit for open repair of abdominal aortic aneurysm (EVAR trial 2): randomised controlled trial. Lancet 365:2187–2192
9. Holzenbein J, Kretschmer G, Glanzl R et al (1997) Endovascular AAA treatment: expensive prestige or economic alternative? Eur J Vasc Endovasc Surg 14:265–272
10. Humphrey CS, Johnston D, Walker BE et al (1972) Incidence of dumping after truncal and selective vagotomy with pyloroplasty and highly selective vagotomy without drainage procedure. Br Med J 3:785–788
11. Johnston D, Humphrey CS, Walker BE et al (1972) Vagotomy without diarrhoea. Br Med J 3:788–790
12. Keller H, Krupe W, Sous H et al (1956) [Raising of tolerance to streptomycin by the introduction of pantothenic acetates]. Wien Med Wochenschr 106:63–65
13. Lilford RJ, Braunholtz DA, Greenhalgh R et al (2000) Trials and fast changing technologies: the case for tracker studies. BMJ 320:43–46
4
Study Design, Statistical Inference and Literature Search in Surgical Research Petros Skapinakis and Thanos Athanasiou
Contents
4.1 The Basics of Study Design
4.1.1 Ecological or Aggregate Studies
4.1.2 Cross-Sectional Surveys
4.1.3 Case–Control Studies
4.1.4 Cohort or Longitudinal Studies
4.1.5 Randomised Controlled Trials
4.1.6 Systematic Reviews and Meta-Analyses
4.2 The Basics of Statistical Analysis
4.2.1 The Study Population
4.2.2 Hypothesis Testing
4.2.3 Type 1 and Type 2 Errors
4.2.4 Statistical Power
4.2.5 Interpreting "Statistically Significant" Results
4.2.6 Confidence Intervals
4.2.7 Interpreting "Negative" Results
4.2.8 Correlation and Regression
4.3 Causal Inference
4.3.1 What Is a Cause?
4.3.2 The Multi-Factorial Model of Causation
4.3.3 Evaluating Causality in the Multi-Factorial Model
4.3.4 Bradford Hill's Criteria for Causality
4.4 Clinical Importance of the Results: Types of Health Outcomes and Measures of the Effect
4.4.1 Health Outcomes
4.4.2 Clinical Importance
4.5 Searching Efficiently the Biomedical Databases
4.5.1 Structure of a Database
4.5.2 Structure of PubMed
References
53
P. Skapinakis () University of Ioannina, School of Medicine, Ioannina 45110, Greece e-mail:
[email protected]
Abstract The aim of this chapter is to provide the reader with the theoretical skills necessary to understand the principles behind critical appraisal of the literature. The presentation will follow roughly the order by which a researcher carries out the research. First, we discuss the main types of study design. Second, we briefly mention the basic statistical procedures used in data analysis. Third, we discuss on the possible interpretations of an observed association between an exposure and the outcome of interest, including any causal implications. Fourth, we discuss the issue of clinical significance and distinguish it from statistical significance by referring to the types of outcomes used in research and the measures of the effect of a potential risk factor. Finally, we give practical advice to help readers increase their ability to search efficiently in the biomedical databases, and especially medline.
4.1 The Basics of Study Design The main study designs used in research (see Table 4.1) can be described as either observational (ecological, cross-sectional, case–control and cohort studies), experimental (mainly the randomised controlled trial – RCT) or summary in nature (systematic reviews and meta-analyses) [1–3].
4.1.1 Ecological or Aggregate Studies Ecological studies examine the association between disease (or the outcome of interest) and the characteristics of an aggregation of people, rather than the characteristics of individuals. The main difficulty with this design
T. Athanasiou (eds.), Key Topics in Surgical Research and Methodology, DOI:10.1007/978-3-540-71915-1_4, © Springer-Verlag Berlin Heidelberg 2010
33
34
P. Skapinakis and T. Athanasiou
Table 4.1 Types of study in epidemiological research Types of study Primary research
The graph (Fig. 4.1) shows the generally positive association between statin prescribing rates and episodes of CHD. However, there are differences between different primary care trusts and it is now known that these differences are explained by the different needs of the individual patients attending the particular practices.
Secondary research
Observational
Experimental
Summary
Ecological Cross-sectional Case–control Cohort
Randomised controlled trials (RCTs)
Systematic reviews Meta-analyses
4.1.2 Cross-Sectional Surveys is that the association between exposure and disease at an aggregate level may not be reflected in an association at the individual level. In this context, the confounding is often termed the ecological fallacy. An example of an ecological study is the one conducted by Ward et al. [4], which aimed at examining the association between statin prescribing rates by general practitioners and several proxies of health care need, including hospital morbidity statistics for coronary heart disease (CHD) episodes.
This type of descriptive study relates to a single point in time and can therefore report on the prevalence of a disease, but is adversely affected by the duration of illness. A cross-sectional survey eliminates the problems of selection bias and has been frequently used for the study of common conditions. However, any association found in a cross-sectional survey could either be with incidence or duration. For example, Skapinakis et al. [6]
studied the sociodemographic and psychiatric associations of chronic fatigue syndrome in a cross-sectional survey of the general population in Great Britain. They found that chronic fatigue syndrome was strongly associated with depression. They also found that other risk factors were independently associated with chronic fatigue (older age, female sex, having children and being in full-time employment) after adjustment for depression. This finding supported the hypothesis that chronic fatigue syndrome has a unique epidemiological profile that is distinct from depression, but longitudinal studies should explore this hypothesis further.
4.1.3 Case–Control Studies

In a case–control study, individuals with the disease (cases) are compared with a comparison group without the disease (controls). If the prevalence of exposure is higher in the cases than in the controls, the exposure might be a risk factor for the disease; if lower, the exposure might be protective. Case–control studies are relatively cheap and quick, and can be used to study rare diseases. However, great care is needed in the design of the study in order to minimise selection bias. It is important to ensure that the cases and controls come from the same population, because the purpose of the control group is to give an unbiased estimate of the frequency of exposure in the population from which the cases are drawn.

For example, Kendell et al. [7] conducted a case–control study to examine the association between obstetric complications (OCs) and the diagnosis of schizophrenia. They found a highly significant association and concluded that a history of OCs in both pregnancy and delivery is a risk factor for later schizophrenia. However, in a subsequent paper [8], the same group re-analysed the data set and reported that the previous findings were not valid owing to an error in selecting controls. The method used had inadvertently selected controls with lower than normal chances of OCs, thereby introducing a serious selection bias. In reality, there was no association between schizophrenia and OCs in this data set.

A nested case–control study is one based within a cohort study or sometimes a cross-sectional survey. The cases are those that arise as the cohort is followed prospectively, and the controls are a random sample of the non-diseased members of the cohort [3].
In a matched case–control study, one or more controls are selected for each case so as to be similar with respect to characteristics that are thought to be important confounders. The analysis of case–control studies results in the reporting of odds ratios (ORs); case–control studies cannot directly estimate disease incidence rates. If the study is matched, a more complex matched analysis (conditional logistic regression) needs to be performed.
4.1.4 Cohort or Longitudinal Studies

A cohort (or longitudinal, or follow-up) study is an observational study in which a group of “healthy” subjects who are exposed to a potential cause of disease, together with a “healthy” group who are unexposed, are followed up over a period of time. The incidence of the disease of interest is compared in the two groups. Ideally, the exposed and unexposed groups should be chosen to be virtually identical with the exception of the exposure. A great benefit of a cohort study is its ability to rule out reverse causality as an explanation for an observed association. One of the best-known cohort studies in medicine is the Framingham Heart Study. From this study, Wilson et al. [9] followed up 2,748 participants aged 50–79 for 12 years and reported in 1988 that low levels of high-density lipoprotein cholesterol (HDL-C) were associated with increased mortality, especially from CHD or other cardiovascular causes.

Cohort studies always “look forward” from the exposure to disease development and can therefore be time-consuming and expensive. To minimise costs, historical data on exposure, i.e. information already collected, can be used. The disadvantage of this approach is that exposure measurement is dependent on the historical record that is available.

The completeness of follow-up is particularly important in cohort studies. It is essential that as high a proportion of the cohort as possible is followed up, and those who migrate, die or leave the cohort for any reason should be recorded. The reasons for leaving the cohort may be influenced by the exposure and/or the outcome, and incomplete follow-up can therefore introduce bias. The analysis of cohort studies involves calculating either the incidence rate or the risk of disease in the exposed cohort compared with that in the unexposed cohort. Relative and absolute measures of effect can then be calculated.
4.1.5 Randomised Controlled Trials

RCTs (Fig. 4.2) are most frequently used (when possible) to investigate the effectiveness of medical interventions [10]. They are the strongest design for investigating causality between an intervention and an outcome, because randomly allocating sufficient patients to two or more treatments should eliminate both selection bias and confounding when comparing outcomes [11]. Selection bias and confounding are explained later, but the principle of the RCT is that the subjects in the randomised groups should be as similar as possible. The main argument for randomisation is that it is impossible to measure all the potential confounding variables that may affect outcome. If one could predict outcome very accurately, a longitudinal study would be a satisfactory design.

If an RCT is to influence clinical practice, it must address an area of clinical uncertainty. If there is a consensus that a treatment is effective, then there is little point in conducting a trial without some other good reason. The more common the dilemma, the more important and relevant an RCT becomes. It is important that we recognise areas of clinical uncertainty in
our work in order to inform the design of future RCTs. Clinical uncertainty is also related to the ethical justification for randomisation. If a clinician is uncertain about the most effective treatment, then randomisation becomes an ethical option or even an ethical requirement. It is therefore important that RCTs address the important clinical dilemmas.

Subjects must be allocated to the treatments in an unbiased way. This is done by concealing the process of randomisation, so that the person who has assessed the patient cannot interfere with the randomisation. The concealment of randomisation is an important aspect of RCT methodology and has been used as a proxy for the quality of an RCT [12].

The validity of the comparison between the randomised groups in an RCT depends critically on ensuring that the measurement of outcome is not affected by the allocation of treatment. This is usually done by disguising the random allocation from the person making the assessment, or “blinding” the assessor to the allocation. A double-blind study is one in which both the patient and the assessor are blind. A triple-blind study is one in which the person analysing the data is also unaware of the treatment allocation.
[Fig. 4.2 Design of an RCT: a source population is screened against selection criteria to give an eligible population (non-participants excluded); those consenting to randomisation are allocated to treatment or control (those who did not consent are not randomised), and in each arm the outcome is either known or unknown at follow-up]
One of the main difficulties in interpreting the results of an RCT concerns the influence of subjects withdrawing from treatment or from follow-up. As subjects drop out of an RCT, the treatment groups depart from the balanced groups created at randomisation. If the drop-outs are substantial in number, there is a possibility that confounding is reintroduced. Even more importantly, since non-compliers usually tend to be subjects at higher risk of adverse health outcomes, there is a risk of bias creeping in, especially if there is differential drop-out between the groups. It is therefore important to minimise the non-compliance and loss to follow-up rates.

The main way in which this problem is circumvented is by using an intention-to-treat strategy in the analysis, in which all the randomised subjects are included irrespective of whether they continued with the treatment or not. If follow-up data are missing, one can carry forward data from a previous time-point or assume a poor outcome for the drop-outs. There are also more complex ways of substituting values for missing data that rely upon multivariate methods. An intention-to-treat strategy ensures that all the randomised individuals are used in the analysis. In this way, the benefits of randomisation are maintained and the maximum number of subjects can be included. Using an intention-to-treat analysis is one of the characteristics of pragmatic trials [10]. These aim to study the long-term consequences of a clinical decision, e.g. to prescribe the treatment or not, following best clinical practice thereafter. The treatment effect may be smaller (i.e. diluted) than in the ideal case of 100% compliance, but it is a far more realistic estimate of the treatment effect.

There is an ongoing debate between those who argue that randomisation is the only safe, unbiased means of assessing new interventions, and those who view randomisation as a narrow methodology of limited usefulness except for assessing drug treatments [13]. There are three sets of arguments:

1. External validity: RCTs might lead to findings that overestimate treatment effects or lack relevance to the settings that interest clinicians the most.
2. Feasibility: Sometimes it is impossible to mount RCTs for practical reasons. For example, an RCT of suicide prevention would need to randomise tens of thousands of people.
3. Rarity: The number of well-conducted RCTs of sufficient size to draw conclusions will always be limited. Many clinically relevant issues will never be addressed by RCTs.

Perhaps the main criticism is limited external validity, or generalisability [13, 14]. RCTs are strong on internal validity, i.e. drawing conclusions about the effectiveness of the treatment in that particular setting, on those patients. However, clinicians are also, if not primarily, interested in the external validity of a trial. The key question is: “Do the results apply to the circumstances in which the clinician works?” There are three main reasons why this can be a problem:

1. The professionals: The doctors and other professionals involved in trials are atypical, often with a special interest and expertise in the problem.
2. The patients: It is often difficult to recruit subjects to RCTs, and the group of patients included is often very unrepresentative of the group eligible for treatment. This difficulty is often exacerbated by investigators choosing a large number of “exclusion criteria” (Fig. 4.3).
3. The intervention: Many studies are carried out in prominent services, perhaps with dedicated research funds providing additional services. It is often difficult to know how effective a similar intervention would be if it were applied to other services, whether in the country of the study or elsewhere in the world.

[Fig. 4.3 RCTs may have limitations in external validity, which imposes difficulties on the application of their findings in real clinical situations. Potential losses between the population with the potential to benefit and the subjects randomised to interventions A and B include centre/doctor non-participation (not invited, or centre/practitioner preference), patients not invited to participate (administrative oversight or practitioner preference), ineligible patients, and patient non-participation (preference for a specified treatment or aversion to research). Graph taken from McKee et al. [14]]

Pragmatic RCTs are designed to answer clinically relevant questions in relevant settings and on representative groups of patients [10]. One of the priorities
of pragmatic trials is to ensure external as well as internal validity. Choosing clinically relevant comparisons is also essential, and pragmatic trials are designed to reduce clinical uncertainty. In assessing a pragmatic trial, one should consider the representativeness and relevance of the following: (a) the patients in relation to the intended clinical setting, (b) the clinical setting, (c) the intervention(s) and (d) the comparisons.

Economic assessment is often an important aspect of pragmatic trials. Clinicians, patients and commissioners also need to know how much an intervention costs as well as whether it works. There will always be limitations on the resources available for health care, and economics should help to make judgements on the best place to invest; this has to be done in conjunction with knowledge about the size of the treatment effect. Trials also need to examine outcomes concerned with the “quality of life” of the subjects in addition to clinical outcomes. These measures should assess whether subjects are working, pursuing their leisure activities or requiring additional support.
4.1.6 Systematic Reviews and Meta-Analyses

Secondary research aims to summarise and draw conclusions from all the known primary studies on a particular topic (i.e. those which report results at first hand) [5]. Systematic reviews apply the same scientific principles used in primary research to reviewing the literature. In contrast, a more traditional or narrative review relies upon an expert to remember the relevant literature and to extract and summarise the data he or she thinks important. Systematic reviews ensure that all the studies are identified using a comprehensive method and that data are extracted from the studies in a standardised way. Meta-analysis provides a summary estimate of the results of the studies identified by a systematic review. It enables the results of similar studies to be summarised as a single overall effect, with confidence intervals, using formal statistical techniques. The main advantage of these studies is the resulting increase in the combined sample size (Table 4.2). A problem of secondary research is the presence of publication bias, i.e. small negative studies are less likely to be published.

Table 4.2 Advantages and disadvantages of secondary research
  Advantages
    All evidence is used to assess an intervention
    Increased statistical power
    Can investigate heterogeneity and test generalisability
  Disadvantages
    Publication and citation bias
    Limited by the quality of the primary studies
    Pooling disparate studies may be invalid (but such heterogeneity can be investigated)
Therefore, ideally, one should attempt a comprehensive search strategy that includes not only published results, but also those reported in abstracts, personal communications and the like. Systematic reviews have mostly been used to summarise the results of RCTs (see the Cochrane Collaboration below), but the same arguments apply to reviewing observational studies.

A central issue in secondary research is heterogeneity [15]. This term is used to describe the variability or differences between studies in terms of clinical characteristics (clinical heterogeneity), methods and techniques (methodological heterogeneity) and effects (heterogeneity of results). Statistical tests of heterogeneity may be used to assess whether the observed variability in study results (effect sizes) is greater than that expected to occur by chance. Heterogeneity may arise when the populations in the various studies have different characteristics, when the delivery of the interventions is variable, or when studies of different designs and quality are included in the review. Interpreting heterogeneity can be complex, but clinicians are often interested in heterogeneity in order to inform clinical decision-making [16]. For example, clinicians want to know whether a particular group of patients responds well to a particular treatment. Meta-analysis has also been criticised for attempting to summarise studies with diverse characteristics; investigating heterogeneity can also be used to address such concerns.

The use of systematic reviews for the assessment of the effectiveness of healthcare interventions is largely promoted by the Cochrane Collaboration. Archie Cochrane, a British epidemiologist who was based in Cardiff for much of his working life, recognised that people who
want to make more informed decisions about health care do not have ready access to reliable reviews of the available evidence. Cochrane emphasised that reviews of research evidence must be prepared systematically and kept up to date to take account of new evidence. In 1993, seventy-seven people from eleven countries co-founded “The Cochrane Collaboration”, which aims to review systematically all the RCTs carried out in medicine since 1948 and is committed to updating the reviews as new evidence emerges. Its mission statement is “Preparing, maintaining and promoting the accessibility of systematic reviews of the effects of healthcare interventions”. The Cochrane Collaboration’s website (www.cochrane.org) has links to the Cochrane Library, which contains the Cochrane Database of Systematic Reviews and the Cochrane Controlled Trials Register.
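To make the statistical summary step described above concrete, the following is a minimal sketch (in Python, with invented trial results) of inverse-variance, fixed-effect pooling – the simplest of the formal techniques that produce a single overall effect with a confidence interval. A real meta-analysis would normally use dedicated software and consider random-effects models as well.

    import numpy as np

    # Hypothetical log odds ratios and standard errors from five trials
    log_or = np.array([0.42, 0.18, 0.55, 0.30, 0.25])
    se = np.array([0.21, 0.15, 0.30, 0.18, 0.25])

    w = 1 / se**2                                  # inverse-variance weights
    pooled = np.sum(w * log_or) / np.sum(w)        # pooled log OR
    pooled_se = np.sqrt(1 / np.sum(w))             # SE of the pooled estimate

    lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"Pooled OR = {np.exp(pooled):.2f} "
          f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")

Each study is weighted by the precision of its estimate, so that large studies dominate the summary; this is the sense in which meta-analysis increases the combined sample size.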
4.2 The Basics of Statistical Analysis

4.2.1 The Study Population

The study population is the set of subjects about whom we wish to learn. It is usually impossible to study the whole population, so instead we look in detail at a subset, or sample, of the population. Ideally, we choose the sample at random so that it is representative of the whole study population. Our findings from the sample can then be extrapolated to the whole study population.

Suppose that two random samples are selected from a large study population. They will almost certainly contain different subjects with different characteristics. For example, if two samples of 100 are chosen at random from a population with equal numbers of males and females, one may contain 55 females and the other 44. This does not mean that either sample is “wrong”: the randomness involved in sample selection has introduced an inaccuracy into our measurement of the study population characteristic. This is called sampling variation. Our aim is to extrapolate and draw conclusions about the study population using findings from the sample, and most statistical tests therefore try to infer something about the study population by taking account of, and estimating, the sampling variation.
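A two-line simulation makes sampling variation tangible; this sketch in Python draws two samples of 100 from the 50:50 population of the example above, and the counts differ by chance alone.

    import numpy as np

    rng = np.random.default_rng(1)

    # Study population: 50% female; draw two random samples of 100 subjects
    for i in (1, 2):
        females = rng.binomial(n=100, p=0.5)
        print(f"Sample {i}: {females} females out of 100")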
4.2.2 Hypothesis Testing

Studies are usually designed to answer a clinical question, such as: “Is there any difference between two methods of treating coronary heart disease?” In hypothesis testing we formulate this question as a choice between two statistical hypotheses, the null and the alternative. The null hypothesis, H0, represents a situation of no difference, no change, equality, while the alternative hypothesis, H1, specifies that there is a difference or change. So we might have:

H0: there is no difference between the two treatments for CHD
H1: there is a difference between the treatments

We have to decide which we think is true. This decision is usually based on the P-value, which is essentially the probability of obtaining results at least as extreme as those observed if the null hypothesis is true. A small P-value, such as 0.05, means it is unlikely that such a result would be obtained by chance and offers evidence against H0, while a large P-value, such as 0.5, tends broadly to support H0.

How small should the P-value be to reject H0? Traditionally, the critical level has been set at 0.05, or 5%. If P < 0.05 is the criterion for rejecting H0, we say the result is significant at 5%. Other levels can be used, but this is the most common. If the P-value exceeds 0.05, the decision is that we do not reject H0, rather than that we accept H0: it is difficult to prove that there is absolutely no difference, and we simply conclude that we cannot show there is one. The P-value is often incorrectly interpreted as the probability that the null hypothesis is true. This is not correct; the P-value only quantifies the evidence against the null hypothesis.
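As an illustration of these ideas, the following Python sketch simulates a comparison of a continuous outcome between two treatment groups (the data are invented) and tests H0 with a two-sample t-test:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical systolic blood pressure (mmHg) under two treatments
    group_a = rng.normal(loc=140, scale=15, size=50)
    group_b = rng.normal(loc=132, scale=15, size=50)

    # Two-sample t-test of H0: no difference between the group means
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
    # If P < 0.05 we would reject H0 at the conventional 5% level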
4.2.3 Type I and Type II Errors

There are two types of wrong decision that can be made when a hypothesis test is performed. A Type I error occurs when the null hypothesis H0 is true but is rejected. When the null hypothesis is in fact true, 5% of tests will be
significant at the 5% level purely by chance; these are Type I errors. Carrying out repeated tests increases the chance of a Type I error. A Type II error occurs when the null hypothesis H0 is false but is not rejected. For example, in a small study it is possible to record a non-significant P-value despite large true differences in the study population. Type II errors need to be considered in all “negative” studies. Confidence intervals help in interpreting negative findings (see below).
4.2.4 Statistical Power

The statistical power of a study is the probability of finding a statistically significant result, assuming that the study population has a difference of a specified magnitude; it is the probability of not making a Type II error. The power depends upon the following:

• The level of statistical significance – usually 5%
• The size of effect assumed in the study population
• The sample size

Calculating the power of a study is useful at the planning stage, and the calculation depends critically on the size of effect one wishes to detect. When designing studies, the power is often set to 80% in order to determine the sample size; 80% is an arbitrary value, much like the 5% significance level.
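The normal-approximation formula behind many sample-size calculations can be written in a few lines. This is only a sketch for two equal groups with a continuous outcome (the 5 mmHg difference and SD of 15 are invented); real studies would use dedicated software or exact methods.

    from scipy.stats import norm

    def n_per_group(delta, sd, alpha=0.05, power=0.80):
        """Approximate sample size per group for comparing two means."""
        z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for a 5% two-sided test
        z_beta = norm.ppf(power)            # 0.84 for 80% power
        return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

    # To detect a 5 mmHg difference with SD 15 at 5% significance, 80% power:
    print(round(n_per_group(delta=5, sd=15)))   # about 141 per group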
4.2.5 Interpreting “Statistically Significant” Results

When the null hypothesis is true, 5% of statistical tests (one in every 20) will be statistically significant at the 5% level by chance. The 5% significance level is, of course, fairly arbitrary: there is no real difference between interpreting a significance level of 4% and one of 6%. If a study reports twenty P-values, one would expect one of them to be “significant” by chance, and repeated tests increase the chance of Type I errors.
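The arithmetic behind this warning is simple; assuming twenty independent tests with the null hypothesis true for each, the chance of at least one spuriously “significant” result is substantial:

    # Probability of at least one "significant" result among 20
    # independent tests at the 5% level when H0 is true for all of them
    p_any = 1 - 0.95 ** 20
    print(f"{p_any:.2f}")   # about 0.64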
4.2.6 Confidence Intervals

We know that our sample estimate from a study (e.g. a proportion, a mean value or an OR) is subject to sampling error. We are primarily interested in the size of the effect, and so we also need to know how accurately we are estimating it. Confidence intervals are based on the estimate of the size of the effect, together with a measure of the uncertainty associated with that estimate. The standard error (SE) tells us how precisely our sample value estimates the true population value. If we took many similar-sized samples from the same population, the SE could be thought of as the standard deviation of the sample means. If we increase the sample size, we decrease the SE, as we estimate the study population value with more accuracy. A 95% confidence interval is constructed so that in 95% of cases it will contain the true value of the effect size. It is calculated as:

95% CI = estimated value ± (1.96 × SE)

We can use different levels of confidence if we wish. Based on the same information, a 99% confidence interval will be wider than a 95% one, since we are making a stronger statement without any more data; similarly, a 90% interval will be narrower. In recent years it has been generally agreed that results should be summarised by confidence intervals rather than P-values, although ideally both will be given. The P-value gives no indication of the likely range of values of the effect size, whereas the confidence interval does.
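For a single proportion, the whole calculation fits in a few lines; a minimal sketch with invented numbers (60 responders out of 200 patients), using the usual normal approximation for the SE of a proportion:

    import math

    n, events = 200, 60
    p = events / n                          # estimated proportion
    se = math.sqrt(p * (1 - p) / n)         # SE of a proportion

    lower, upper = p - 1.96 * se, p + 1.96 * se
    print(f"Response rate {p:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

With these numbers the estimate is 0.30 with a 95% CI of roughly 0.24 to 0.36; quadrupling the sample size would halve the width of the interval.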
4.2.7 Interpreting “Negative” Results

When a trial gives a “negative” result, in other words there is no statistically significant difference, it is important to consider the confidence intervals around the result. We must remember that a study estimates the result with some inaccuracy. The confidence interval gives the range within which we are 95% confident that the “true” value lies. Small negative trials will usually have confidence intervals that include differences corresponding to potentially important treatment effects. One way of thinking about results is that they exclude unlikely values: the confidence interval gives the range of likely values, and an effect size outside the confidence interval is unlikely.
4.2.8 Correlation and Regression
Linear regression allows the relationship between two continuous variables to be studied. The regression line is the straight line that fits the data best.
[Fig. 4.4 The regression line: each observation is divided into a predicted value, lying on the line, and a residual, the vertical distance between the observed value and the line]
The correlation coefficient varies between −1 and 1. The further the correlation coefficient is from 0, the more of the variation is explained by the regression line. A negative correlation coefficient arises when the value of one variable goes down as the other goes up. Each observation can be thought of as a “predicted” value, i.e. the value that would lie on the regression line, plus a “residual”, the difference between the predicted and the observed value (Fig. 4.4). The total variance is therefore the predicted variance added to the residual variance. The square of the correlation coefficient is the predicted variance divided by the total variance, so if all the points lie exactly on the line, the correlation coefficient is 1 (or −1). The slope of the line is sometimes called the regression coefficient; it gives the increase in the mean value of y for an increase of one unit in x.
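The variance decomposition can be verified numerically; a sketch with simulated data (the linear relationship and noise level are invented):

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, 100)
    y = 2.0 * x + 5 + rng.normal(0, 3, 100)    # hypothetical linear relation

    slope, intercept = np.polyfit(x, y, 1)     # least-squares regression line
    predicted = slope * x + intercept
    residual = y - predicted

    r = np.corrcoef(x, y)[0, 1]
    # The squared correlation equals predicted variance / total variance
    print(f"r^2 = {r**2:.3f}")
    print(f"var(predicted)/var(y) = {predicted.var() / y.var():.3f}")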
4.3 Causal Inference

4.3.1 What Is a Cause?

One of the principal aims of epidemiology is to investigate the causes of health-related states and events in specified populations. In general, we are interested in finding the causes of disease because we want to be able to intervene to prevent disease occurring. But what is a cause? It is useful to recall that our ideas about causes are not static and universal, and the concepts
of disease causation have changed dramatically over time. In the past, several conceptual models of causation were developed, including the miasma theory in the early nineteenth century (all diseases were due to bad air) and the germ theory in the second half of the nineteenth century and the first half of the twentieth century (diseases were caused by single agents operating at the level of the individual). Robert Koch, one of the key figures of the germ theory, proposed in 1880 some general criteria for causality (the Henle–Koch postulates). According to these, a particular agent can be considered the cause of a disease if it is both necessary (“the agent must be shown to be present in every case of the disease in question”) and specific (“the agent must not be found in cases of any other disease”).

In the second half of the twentieth century, ideas about causation began to change for a number of reasons. First, it was realised that recognised pathogens such as the tubercle bacillus could be carried for long periods of time without causing disease (i.e. the bacillus was not sufficient to cause disease). Second, there was a shift of attention from infectious diseases to heart disease and cancer, where various factors were related to risk but none was absolutely necessary. As the “new” chronic diseases did not appear to have a single specific cause, epidemiologists became interested in how multiple factors could interact to produce disease. This led to the development of the multifactorial paradigm, which is the dominant theory of causation in contemporary epidemiology.
4.3.2 The Multi-Factorial Model of Causation

In the multi-factorial model, a cause of a specific disease is any factor that plays an essential role in the occurrence of the disease. A cause can be either an active agent or a static condition. Within this framework, a single cause is not sufficient to produce disease; rather, a number of different causal factors act together to cause each occurrence of a specific disease. Rothman has elaborated a model of component causes that attempts to accommodate the multiplicity of factors that contribute to an outcome (Fig. 4.5). In this model, a sufficient cause is represented by a complete circle (a “causal pie”), the segments of which represent component causes. When all of the component causes are present, the sufficient cause is complete and the outcome occurs. As shown in the figure, there may be more than one sufficient cause (i.e. circle) of the
[Fig. 4.5 Rothman’s model of sufficient and component causes: each sufficient cause is a complete “causal pie” whose segments are component causes (e.g. A, B, C and E, together with unknown causes U); the causal complement of A is the remainder of its pie]
outcome, so that the outcome can occur through multiple pathways. A component cause that is a part of every sufficient cause is a necessary cause. If the causal complement of a factor is absent in a particular population, then the factor will not cause disease in that population. If every individual in a population is exposed to the causal complement of a factor, then exposure to that factor will always produce disease. The strength of a factor’s effect on the occurrence of a disease in a population therefore depends on the prevalence of its causal complement in that population. Because of this, a particular factor may be an important cause of a specific disease in one population, but may not cause any of the disease in another.
4.3.3 Evaluating Causality in the Multi-Factorial Model

In contemporary research, we spend a lot of time looking for associations between exposures and outcomes. If we find an association between an exposure and a disease, how can we judge whether the relationship is causal? An association between an exposure and a disease can be explained by five alternative interpretations (Fig. 4.6):

• Chance
• Bias
• Confounding
• Reverse causality
• Causation
It is important to emphasise that all study designs, including the RCT, are concerned with causal inference. In an RCT, we are interested in whether the treatment allocation “causes” an increased rate of recovery.
[Fig. 4.6 Interpreting an association: is it due to systematic bias? Could it be due to confounding? Could it be a result of chance? If the answer to each is no, is it causal? Apply the positive criteria of causality]
Chance: Significance testing assesses the probability that chance alone can explain the findings, and calculating confidence intervals gives an estimate of the precision with which an association is measured. A Type I error occurs when a statistically significant result arises by chance; it is a particular problem when many statistical tests are conducted within a single study in the absence of a clearly stated prior hypothesis. A Type II error occurs when a clinically important result is obscured by chance or random error, often made more likely by an inadequate sample size. For “negative” findings, the confidence interval gives a range of plausible values for the association.

Bias: Systematic error, or bias, can distort an association in either direction, increasing or decreasing it. No study is entirely free of bias, but attention to the design and execution of a study should minimise its sources. There are two main types of bias in research studies: selection bias and information (or measurement) bias. Selection bias results from systematic differences in characteristics between those who are selected for a study and those who are not: any observed association between the exposure and the outcome may not be real, but may be due to the procedure used to select the participants. Information or measurement bias refers to errors that result from inaccurate measurement of the exposure, the outcome or both.
[Fig. 4.7 Smoking is a confounder for the association between coffee consumption and cancer of the pancreas because it is associated with both the exposure (coffee) and the outcome (cancer)]

Table 4.3 Causality criteria: the Bradford Hill criteria
  Temporality (the exposure occurs before the outcome)
  Strength of the association (strong associations are more likely to be causal)
  Consistency (same results with different methods)
  Dose–response relationship
  Specificity of the association
  Biological plausibility
  Coherence (no conflicts with current knowledge)
  Experimental evidence
  Analogy (similar factors cause similar diseases)
For example, it is well known that retrospective assessment of one’s own fat intake may be inaccurate and could introduce an information bias in either direction in a study aiming to find out whether there is an association between fat intake and colon cancer. If subjects with colon cancer were more likely than controls to overestimate their daily fat intake, this would increase the chance of finding a statistically significant association; if, on the other hand, controls were more likely to underestimate their fat intake, this would reduce it.

Confounding: Confounding occurs when an estimate of the association between an exposure and a disease is an artefact, because a third, confounding variable is associated with both the exposure and the disease (see Fig. 4.7 for an example). All observational studies are susceptible to confounding, so always think of potential confounders when interpreting studies. Even in an RCT there can be an imbalance (by chance) in important confounders between the allocated interventions.

Reverse causality: This is the possibility that the exposure is the result rather than the cause of the disease. For example, is marital discord a risk factor for depression, or is depression causing marital discord? This issue is more important in case–control studies and cross-sectional surveys that assess exposure after the onset of disease. Cohort studies usually eliminate this possibility by selecting people without the disease at the beginning of the study, and RCTs select people who are ill at the beginning of the trial in order to examine outcome. However, reverse causality can remain a problem for conditions in which the timing of the onset of disease remains a matter of debate.

Causation: An association may indicate that the exposure causes the disease. Trying to infer causation is a difficult task. It is usually helpful to review the epidemiological literature to decide whether there is a consistent finding, irrespective of the population or study design.
When there is a strong association, the likelihood that the relationship is causal is increased: for relative risks over 3 or 4, confounding and bias would have to be quite marked to explain the findings, whereas there is generally little confidence in findings when the relative risk is 1.5 or below. A dose–response relationship can also provide additional evidence for causality, depending upon the hypothesised mechanism of action. For example, if obstetric complications were causal agents, one would expect more severe complications to lead to higher rates of schizophrenia than milder ones. Finally, the scientific plausibility of the findings has to be considered.
4.3.4 Bradford Hill’s Criteria for Causality

In 1965, the epidemiologist Sir Austin Bradford Hill proposed a set of criteria to help evaluate whether an observed association between an exposure and an outcome is likely to be causal (Table 4.3). These criteria help us to judge both whether an association is valid and whether it is consistent with existing knowledge. None of the criteria alone can prove that a relationship is causal; used together, however, they help us to make an overall judgement about whether a causal relationship is likely. As these criteria are still used today, we will now look at them in some detail:

1. Strength of the association – The stronger an association, the less likely it is merely to reflect the influence of some other aetiological factor(s).
2. Consistency – Replication of the findings by different investigators, at different times, in different places and with different methods, and the ability to explain any differing results convincingly.
3. Specificity of the association – There is an inherent relationship between specificity and strength, in the sense that the more accurately defined the disease and exposure, the stronger the observed relationship should be. But the fact that one agent contributes to multiple diseases is not evidence against its role in any one disease.
4. Temporality – Does the exposure precede the disease, or is reverse causality possible?
5. Biological gradient – Results are more convincing if the risk of disease increases with the level of exposure.
6. Plausibility – We are much readier to accept that a specific exposure is a risk factor for a disease if this is consistent with our general knowledge and beliefs; obviously this tendency has pitfalls.
7. Coherence – How well do all the observations fit with the hypothesised model to form a coherent picture?
8. Experiment – The demonstration that, under controlled conditions, changing the exposure causes a change in the outcome is of great value for inferring causality.
9. Analogy – Have similar associations between similar exposures and other diseases been shown? We are readier to accept arguments that resemble others we already accept.
4.4 Clinical Importance of the Results: Types of Health Outcomes and Measures of the Effect

4.4.1 Health Outcomes

We are interested in the changes (referred to as outcomes) amongst the research subjects which are associated with exposure to risk factors or with therapeutic or preventive interventions. There are two main types of outcome (Table 4.4) [17]: (a) biological or psychosocial parameters not directly related to disease (for example, cholesterol values or scores on a scale measuring social support) and (b) clinical outcomes directly related to disease.
Table 4.4 Outcomes in the course of a disease (the five Ds). Adapted from Muir Gray et al. [9]
  Death
  Disability
  Disease status
  Dissatisfaction with the process of care
  Discomfort from the effects of disease
Non-clinical outcomes can only be viewed as surrogates for the clinical outcomes of interest and cannot be used directly to change clinical practice unless there is a clear causal association between the surrogate and a clinical outcome. Clinicians are thus more interested in research papers that have used relevant clinical outcomes. Outcomes in the course of disease include death, disease status, discomfort from symptoms, disability and dissatisfaction with the process of care; these can easily be memorised as the five Ds of health outcomes. In establishing the clinical importance of a study, one should always check that the outcome is relevant.
4.4.2 Clinical Importance

A study may be methodologically valid, with an outcome of interest to clinicians, and still not be clinically relevant, because, for example, the effect of treatment is negligible. A new antihypertensive drug which lowers systolic blood pressure by 5% compared with routine treatment is probably not clinically significant, in the sense that it has no implications for patient care.

There are two broad categories of measures of effect: relative measures (e.g. the relative risk and OR) and absolute measures (e.g. the attributable risk) (Table 4.5). In the clinical context we are more interested in absolute measures, because the relative ones cannot discriminate large treatment effects from small ones. For example, in a clinical trial in which 90% of the placebo group developed the disease compared with 30% of the treatment group, the relative risk reduction would be (90 − 30)/90 ≈ 67% and the absolute risk reduction (ARR) 90 − 30 = 60%, a clinically important result. However, in a trial with a 9% control event rate vs. a 3% experimental event rate, the relative risk reduction is the same but the ARR is only 6%, a figure far less important from the clinical perspective.
Table 4.5 Measures of effects

Effect measures for binary data
  Absolute measures
    Absolute risk reduction (ARR): the absolute difference in risk between the experimental and control groups
    Number needed to treat (NNT): the number of patients who need to be treated with the experimental therapy in order to prevent one of them developing the undesirable outcome; it is the inverse of the ARR
  Relative measures
    Risk: the proportion of participants in a group who are observed to have an event
    Relative risk (RR): the ratio of the risk in the experimental group to the risk in the control group
    Odds: the number of events divided by the number of non-events in the sample
    Odds ratio (OR): the ratio of the odds of an event in the experimental group to the odds of an event in the control group

Effect measures for continuous data
  Mean difference: the difference between the means of two groups
  Weighted mean difference: where studies have measured the outcome on the same scale, the weight given to the mean difference in each study is usually equal to the inverse of the variance
  Standardised mean difference: where studies have measured an outcome using different scales (e.g. pain may be measured in a variety of ways), the mean difference may be divided by an estimate of the within-group standard deviation to produce a standardised value without units

Effect measure for survival data
  Hazard ratio: a summary of the difference between two survival curves, representing the overall reduction in the risk of death on treatment compared with control over the period of follow-up
In the following paragraphs we briefly present the basic measures of effect used in clinical research.

The attributable risk (risk difference, rate difference) is the absolute difference in risk between the experimental and control groups. A risk difference of zero indicates no difference between the two groups. For undesirable outcomes, a risk difference of less than zero indicates that the intervention was effective in reducing the risk of that outcome, which is useful for interpreting the results of intervention studies.

The number needed to treat (NNT) is an alternative way of expressing the attributable risk between two groups of subjects. It has been promoted as the most intuitively appealing way of presenting the results of RCTs, and its use should be encouraged in interpreting the results of trials [2]. The NNT is the number of patients who need to be treated with the experimental therapy in order to prevent one of them developing the undesirable outcome. It is calculated as the reciprocal of the absolute difference in risk (probability of recovery) between the groups. An NNT of 5 indicates that 5 patients need to be treated with treatment A rather than treatment B for one person to recover on treatment A who would not have recovered on treatment B.
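These definitions translate directly into code; a minimal sketch in Python, using the event rates from the hypothetical trials discussed above:

    def effect_measures(control_risk, experimental_risk):
        """ARR, RRR and NNT for a trial with an undesirable outcome."""
        arr = control_risk - experimental_risk   # absolute risk reduction
        rrr = arr / control_risk                 # relative risk reduction
        nnt = 1 / arr                            # number needed to treat
        return arr, rrr, nnt

    for cer, eer in [(0.90, 0.30), (0.09, 0.03)]:
        arr, rrr, nnt = effect_measures(cer, eer)
        print(f"Control {cer:.0%}, experimental {eer:.0%}: "
              f"RRR {rrr:.0%}, ARR {arr:.0%}, NNT {nnt:.1f}")

Both trials give the same relative risk reduction (about 67%), but the NNT rises from about 1.7 to about 17 as the absolute risk reduction shrinks.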
The following example can help in understanding these measures. An RCT of depression finds a 60% recovery rate with an antidepressant and a 50% recovery rate on placebo after 6 weeks of treatment. The absolute risk difference is 10% (i.e. 0.1), so the NNT is 1/0.1 = 10: if 10 patients were treated with the antidepressant, 1 would get better who would not have got better on placebo. Another way of thinking of this is that, with 10 patients in each group, 6 would get better on the antidepressant and 5 on the placebo.

Relative risk is a general and rather ambiguous term describing the family of estimates that rely upon the ratio of the measures of effect in the two groups. Relative measures are not the best way of summarising treatment trials, because there we are interested in the absolute change in risk; they are more useful in interpreting possible causal associations, since ratio measures estimate the strength of the association between exposure and disease. The incidence rate ratio is a further “relative risk” measure, used when incidence rates are compared; such rates are commonly used in longitudinal or cohort studies.

Epidemiologists often prefer to use odds rather than probability in assessing the risk of disease. There are three main reasons for this:
1. The mathematics of manipulating ORs is easier.
2. The results of logistic regression can be expressed as ORs. It is therefore possible to present results before and after multivariate adjustment in terms of ORs.
3. ORs are the only valid way of analysing case–control studies when the data are categorical. The OR from a case–control study corresponds to the OR in the population in which the case–control study was performed.

For rare outcomes, the OR, risk ratio and incidence rate ratio have approximately the same value. To illustrate the calculation of odds and ORs, the following table can be thought of as the results of a cross-sectional survey, a cohort study or a case–control study. The odds of an event = number of events/number of non-events.

                Cases   Controls
    Exposed       a        b
    Unexposed     c        d
The OR = odds in the treated or exposed group/odds in the unexposed group. The odds of being a case in the exposed group are a/b; similarly, the odds of being a case in the unexposed group are c/d. The OR is therefore (a/b)/(c/d), which rearranges algebraically to (ad)/(bc). The OR is thus a “relative odds” and gives an estimate of the “aetiological force” of an exposure. If the OR is greater than 1, the exposure may be a risk factor for the disease; if the OR is less than 1, the exposure (often a treatment) may protect against the disease; and if the OR is exactly 1, there may be no association between exposure and disease.
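A short function makes the algebra concrete and adds the usual approximate 95% confidence interval for the OR, computed on the log scale; the counts are invented for illustration:

    import math

    def odds_ratio(a, b, c, d):
        """OR and approximate 95% CI from a 2x2 table (a, b = exposed
        cases/controls; c, d = unexposed cases/controls)."""
        or_ = (a * d) / (b * c)
        se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)
        lo = math.exp(math.log(or_) - 1.96 * se_log_or)
        hi = math.exp(math.log(or_) + 1.96 * se_log_or)
        return or_, lo, hi

    # Hypothetical data: 40 exposed and 60 unexposed cases,
    # 20 exposed and 80 unexposed controls
    print(odds_ratio(a=40, b=20, c=60, d=80))   # OR ≈ 2.67 (1.4 to 5.0)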
4.5 Searching the Biomedical Databases Efficiently

With so many papers published annually in medicine, it is important to know how to search efficiently for research papers that have the two following characteristics:

• They are relevant to our clinical or research question.
• They are of high quality.
Sometimes we need a few high-quality papers to answer a clinical question quickly; at other times we need to find all the available studies on a chosen topic. In both situations it helps to improve our skill in searching the biomedical databases, and especially PubMed, which is provided free of charge by the US National Library of Medicine (http://www.ncbi.nlm.nih.gov/sites/entrez).
4.5.1 Structure of a Database

A database simply consists of fields (columns) and records (rows). The telephone directory is a familiar example: it includes fields such as the surname, the first name, the address and the telephone number, and each entry in the directory is one unique record that has a value for each of the fields. Each field may take only particular values; for example, in a research database the field “marital status” may take only the values single, married, divorced or widowed.

Knowledge of the field structure of a database is essential if we want to search efficiently. We would never open the telephone directory to select all those who are married in a given area, simply because this field does not exist in that database. Similarly, in the research database just mentioned, if we search for all the people who are separated, we will get zero records, simply because the field “marital status” does not take the value “separated”. Therefore, knowing the exact field structure of a database (field names, descriptions of fields and typical values) is essential for efficient searching.
4.5.2 Structure of PubMed

Each record in the PubMed database is a unique research paper published in one of the thousands of journals indexed in the database. Clearly, not all journals are indexed by PubMed, but the few hundred highest-quality journals are. Most users of PubMed simply look at the title, authors or abstract of an indexed paper. However, these are only a few of the available fields provided by the
database. There is a very simple way to obtain a copy of a record that provides more information on the structure of the database than the default display: select the display option “MEDLINE” from the upper-left drop-down menu (Fig. 4.8). This gives information on some of the field types and field values included in a particular record.

[Fig. 4.8 The structure of a MEDLINE-format record in PubMed]

The following example gives some of the field values
assigned to a particular paper indexed by MEDLINE. Looking at this record, we can see that there is a field called “MH”, which takes the value “Aorta, Thoracic/*surgery” (Fig. 4.9). This field holds the Medical Subject Headings that we will discuss later.

[Fig. 4.9 The record of a specific paper displayed in PubMed]

To get to know all the possible fields of PubMed, we need to open the help file provided by the database (Fig. 4.10).
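As an illustration of the MEDLINE display format discussed above, a heavily abbreviated record might look like the following; apart from the MH value quoted above, all values (including the PMID) are invented:

    PMID- 12345678
    DP  - 2008 Mar
    TI  - Early outcomes of thoracic aortic surgery in a regional centre
    AU  - Smith J
    LA  - eng
    PT  - Journal Article
    MH  - Aorta, Thoracic/*surgery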
[Fig. 4.10 The PubMed menu, with the link to the help file]
PubMed help includes a detailed description of each field and its associated values. Each field has a unique field tag, most often consisting of two capital letters (although there are exceptions), and it takes specific values. To search for records that contain a specific value of a field, we include the value of the field in inverted commas followed by the field tag in square brackets. For example, to search for papers on depression and retrieve only articles written in English, we would include the field Language, as in the example below (Fig. 4.11).

[Fig. 4.11 An example PubMed search using a field tag]

Apart from the obvious fields, such as the author field (AU), the Title/Abstract field (TIAB) or the publication date field (DP), there are some other important fields that we may need in order to increase our search efficiency. A list of the fields and their corresponding tags is given in PubMed help (Fig. 4.12); by clicking on a field you can get its description and the range of accepted values.

[Fig. 4.12 PubMed field descriptions and tags]

4.5.2.1 Some Important Fields

1. Medical subject headings (MeSH terms)
Field tag: [MH]

According to PubMed help, “Medical Subject Headings are the controlled vocabulary of biomedical terms that have been used to describe the subject of each journal
article in MEDLINE”. MeSH contains more than 23,000 terms and is updated annually to reflect changes in medicine and medical terminology (in other words, the field MH can take more than 23,000 values). Skilled subject analysts examine journal articles and assign to each the most specific MeSH terms applicable – typically ten to twelve – which ensures that articles are uniformly indexed by subject, whatever the author’s words. Given that papers indexed by MEDLINE have already been assessed by skilled librarians and subject analysts, using MeSH terms instead of simple text words (in titles and/or abstracts) is expected to increase both the sensitivity (the number of relevant papers retrieved divided by the total number of relevant papers available in the database – the denominator is usually unknown and can
only be roughly estimated) and the specificity (the number of relevant records retrieved divided by the total number of records retrieved – this is always known) of our search.

The problem in using MeSH terms is that we may not know the exact MeSH term for the topic of interest. PubMed, however, allows us to search and browse the MeSH vocabulary. To do this, we select the MeSH database from the available databases in the menu and search for a particular topic of interest. The MeSH database will suggest MeSH terms that may be related to our search (Fig. 4.13).

[Fig. 4.13 Searching the MeSH database in PubMed]

For example, if we want to search for papers on depression, we can search the MeSH database for depression (Fig. 4.14).

[Fig. 4.14 Searching the MeSH database for “depression”]

We learn that the subject analysts of MEDLINE assign the term “depressive disorder” to
papers that include information on depression, independently of whether the author used the term “depression” or some other term, e.g. sadness, melancholia or psychological distress. Searching PubMed for depression as a text word yields about 200,000 papers, while searching for the MeSH term “depressive disorder” (by writing “Depressive Disorder”[MH] in the PubMed search box) yields about 50,000 papers. Thus, we have efficiently narrowed this very broad category by almost 150,000 papers!

2. Publication type
Field tag: [PT]

This field describes the type of material the article represents and includes values such as “Review”, “Randomized Controlled Trial”, “Meta-Analysis”, “Journal Article” and “Letter”. For example, to search for all the randomised controlled trials included in the database, we can search for “Randomized Controlled Trial”[PT] (Fig. 4.15). This search returned 247,264 records at the time this text was written.

3. Citation status subset
Field tag: [SB]

The citation status indicates the processing stage of an article in the PubMed database and may take the values “publisher”, “in process” and “medline”. Normally there is a delay between the date a paper appears in the database and the date it is assessed by the subject analysts. MeSH terms are only available for records that have been indexed for MEDLINE; records that are “supplied by the publisher” or “in process” may not yet have been assigned MeSH terms, and quite often these will be the most recent papers. One has to take this into account when searching with MeSH terms, and it is therefore good practice to supplement a MeSH search with a general keyword search of the records that are in process or supplied by the publisher.
[Fig. 4.15 A PubMed search using the publication type field tag [PT]]
For the randomised controlled trials mentioned above, we could also search for “randomized controlled trial” AND (“publisher”[SB] OR “in process”[SB]). This search returned 59 records, but not all of them will be randomised controlled trials. When searching for more specific topics, one can search for the text word “randomized” or “random*” in the “publisher” or “in process” subsets to increase sensitivity.
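Putting the field tags together, searches such as the following combine a MeSH term with publication type, language and citation status filters; these are illustrative only, and the record counts returned will change as the database grows:

    "Depressive Disorder"[MH] AND "Randomized Controlled Trial"[PT] AND english[LA]
    "Aorta, Thoracic/surgery"[MH] AND "Meta-Analysis"[PT]
    random*[Title/Abstract] AND ("publisher"[SB] OR "in process"[SB])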
4.5.2.2 Steps for an Efficient Search

Step 1: Start by stating your objectives clearly
You should be able to identify the main objective of your question. Think of what your ideal article would be: what question do you want answered, and what would be the design of the study if you were to carry out your own study to answer it?

Step 2: Choose the relevant MeSH terms or keywords
This will depend on the specific question you are interested in, and it assumes a basic knowledge of study design and research methodology. For example, if you want to search for the best available treatment for depression after a major coronary heart event, it will be important to narrow the search down to randomised controlled trials. If you need papers on the prognosis of a condition, it will be better to focus on prospective studies (by searching for the MeSH term “Prospective Studies”[MH]). In the case of a new diagnostic test, searching for the relevant MeSH terms (“Sensitivity and Specificity”[MH]) might help.

Step 3: Choose the language, year span and any other limits you wish to apply, such as publication type or age range
Be careful when applying limits, however, and only do so if your first search had a very low specificity.
Step 4: Identify a relevant paper from the search list and find out which MeSH terms have been assigned to it
This is a very efficient way to increase the sensitivity and specificity of your search. It is sometimes difficult to think of all the relevant keywords or MeSH terms. However, we may come across a particular paper that is very relevant to our question (perhaps such a paper was the reason for searching the literature on this topic in the first place). We can find this paper in PubMed, look at the MeSH terms that have been assigned to it and include some of them in our search string.

Step 5: Search for papers that have cited a very important (preferably older) paper on the chosen topic
This is a very important step, especially if your first search did not yield enough relevant papers. In your
b
Fig. 4.16 (a, b) Using Google Scholar to find citations to a paper
51
topic of interest there may be a very important paper, the one that perhaps started the discussion some years ago. It is reasonable to make the hypothesis that many of the subsequent papers would have cited this first or would have treated this as an importantpaper. Looking at who has cited this paper will give you very relevant papers that you can include to your list. Until very recently, in order to find citations to a paper you should use a service like the one provided by ISI (institute of scientific information), which is not free. Google, however, revolutionized this aspect by bringing us scholar google (scholar.google.com). If you search for a particular paper in google scholar, you can see that below the title on the left there is a hyperlink with the
text (Fig. 4.16a & 4.16b). By clicking on that link you can get a list of all papers citing this particular paper. Certainly, this is a nice present of google to researchers!
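Step 4 can also be automated: the MeSH headings assigned to a known relevant paper can be retrieved in MEDLINE format through the same Entrez interface. A minimal sketch follows; the PMID shown is a placeholder, to be replaced with that of your own key paper.

from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder; NCBI requires a contact address

def mesh_terms(pmid):
    """Return the MeSH headings (the 'MH' lines) of a record in MEDLINE format."""
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="medline", retmode="text")
    lines = handle.read().splitlines()
    handle.close()
    return [line[6:] for line in lines if line.startswith("MH  - ")]

print(mesh_terms("12345678"))  # placeholder PMID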
Step 6: Search the list of references of the relevant papers
This is the last necessary step to make sure that we have retrieved all relevant papers for our topic. Very often we will find papers that we were unable to get from our search, no matter how sensitive it was.

Step 7: Use (with caution) the related papers link in PubMed
You may find that clicking on the related papers link in PubMed returns some useful records, but this technique is not as strong as Steps 5 and 6 above. Therefore, use it with caution.

4.5.2.3 Some Last Tips

1. The use of clinical queries
PubMed provides some very useful search filters to help clinicians get high-quality results on a given topic. There are filters customized for high sensitivity and specificity on diagnosis, aetiology, prognosis and
therapy (Fig. 4.17). You have the option to choose between a broader or a narrower search (higher or lower sensitivity).

Fig. 4.17 PubMed search using Clinical Queries

The broad filter used for therapy, for example, is the following:
((clinical[Title/Abstract] AND trial[Title/Abstract]) OR clinical trials[MeSH Terms] OR clinical trial[Publication Type] OR random*[Title/Abstract] OR random allocation[MeSH Terms] OR therapeutic use[MeSH Subheading])
and the narrow filter is:
(randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract]))
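These published filters can simply be pasted into the PubMed search box alongside a topic of interest. The short sketch below assembles such a query from the narrow therapy filter quoted above; the topic used is only an example.

# Combine a topic with the narrow "therapy" Clinical Queries filter quoted above.
narrow_therapy = (
    "(randomized controlled trial[Publication Type] OR "
    "(randomized[Title/Abstract] AND controlled[Title/Abstract] "
    "AND trial[Title/Abstract]))"
)
topic = '"Depressive Disorder"[MH]'  # example topic of interest
query = f"({topic}) AND {narrow_therapy}"
print(query)  # paste into the PubMed search box, or pass to Entrez.esearch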
As you can see, both filters are based heavily on controlled trials, especially randomized controlled trials.

2. Last (but not least): Read the manual and take the online tutorial
This is the easiest thing you can do to get familiar with PubMed, but in our experience very few physicians have ever taken this tutorial or read the PubMed help manual. Spending some time to read this material or
take the online tutorials will save you many hours of aimless searching through the various databases. Knowing how to make efficient searches of the literature should not be considered a minor skill. It is a serious task, and all doctors should learn how to do it from very early in their career.
5 Randomised Controlled Trials: What the Surgeon Needs to Know
Marcus Flather, Belinda Lees, and John Pepper
Contents
5.1 Introduction
5.2 Current Concepts in Clinical Trials
5.3 Basic Concepts of Trial Design
5.4 What Do I Need to Know Before Starting a Clinical Trial?
5.5 How Do I Evaluate Surgical Procedures?
5.6 The Process of Surgical Evaluation
5.7 What Do I Do When the Learning Curve Has Been Completed and the New Procedure Appears Feasible and Safe?
5.8 Randomised Trials of Surgical Procedures
5.9 Selection of Outcomes for Surgical Trials
5.10 Practical Issues for Designing Randomised Trials in Surgery
5.10.1 Eligibility Criteria
5.10.2 Process of Randomisation
5.10.3 Blinding of Treatment Allocations
5.10.4 Sample Size and Enrolment Issues
5.10.5 Costs of Doing Surgical Trials
5.10.6 Balance of Benefits and Risk
5.11 Conclusions
References
Abstract In this chapter, we discuss some of the methodological issues in the design and conduct of clinical trials of surgical procedures. The opinion of experts in trial design, management and analysis should be sought at an early stage of a surgical clinical trial programme, recognising that other study designs may also be important including the prospective protocol-driven registry and the case control study.
5.1 Introduction
M. Flather () Clinical Trials and Evaluation Unit, Royal Brompton Hospital and Imperial College, London SW3 6NP, UK e-mail: [email protected]
The basic principle of assessing the usefulness of treatments can be applied across the whole spectrum of diseases, whether the treatments are complex surgical procedures or simple-to-administer pharmacological therapies. Answering a clinically important question reliably is the basis of any therapeutic evaluation [1]. In a notorious article in the Lancet in 1996 entitled "Surgical research or comic opera: questions, but few answers" [2], the editor, Richard Horton, raised concerns about the methods used to evaluate surgical treatments and the research base of most surgical procedures. He argued that most of the information about surgical procedures was observational or anecdotal, and that spending on surgery supported by this evidence was a major factor in NHS health care costs. Surgeons were naturally upset by this accusation, especially as there had already been a number of important randomised trials involving surgical treatments [3, 4], but it seemed true that most trials were small and inconclusive. However, the message hit home: there are now growing departments of academic surgery where clinical trials and other systematic evaluations take place alongside more traditional basic, experimental and observational science, and surgeons are seeking new skills in the design and interpretation of clinical research and linking with specialist trials units [5].
How do we reliably evaluate surgically based treatments, which by their very nature are complex, invasive strategies involving many health professionals (surgeon, anaesthetist, specialist nurses, intensivists, etc.), specialist facilities (operating rooms, intensive care, specialist wards) and equipment (special instruments, cardiopulmonary bypass, prostheses)? If we define a "treatment" as "any activity, procedure or drug provided to a patient with the intention of improving health", we can, for the purposes of designing clinical trials and other evaluations, suppose that all treatments are like a "black box" and, irrespective of its contents, evaluate one treatment in much the same way as another. While this concept is intuitively attractive, the complexity of the treatment has a huge impact on the practical elements of trial design, on the delivery of the treatment and on its costs; it is therefore too simplistic to imagine that surgical treatments can be evaluated in randomised controlled trials in exactly the same way as drug treatments [6].
5.2 Current Concepts in Clinical Trials

In essence, clinical trials should not be regarded any differently from an experiment in the laboratory. A hypothesis is posed, an experiment is designed with "treatment" and "control" groups, observations are made and analysed, and inferences are drawn about whether the hypothesis has been supported or not. It is probably the multi-dimensional approach and organisational complexity of clinical trials that set them apart from laboratory experiments, together with the fact that the subjects taking part are human. Clinical trials are also often complicated and costly to set up and run. A broad range of treatments can be evaluated. Trials can be led and organised by independent academic groups to improve health care and enhance reputations, or by industry to improve longer-term profits. Industry and academia have very different philosophies, cultures and aims, and yet many of the most successful health care improvements have been the result of fruitful partnerships between industry on the one hand, with its expertise in new product development and commercial drive, and academic investigators on the other, who have experience of dealing with patients and their diseases and who understand the application of potentially complex trial protocols [7]. This partnership is often required in surgical research, as most surgical treatments involve
innovative instruments and devices designed to make operations easier, safer and more effective.
5.3 Basic Concepts of Trial Design

One key issue to address is "what is a clinical trial", as this term has many different apparent meanings. In its simplest form, a clinical trial is a systematic evaluation of a treatment in human subjects comparing the treatment of interest with a control. Usually, a clinical trial is synonymous with a "randomised controlled trial", in which the experimental treatment and control treatment are allocated to participating subjects in a random manner. Study design is critically dependent on the existing knowledge of the treatment under evaluation. For example, if we were to evaluate aspirin for a new indication (e.g. bowel cancer), the prior knowledge and experience of aspirin in clinical trials and clinical practice (literally millions of patient-years of experience) would inform the rate of expected side effects, and it would be the efficacy of treatment in this indication that would be the major unknown. However, if a new surgical procedure, e.g. left ventricular transapical implantation of an aortic valve, were being evaluated, then the safety and feasibility of the procedure would be the main goals. Thus, in the evaluation of treatments with relatively little human experience, safety and feasibility are the key early goals. In spite of this, it is often not acceptable to perform safety and feasibility trials without also trying to collect some additional information on efficacy. For treatments in the early stages of development, the outcome measures will usually be determined by the postulated mechanism of benefit of the treatment. For example, a clinical trial designed to test a new knee prosthesis may initially study parameters such as ease of insertion, length of operation and post-operative function, whereas with more experience, the main aims would be longer-term durability, function and costs [8]. Outcome measures that ultimately determine whether a treatment could be used routinely in clinical practice are as follows:
1. Clear and reliable evidence of efficacy on clinically important parameters (reduction of clinically important outcome events or improvement of symptoms)
2. Acceptable safety
3. Acceptable health care costs
Fig. 5.1 Proposed sequence of evaluation of surgical procedures. In this diagram, we have emphasised the key roles of identifying the clinical question to be addressed and the prospective protocol-driven registry. Randomised trials are desirable, but may not be practical in all situations
[Fig. 5.1 flow: laboratory studies, animal models and small case series feed a clinical question and hypothesis; this leads to a literature review, rationale, protocol and funding applications; then a prospective observational protocol-driven registry; then smaller randomised trials and case control studies; then larger randomised trials and larger observational registries; and finally an overall assessment of safety, efficacy, resource use and cost-effectiveness.]
In order to achieve these goals, we need reasonable evidence of safety, feasibility and efficacy on intermediate or mechanistic measures prior to embarking on large and resource-hungry trials that may take several years to complete. In addition, it is almost impossible to obtain funding for larger randomised trials without this basic information. Examples of mechanistic variables include reduction of blood pressure to avoid future stroke, or measurement of the international normalised ratio (INR) to monitor warfarin efficacy in patients with atrial fibrillation. Figure 5.1 shows a sequence of evaluation of potential new surgical treatments.
5.4 What Do I Need to Know Before Starting a Clinical Trial?

There are two main aspects that need to be covered before a clinical trial programme can be set up. First, we need to know the "basics" – the building blocks of knowledge that underpin the scientific, philosophical, organisational and analytical aspects of clinical trials, such as the hypothesis, randomisation, sample size and protocol (Fig. 5.2). Second, we need to establish "resources": personnel with the skills to lead a clinical trial (motivated and knowledgeable investigators), experts in the design, management and analysis of trials (usually found in specialised statistical clinical trials units), personnel to identify, enrol and follow up participating patients (usually found in hospitals or community care facilities) and funds to cover the costs of the clinical trial [9, 10].

[Fig. 5.2 lists four ingredients for a successful clinical trial: knowledge of the disease (pathophysiology, epidemiology); understanding how the treatment works; measuring appropriate outcomes in the right way; and a large enough study to detect plausible treatment differences.]

Fig. 5.2 Ingredients for a successful clinical trial. When identifying the question to be addressed in a clinical trial, certain key criteria need to be met to reduce the chance of failure. Most of these parameters are intuitive, but it is surprising how many trials are implemented when one or more of these criteria are not met

We can deal with cost and organisational issues first, as these are conceptually simpler than the methodological issues involved. Clinical trials are by definition complicated ventures. It is almost impossible to fund even small clinical trials (e.g. 30–40 patients) using available health care resources. Funding to properly cover the
costs of enrolling and following up patients, study management (data management, study monitoring and statistical analysis) and additional tests for research purposes must be obtained in advance of a study starting. There are multiple sources of funding for clinical trials, ranging from commercial companies and independent grant funders to government organisations and private benefactors. Inevitably, obtaining funds from any of these organisations is extremely competitive and time-consuming. The time taken to prepare a good application, including the scientific rationale, plan of investigation and costs, is usually 6–12 months, and a further 1–2 years are needed for funds to be awarded if the application is successful; this time scale seems to hold true irrespective of whether the funding application is made to a commercial or an independent source. Any clinical research involving human subjects is governed by complicated ethical codes of conduct and laws ("regulations"). Most countries have government organisations to regulate the use of medicines and medical devices for clinical care and for research, with the aim of ensuring that effective and safe medicines are developed and marketed. The best known of these is the Food and Drug Administration (FDA) of the United States. In the UK, the Medicines and Healthcare products Regulatory Agency (MHRA) performs a similar role, and the European Medicines Evaluation Agency (EMEA) has a growing role in regulating medicines across the European Community. Approval is
required from the appropriate national agencies for most clinical trials involving drugs or devices. Any clinical research involving human subjects requires approval from an independent, properly constituted research ethics committee [11]. The exact nature of these approvals varies from country to country, but the principles are the same. The final legal hurdle before starting a trial is approval from the institution where the research is being carried out. This usually follows the signing of a contract with the sponsor of the trial. The sponsor is the legal entity which takes overall responsibility for the conduct and quality of the trial. Sponsors are often commercial companies, especially for drug or new device development trials, but increasingly universities and academic hospitals are sponsoring clinical trials. In many cases, the sponsor may delegate the running of a trial to another group, for example, a contract research organisation in the case of a pharmaceutical company, or an academic clinical trials group for independent trials. In any case, the sponsor has a responsibility to ensure that the trial runs properly and complies with ethical and legal requirements. This is usually done by interim monitoring visits to participating sites to verify that informed consent has been obtained, that the protocol is being followed and that other aspects are of high quality, including drug storage, procedures and investigations and the recording of data. Figure 5.3 summarises a number of key concepts in setting up a clinical trial.
Fig. 5.3 Key stages of starting a clinical trial. This diagram shows the main steps that need to be completed when starting a clinical trial. Collaboration with an experienced trials unit and statistical team is essential at the beginning of the process. Once funding has been obtained, there are many stages to complete, and it may take up to a year before the first patient can be enrolled. Trials comparing two accepted surgical procedures generally do not require regulatory approval, but these approvals are needed for new devices. For multi-centre trials, centres may start enrolment at very different times, mainly due to delays in obtaining local agreements

[Fig. 5.3 stages: collaboration with trials unit and statistical team; funding available; full protocol written; identification of the sponsor; agreement of the data collection method; setting up of committees (Steering, Safety and Adjudication); regulatory, ethical and sponsor approvals; agreements with other centres; site monitoring plan; training of sites; supply of study materials; local site approvals; start of enrolment.]
5.5 How Do I Evaluate Surgical Procedures?

We have established that a surgical procedure is a complicated intervention, and this poses a number of important challenges to the design and conduct of clinical research in surgery. Surgical procedures can only be carried out in settings that are able to support them. For example, even basic requirements such as aseptic technique, appropriate anaesthesia and post-operative care are usually found only in a hospital setting. Trained professionals are needed to carry out and support these procedures. Complex surgery such as neurosurgery, organ transplantation and cardiac surgery can only be carried out in highly specialised centres. In addition, these procedures are expensive: costs include those of health care professionals, specialised equipment, hospital intensive care and medicines. Against this background, it is small wonder that the reliable evaluation of surgical procedures is methodologically difficult, and many of the fundamental issues, including study design, the ethics of randomisation, blinding of treatments and the definition of outcome measures, are still in development [9, 10]. There is a lot of debate about whether surgical procedures should be evaluated through randomised trials, because of potential logistic, ethical, methodological and funding issues. There is no simple answer to this question, but many surgical procedures, especially those that are well established, can certainly be compared with alternative treatment strategies [11, 12]. Problems arise when new or very complex surgical strategies are subjected to the rigorous environment of the randomised trial, as the variability of the procedure itself (e.g. operator differences and variability in post-operative care) may be larger than the potential benefits that can be detected. These issues are discussed further below.
5.6 The Process of Surgical Evaluation

As seekers of information and evidence, we must start with the premise that all health care treatments should be evaluated. The simplest evaluation, which should be routine for all patients and procedures, is a comprehensive but practical description of what has been
done, to whom and what the consequences were. This is the most basic of all evaluations: the collection of high quality observational data. In the UK, this is often called "audit", and in North America, "outcomes research" [13, 14]. Sadly, most health care systems fall far short of this basic criterion. In surgery, it is vital that key information from all operations, including the subsequent outcomes of patients, is properly documented on an electronic database, that these data are checked for quality and tabulated, and that analyses and reports are produced by experts with relevant expertise. The definition of surgical mortality is vital status at 30 days, but this is a crude metric: patients should be followed for at least a year, and preferably longer for major operations, using national registers of death and health outcomes. Some operations are already recorded on national databases of cardiac surgery in the UK and North America [15–17]. A new surgical procedure can loosely be defined as one which involves an operative technique that has been developed relatively recently (e.g. within the previous 2 years) or one that involves a new piece of equipment (prosthesis or surgical instrument). These procedures should all start with a systematic evaluation in a prospective, observational, protocol-driven registry (Fig. 5.1). This process involves writing a comprehensive protocol summarising the rationale for the new procedure, a comprehensive literature search, aims and hypotheses, eligible patients, a description of the procedure and the outcomes of interest. The outcomes will include simple clinical ones (death or major complications), but should also include more detailed evaluations of quality outcomes, e.g. function after hip replacement using a new prosthesis, or the haemodynamic characteristics of replacement heart valves using echocardiography. These protocols should undergo peer review and approval for ethical, practical, cost and safety considerations. The results should be reviewed periodically so that general conclusions about safety and efficacy can be drawn. In its simplest form, the protocol-driven registry has no control group, and therefore the next reasonable step is to add one; the simplest method is to use a case control design [18]. In this study design, information from concurrent "control" patients receiving a more conventional surgical procedure for the same condition is also collected and evaluated. Simple comparisons can be made between patients who have similar characteristics based on age,
gender, severity of disease, etc. The case control design is of course prone to many biases, not least that the cases and controls may differ in important ways, but it is simple to carry out and does not require the complex preparation, approvals or costs associated with randomised trials. Case control studies for surgical procedures serve us best if they are carried out within the same institution, but if this is not possible, seeking cases and controls from different institutions is also helpful [19]. Probably the least useful approach is the use of historical controls (patients who had procedures in the previous year or two), as this introduces many more biases; in particular, the cases and historical controls are likely to differ because of changes in treatment over time, which can lead to very misleading conclusions. Case control studies can provide relatively reliable information on the length of procedures, resource use and outcome measures, but can rarely provide conclusive evidence that the new procedure is better than an existing one. It is fair to say that the "traditional" evaluation of surgical procedures has involved case series (a simpler form of the protocol-driven registry) and loose comparisons with previous case series, which is a type of case control design, but our recommendations take these traditional study designs to more rigorous and modern standards. Where can we go beyond the case control design? The key question that needs to be answered after the case control study is: "is the procedure ready for routine use or are there fundamental aspects that need refining?" Associated with this question is the whole issue of the "learning curve". The learning curve has not been properly defined, but it is a concept familiar to all surgeons and those evaluating surgical procedures [20, 21]. In its basic form, it refers to the time taken to become familiar with the new procedure, from the operative point of view as well as that of anaesthetic and post-operative care. We assume that the surgeons, operating staff and anaesthetists are all experienced, so the learning curve refers to the period of assimilation of a new procedure by highly trained health professionals. We do not really know when the learning curve has finished because, of course, we are always "learning": even established procedures are continually being refined, and the learning process will be different for each new procedure. However, experience and judgement can tell us when we have mastered the basic issues of a new procedure, and ideally these parameters should be prespecified when we start the programme.
5.7 What Do I Do When the Learning Curve Has Been Completed and the New Procedure Appears Feasible and Safe?

When a new procedure appears to be feasible and safe and to offer genuine advantages over more conventional approaches, it should, in an ideal world, be subjected to a robust comparison with the existing procedure or treatment. A randomised comparison between two surgical procedures is generally quite feasible if sufficient numbers can be entered into a study to make the results meaningful. Determining the effects of one procedure vs. another on mortality or major clinical outcomes (serious infections, myocardial infarction, stroke, cancer recurrence, etc.) may require large numbers of patients, which is often impractical under present funding and organisational systems. A simple example of a barrier to larger surgical randomised trials is the difficulty of obtaining independent (non-commercial) funds to carry out studies in different countries. The National Institutes of Health has an established funding mechanism in the United States to support larger studies of surgical treatments. Procedures for supporting multi-national clinical trials in the European Union are evolving, but the amount of financial support for these studies is still relatively small. Companies developing new devices for use in surgery are not able to invest the large sums of money that are often spent on pharmaceutical drug development, because the markets for surgical products are much smaller and the devices are expensive to provide for larger randomised trials. In spite of these funding issues, there have been a number of successful large randomised trials in surgery, including trials of carotid endarterectomy and coronary revascularisation [22–25].
5.8 Randomised Trials of Surgical Procedures

Randomised trials of any complex treatment, and especially of surgical interventions, pose a number of additional methodological, design and ethical issues compared with trials of simpler pharmacological treatments [26]. Prior to planning a randomised comparison of a promising new surgical procedure, we need to answer four major, related questions:
1. Do the results of a carefully conducted observational protocol-driven registry (case series) satisfy the basic criteria of feasibility and safety? Pragmatically, we might say that between 50 and 100 such cases need to be carried out to have any hope of providing reliable observational data.
2. Is there sufficient experience of carrying out the new procedure to allow it to be tested against established surgical or non-surgical alternatives (have we gone beyond the learning curve?)?
3. Is there sufficient enthusiasm and support from the surgical community to introduce the new technique and, therefore, subject it to reliable evaluation in a randomised trial?
4. What will be the comparator for the new technique? Will it be another surgical or interventional treatment, or "medical" therapy?
Identifying the comparator group is often more difficult than it seems. When the new procedure is a variation of a previous operation, or involves a new prosthesis or device, it may be relatively simple to select an appropriate comparator. For example, insertion of a new aortic valve may simply be compared against implantation of the conventional valve, or resection of a bowel tumour using the new operative method against the old one. In these cases, the only real variation between the new and the established procedure is the difference in technique or device – the pre-operative work-up, anaesthesia, basic operative methods, use of circulatory support, recovery methods and post-operative care are essentially the same. Thus, any variations in outcomes may reasonably be attributed to the new surgical methods. Problems of comparison arise when the two surgical methods vary considerably (more invasive vs. less invasive), when a surgical procedure is compared to a percutaneous procedure, or, even more problematically, when a surgical procedure is compared to medical therapy. Figure 5.4 summarises some of the concepts of study design and comparator groups for surgical studies. In these situations, there are many variables involved, but the way we generally manage these comparisons is to regard the treatments given to the two groups as "black boxes", i.e. the multiple variables between the two treatments are ignored and we simply compare the outcomes between the two groups, as we would for a comparison of a pharmacological treatment vs. placebo.
[Fig. 5.4 axes: study design (X axis), from less to more complicated – case series, protocol-driven registry, case control study, small randomised trial, large randomised trial; comparison group (Y axis) – no control, case-matched control, surgery vs. surgery, surgery vs. other intervention, surgery vs. medical therapy.]

Fig. 5.4 Schematic interaction of study design and comparison group on the evaluation of surgical procedures. This figure shows that large randomised trials are the most complicated study design to implement in surgery (X axis), and that comparing surgery vs. no surgery is difficult to implement in practice. A hypothetical relationship is proposed between the complexity of study design and the feasibility of the comparison group. The comparison of one surgical procedure to another similar one (e.g. comparison of two surgical methods to remove a bowel tumour) is the most feasible to implement in practice in a randomised controlled trial
Clearly, comparing a surgical procedure against medical therapy is a difficult task, not least because in clinical practice we would rarely give a patient a true choice between a surgical operation and no operation. A full discussion of these issues is beyond the scope of this chapter.
5.9 Selection of Outcomes for Surgical Trials

Evaluation of new surgical procedures should follow the simple rules for all new treatments:
1. Understanding how the new procedure might provide additional benefits, from experimental models (in vitro, laboratory and animal models)
2. Safety and feasibility in patients
3. Measures of mechanistic improvements specific to that procedure (e.g. better haemodynamics for new heart valves, or better mobility for knee replacement) and general improvements (fewer infections, less blood loss, shorter operations, quicker recovery times, etc.)
4. Improved clinical outcomes (better survival, reduced complications, better quality of life, etc.) in properly controlled studies
Appropriate selection of outcomes is very important for all research studies. When attempting to demonstrate that a new surgical procedure is potentially clinically worthwhile, demonstration of "mechanistic efficacy" is critical in addition to safety and feasibility. Thus, if we are developing a new knee joint, we must, as a very basic goal, demonstrate that it can be implanted safely (similar, or fewer, operative and post-operative complications compared with the standard existing knee prostheses) and feasibly (the operation does not require extra staff or unreasonable extra operative time and is not much more costly). In addition, the functional status of patients with the new joint must be compared rigorously with that of patients with the standard knee joint. Outcome measures for this might include recovery time, presence of pain, extent of knee movement, walking ability, durability over a reasonable time period (e.g. 1 year) and costs. The most robust way to do this is in a randomised trial, but a carefully designed case control study could provide reasonable evidence while randomised trials are being prepared. For conditions
which are non-life-threatening, like knee and hip degeneration, properly designed randomised trials of mechanistic outcomes and costs will generally be sufficient to provide a level of evidence to inform the decision whether or not to introduce the new procedure. Issues of subgroup analysis and the external validity of surgical trials are also beyond the scope of this chapter, but are discussed in detail in several articles [26–29]. Health economic issues are increasingly influencing decisions on whether to introduce new surgical procedures. Common sense tells us that we should weigh up the costs and effectiveness of new surgical procedures just as we would when buying a new car (although we might also weigh up the "status" effect of a more expensive car as well as its "effectiveness" at transporting us from A to B). How we go about doing this is complicated by the fact that metrics for measuring cost-effectiveness are still evolving [30, 31]. The quality-adjusted life year (QALY) is proving to be the most popular and, in theory, can be applied to all health care outcomes irrespective of the disease or intervention [32]. From the practising surgeon's point of view, the most important aspect is to collect information on the use of key "cost-drivers", i.e. those aspects of health care that make up most of the costs of a surgical procedure. These are some of the common cost drivers for surgical procedures:
1. Staff time
2. Use of the operating room
3. Disposable equipment
4. Prostheses, implants or devices
5. Intensive or high-dependency care
6. Length of hospital stay
7. Expensive associated treatments (antibiotics, immunosuppressants, blood products)
Performing a detailed analysis of all potential costs is in itself costly and time-consuming (the so-called "bottom-up" cost analysis), so most of the time the most expensive cost drivers are identified and used to calculate costs. For surgical procedures that are designed to treat common life-threatening illnesses like cancer or cardiovascular diseases, it is appropriate to select important adverse clinical outcomes for randomised trials that compare new procedures with existing ones. Common clinical outcomes include survival, freedom from disease recurrence (e.g. cancer or cardiovascular events) and major complications (bleeding, infection,
bowel obstruction). The selection of outcome is specific to the disease being managed and the procedure under evaluation. Clearly, it would be inappropriate to select survival as the main outcome for surgical procedures that are designed to improve quality of life, such as knee and hip surgery.
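As a toy illustration of the QALY logic described above, the cost-effectiveness of a new procedure relative to the standard one is often summarised as an incremental cost-effectiveness ratio (ICER); all of the figures in the sketch below are hypothetical.

# Incremental cost-effectiveness ratio (ICER) of a new prosthesis vs. the
# standard one. All figures are hypothetical, for illustration only.
cost_new, cost_std = 12_000.0, 9_000.0   # mean per-patient costs (GBP)
qaly_new, qaly_std = 7.2, 6.8            # mean QALYs gained per patient

icer = (cost_new - cost_std) / (qaly_new - qaly_std)
print(f"ICER = {icer:,.0f} GBP per QALY gained")  # 7,500 GBP per QALY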
5.10 Practical Issues for Designing Randomised Trials in Surgery

We have summarised the main issues in the subsections below.

5.10.1 Eligibility Criteria
Patients undergoing surgery for "routine" clinical indications are already subjected to selection criteria (can they withstand an anaesthetic, do they have particular high-risk features, etc.), and when they are enrolled in a clinical trial, further selection criteria are used. Thus, the generalisability of a surgical study (the ability to apply the results outside of the trial to more general clinical populations) may be quite limited [27, 29]. It is therefore important to keep the eligibility criteria as broad as possible, which is also important to maximise enrolment and achieve the planned sample size.

5.10.2 Process of Randomisation
Randomisation should be carried out by a unit with expertise in this area. Methods using envelopes are considered obsolete because of the high rates of tampering and bias. Randomisation methods using the telephone or internet are considered standard [33]. Investigators can either call the randomisation centre and speak to an individual who will provide the randomisation allocation, or receive an allocation using an automated telephone-based randomisation system which requires keying in numeric information before the allocation is provided. Web-based systems are gaining popularity, and investigators can obtain the required study treatment allocation by entering their study site details and a basic eligibility checklist at any time of day or night. Similarly, data collection using web-based electronic forms is also becoming more common. One of the key issues in the randomisation process is that in some cases eligibility can only be determined during the operative procedure. In this case we recommend that patients are screened prior to surgery as usual, informed consent is obtained and any preoperative tests are carried out. The patient is then registered as being "trial eligible", but not randomised [34]. During the operation, if eligibility is confirmed, the patient can then be randomised by telephone or internet. This process ensures that study drop-outs (patients randomised but who do not receive their allocated procedure) are kept to a minimum.
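As a minimal sketch of what sits behind such a telephone or web service, the snippet below generates allocations in randomly permuted balanced blocks. It is an illustration only; as stressed above, real trials should obtain allocations from a dedicated randomisation unit.

import random

def blocked_allocations(n_patients, block_size=4, arms=("A", "B")):
    """Generate allocations in randomly permuted blocks balanced across arms."""
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * (block_size // len(arms))
        random.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]  # a final partial block may be unbalanced

print(blocked_allocations(10))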
5.10.3 Blinding of Treatment Allocations
It is generally accepted that evaluation of surgical procedures will not be done in a blinded manner, i.e. the investigator and the patient know which procedure has been carried out [12]. This can obviously lead to bias in assessing outcome measures unless these are very robust, like mortality. For mechanistic outcomes such as walking ability, heart function, lung function, etc., assessments in an "open label" study (where allocations are known) need to be made by observers blinded to the original allocation, using a PROBE (prospective, randomised, open-label, blinded-endpoint) design [35].

5.10.4 Sample Size and Enrolment Issues
Surgical procedures usually require substantial resources and most are carried out in the hospital setting. More complex surgical procedures, which require evaluation in clinical trials, may be uncommon, so the notion that we can perform large studies of complex surgical procedures assessing their impact on robust clinical outcomes is often not feasible, although large randomised trials in surgery have been carried out [36, 37]. Thus, we need innovative study designs and appropriately robust outcomes to evaluate complex surgical interventions. Unfortunately, this is precisely an area where we do not yet have methodological solutions to the problem. The problem is partly alleviated if the complex procedures are relatively common, such as coronary artery bypass grafting, carotid endarterectomy, hip replacement or bowel resection for cancer. Enrolment in surgical trials is also a challenge, and for most studies it is important to obtain agreement from as many centres as possible to collaborate in a multi-centre study to help achieve the required sample size.
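To see why trials driven by major clinical outcomes need large numbers, consider the standard normal-approximation formula for comparing two proportions. The sketch below uses hypothetical event rates (30-day mortality of 8% vs. 5%, 5% two-sided alpha, 90% power).

from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Sample size per arm for comparing two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

print(round(n_per_arm(0.08, 0.05)))  # about 1,400 patients per arm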
5.10.5 Costs of Doing Surgical Trials
Administrative, ethical and cost issues are a great barrier to the implementation of surgical trials. Health care providers may demand reimbursement for some or all of the costs of surgical procedures and devices when they are evaluated in clinical trials. We need processes to make it much easier to carry out high quality randomised trials in surgery. One proposal is to have an ongoing register of planned or ongoing randomised studies of surgical procedures on a national or international basis, with all centres with the necessary expertise notified and invited to participate. When a patient is scheduled for an operation, they are considered for a trial if one exists for that procedure. In this scenario, the additional costs of running a trial at a centre over and above routine care are not really required, and simple and limited data collection can be carried out using a web-based interface.

5.10.6 Balance of Benefits and Risk
Elective surgical procedures almost always carry a higher early risk than not carrying out any procedure. The benefits of surgical procedures accrue over time, hence the need to follow patients properly. Some complex, high-risk surgical procedures are carried out for palliation or symptom control, and because there are very few other alternatives for patients. In these situations, trying to prove that the procedure prolongs life or improves outcomes can be difficult because of the high early morbidity and mortality associated with the procedure itself. Some examples include surgery for major trauma, repair of ruptured aortic aneurysm and surgery for symptomatic cancers ("bulk removal") where the chance of cure is small. Trying to prove that one strategy is better than another can be very difficult, and a lot of thought and time needs to go into the design and co-ordination of appropriate clinical trials and other evaluations, and into the appropriate use of surrogate outcomes to determine whether larger studies are warranted [38].
5.11 Conclusions

Following traditional rules for the design and conduct of clinical trials in surgery is intuitively attractive, but there are many hurdles related to the complexity of the interventions being assessed, the associated costs of running trials, and major design and ethical issues, including the ability to enrol enough patients into surgical trials. These problems should not be an absolute barrier to surgical trials, but they should be recognised by those involved in designing and running these trials, as well as by those who review grant applications and manuscripts submitted for publication. The opinion of experts in trial design, management and analysis should be sought at an early stage of a surgical clinical trial programme, recognising that other study designs may also be important, including the prospective protocol-driven registry and the case control study. However, the basic elements of every surgical procedure should be recorded (patient characteristics, procedural details, associated treatments and short- and long-term outcomes). These data should be pooled in national and international registries to evaluate the current standards of care and to provide hypotheses for improvement. Similarly, the results of smaller surgical trials can be carefully pooled, ideally using individual patient data, in collaborative meta-analyses [39, 40]. Without these basic steps the careful evaluation of surgery cannot advance, and the development of large-scale clinical trials for the evaluation of selected surgical procedures may be impossible.
References
1. Peto R, Collins R, Gray R (1995) Large-scale randomized evidence: large, simple trials and overviews of trials. J Clin Epidemiol 48:23–40
2. Horton R (1996) Surgical research or comic opera: questions, but few answers. Lancet 347:984–985
3. Sculpher MJ, Seed P, Henderson RA et al (1994) Health service costs of coronary angioplasty and coronary artery bypass surgery: the Randomised Intervention Treatment of Angina (RITA) trial. Lancet 344:927–930
4. Yusuf S, Zucker D, Peduzzi P et al (1994) Effect of coronary artery bypass graft surgery on survival: overview of 10-year results from randomised trials by the Coronary Artery Bypass Graft Surgery Trialists Collaboration. Lancet 344:563–570
5. Rahbari NN, Diener MK, Fischer L et al (2008) A concept for trial institutions focussing on randomised controlled trials in surgery. Trials 9:3
6. Garrett MD, Walton MI, McDonald E et al (2003) The contemporary drug development process: advances and challenges in preclinical and clinical development. Prog Cell Cycle Res 5:145–158
7. Feldman AM, Koch WJ, Force TL (2007) Developing strategies to link basic cardiovascular sciences with clinical drug development: another opportunity for translational sciences. Clin Pharmacol Ther 81:887–892
8. Boutron I, Ravaud P, Nizard R (2007) The design and assessment of prospective randomised, controlled trials in orthopaedic surgery. J Bone Joint Surg Br 89:858–863
9. Balasubramanian SP, Wiener M, Alshameeri Z et al (2006) Standards of reporting of randomized controlled trials in general surgery: can we do better? Ann Surg 244:663–667
10. Tiruvoipati R, Balasubramanian SP, Atturu G et al (2006) Improving the quality of reporting randomized controlled trials in cardiothoracic surgery: the way forward. J Thorac Cardiovasc Surg 132:233–240
11. Burger I, Sugarman J, Goodman SN (2006) Ethical issues in evidence-based surgery. Surg Clin North Am 86:151–168; x
12. Boyle K, Batzer FR (2007) Is a placebo-controlled surgical trial an oxymoron? J Minim Invasive Gynecol 14:278–283
13. Mann CJ (2003) Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emerg Med J 20:54–60
14. Wolfe F (1999) Critical issues in longitudinal and observational studies: purpose, short versus long term, selection of study instruments, methods, outcomes, and biases. J Rheumatol 26:469–472
15. Bridgewater B, Grayson AD, Brooks N et al (2007) Has the publication of cardiac surgery outcome data been associated with changes in practice in northwest England: an analysis of 25,730 patients undergoing CABG surgery under 30 surgeons over eight years. Heart 93:744–748
16. Ferguson TB Jr, Dziuban SW Jr, Edwards FH et al (2000) The STS national database: current changes and challenges for the new millennium. Committee to Establish a National Database in Cardiothoracic Surgery, The Society of Thoracic Surgeons. Ann Thorac Surg 69:680–691
17. Keogh BE, Bridgewater B (2007) Toward public disclosure of surgical results: experience of cardiac surgery in the United Kingdom. Thorac Surg Clin 17:403–411; vii
18. Chautard J, Alves A, Zalinski S et al (2008) Laparoscopic colorectal surgery in elderly patients: a matched case-control study in 178 patients. J Am Coll Surg 206:255–260
19. Zondervan KT, Cardon LR, Kennedy SH (2002) What makes a good case-control study? Design issues for complex traits such as endometriosis. Hum Reprod 17:1415–1423
20. Moran BJ (2006) Decision-making and technical factors account for the learning curve in complex surgery. J Public Health (Oxf) 28:375–378
21. Murphy GJ, Rogers CA, Caputo M et al (2005) Acquiring proficiency in off-pump surgery: traversing the learning curve, reproducibility, and quality control. Ann Thorac Surg 80:1965–1970
22. Anon (1998) Randomised trial of endarterectomy for recently symptomatic carotid stenosis: final results of the MRC European Carotid Surgery Trial (ECST). Lancet 351:1379–1387
23. Anon (2002) Coronary artery bypass surgery versus percutaneous coronary intervention with stent implantation in patients with multivessel coronary artery disease (the Stent or Surgery trial): a randomised controlled trial. Lancet 360:965–970
24. Barnett HJ, Taylor DW, Eliasziw M et al (1998) Benefit of carotid endarterectomy in patients with symptomatic moderate or severe stenosis. North American Symptomatic Carotid Endarterectomy Trial Collaborators. N Engl J Med 339:1415–1425
25. Serruys PW, Unger F, Sousa JE et al (2001) Comparison of coronary-artery bypass surgery and stenting for the treatment of multivessel disease. N Engl J Med 344:1117–1124
26. Rothwell PM, Mehta Z, Howard SC et al (2005) Treating individuals 3: from subgroups to individuals: general principles and the example of carotid endarterectomy. Lancet 365:256–265
27. Flather M, Delahunty N, Collinson J (2006) Generalizing results of randomized trials to clinical practice: reliability and cautions. Clin Trials 3:508–512
28. Rothwell PM (2005) Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 365:176–186
29. Rothwell PM (2005) External validity of randomised controlled trials: "to whom do the results of this trial apply?" Lancet 365:82–93
30. Aziz O, Rao C, Panesar SS et al (2007) Meta-analysis of minimally invasive internal thoracic artery bypass versus percutaneous revascularisation for isolated lesions of the left anterior descending artery. BMJ 334:617
31. Rao C, Aziz O, Panesar SS et al (2007) Cost effectiveness analysis of minimally invasive internal thoracic artery bypass versus percutaneous revascularisation for isolated lesions of the left anterior descending artery. BMJ 334:621
32. McNamee P (2007) What difference does it make? The calculation of QALY gains from health profiles using patient and general population values. Health Policy 84:321–331
33. Vaz D, Santos L, Machado M et al (2004) Randomization methods in clinical trials. Rev Port Cardiol 23:741–755
34. Perez de Arenaza D, Lees B, Flather M et al (2005) Randomized comparison of stentless versus stented valves for aortic stenosis: effects on left ventricular mass. Circulation 112:2696–2702
35. Smith DH, Neutel JM, Lacourciere Y et al (2003) Prospective, randomized, open-label, blinded-endpoint (PROBE) designed trials yield the same results as double-blind, placebo-controlled trials with respect to ABPM measurements. J Hypertens 21:1291–1298
36. Qureshi AI, Hutson AD, Harbaugh RE et al (2004) Methods and design considerations for randomized clinical trials evaluating surgical or endovascular treatments for cerebrovascular diseases. Neurosurgery 54:248–264; discussion 264–267
37. Taggart DP, Lees B, Gray A et al (2006) Protocol for the arterial revascularisation trial (ART). A randomised trial to compare survival following bilateral versus single internal mammary grafting in coronary revascularisation [ISRCTN46552265]. Trials 7:7
38. Sellier P, Chatellier G, D'Agrosa-Boiteux MC et al (2003) Use of non-invasive cardiac investigations to predict clinical endpoints after coronary bypass graft surgery in coronary artery disease patients: results from the prognosis and evaluation of risk in the coronary operated patient (PERISCOP) study. Eur Heart J 24:916–926
39. Lim E, Drain A, Davies W et al (2006) A systematic review of randomized trials comparing revascularization rate and graft patency of off-pump and conventional coronary surgery. J Thorac Cardiovasc Surg 132:1409–1413
40. Sedrakyan A, Wu AW, Parashar A et al (2006) Off-pump surgery is associated with reduced occurrence of stroke and other morbidity as compared with traditional coronary artery bypass grafting: a meta-analysis of systematically reviewed trials. Stroke 37:2759–2769
6 Monitoring Trial Effects
Hutan Ashrafian, Erik Mayer, and Thanos Athanasiou
Contents
6.1 Introduction
6.2 Definition and Development of DMCs
6.3 Roles of the Committee
6.4 DMC Members
6.5 Conclusion
References
Abstract In order to address difficulties in running RCTs, clinicians introduced the concept of a totally objective group of assessors who continually evaluate and review trial results and execution. These groups are known as Data Monitoring Committees (DMCs) and have been set up to minimise trial complications while also optimising trial implementation. Furthermore, as it quickly became apparent that the results of some trials had significant implications for patient care before the trials had been completed, the DMCs responsible for the continual review of trial data also adopted the power to stop, or advise extension of, each experiment if deemed scientifically necessary. This chapter delineates the role of these committees and clarifies some of their functions through surgical trial examples.
6.1 Introduction
H. Ashrafian () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Randomised clinical trials (RCTs) are regarded as the "gold standard" model for answering clinical questions in surgery. The benefits of this type of study are that it helps to accurately define the differences in patient outcomes according to the various treatment arms to which the patients belong; if patient numbers are adequate, RCTs can reveal subtle differences in outcomes, and they have the added advantage of limiting the effect of bias in each experiment. As study trialists have become more familiar with the workings and intricacies of running RCTs, the experiments undertaken have become larger and more complex. This has led to two fundamental issues in trial execution, namely:
• Increased difficulty for one group of clinicians to execute a large trial (both in terms of manpower and academic support)
• Increased difficulty for one group of clinicians to adequately interpret the incoming results of a large trial
6.2 Definition and Development of DMCs

The United States Food and Drug Administration (FDA) clearly specifies the definition of a DMC: "A data monitoring committee is a group of individuals with pertinent expertise that reviews on a regular basis accumulating data from an ongoing clinical trial". The DMC advises the sponsor regarding the continuing safety of current participants and those yet to be recruited, as well as the continuing validity and scientific merit of the trial [1]. These committees are increasingly recognised under a number of alternative titles (Table 6.1).

Table 6.1 Alternative titles for Data Monitoring Committees (DMCs)
• Data Review Board (DRB)
• Data and Safety Monitoring Board (DSMB)
• Independent Data Monitoring Committee (IDMC)
• Treatment Effects Monitoring Committee (TEMC)
• Safety Monitoring Committee (SMC)
• Policy Advisory Board (PAB)
• Policy Board (PB)
• Policy and Data Board (PDB)
The role of these committees was first proposed in the 1960s by the Greenberg Report of the National Heart Institute, in order to aid the setting up and execution of large clinical trials [2]. The report recommended the use of "an oversight Policy Advisory Board", which would review the policies and progress of the trial, advising the trial group while also communicating with the trial sponsor, which at the time was the NIH. Members of the Policy Advisory Board were to be independent of the sponsor and could thus provide objective external advice for the trialists. In essence, they were well-informed, objective communicators and advisors acting between the sponsors of the trial and those taking part in it. The first formal use of such a committee was by the Coronary Drug Project Policy Advisory Board [3, 4], which set up a subcommittee to review the accumulating safety and efficacy data for the trial. Since that trial, DMCs have been a fundamental component of nearly every large RCT performed worldwide.
6.3 Roles of the Committee

The DMC is an autonomous committee that ensures that both project and patient standards are regulated and upheld during a clinical trial (Fig. 6.1). Furthermore, it is a body that can communicate its decisions and findings to the trial sponsor and the investigators. The National Institutes of Health stipulates that trial monitoring "should be commensurate with risks", and that a "monitoring committee is usually required to determine safe and effective conduct and to recommend conclusion of the trial when significant benefits or risks have developed" [5].

[Fig. 6.1 Roles of the data monitoring committee. The DMC maintains a dialogue with the project investigators, the project sponsor and the media. Project roles: ensuring adherence to the experimental protocol, modification of the experimental protocol (if necessary), ensuring adequate patient numbers in the study, ensuring project time keeping, ensuring accuracy of data entry and supporting project completion. Patient roles: monitoring adverse effects, monitoring follow-up, ensuring patient care is upheld, upholding ethics in the project and stopping or halting the experiment (if necessary).]

[Fig. 6.2 Clinical trial review by the data monitoring committee: from idea and question, through clinical trial design, grant award and trial start, the DMC reviews successive interim analyses (1 to N) until the end of the trial.]

The National Cancer Institute
[6, 7] also stated that DMCs should be independent of study leadership and free from conflicts of interest. They are also required to
• Familiarise themselves with the research protocol(s) and plans for data and safety monitoring.
• Ensure that results are reported competently.
• Ensure rapid communication of adverse event reports and treatment-related morbidity information.
• Perform periodic evaluation of outcome measures and patient safety information.
• Ensure that patients in the clinical trial are protected.
• Review interim analyses of outcome data (Fig. 6.2) to determine whether the trial should continue as originally designed, should be changed, or should be terminated based on these data (a minimal sketch of such an interim check follows this list).
• Review trial performance information such as accrual information.
• Determine whether, and to whom, outcome results should be released prior to the reporting of study results.
• Review reports of related studies to determine whether the monitored study needs to be changed or terminated.
• Review major proposed modifications to the study prior to their implementation (e.g. termination, dropping an arm based on toxicity results or other reported trial outcomes, increasing target sample size).
• Following each DMC meeting, provide the study leadership and the study sponsor with written information concerning findings for the trial, for example any relevant recommendations related to continuing, changing or terminating the trial.
• Perform ongoing assessment of patient eligibility and evaluability.
• Assure that the credibility of clinical trial reports and the ethics of clinical trial conduct are above reproach.
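To illustrate the interim-analysis point above, the following is a minimal sketch of the kind of early-stopping check a DMC statistician might run. It uses the conservative Haybittle-Peto rule (stop early only if an interim two-sided p-value falls below 0.001); the event counts and the choice of rule are purely illustrative and are not drawn from any trial described in this chapter.

```python
# Interim-analysis sketch: z-test for a difference in event proportions
# between two trial arms, compared against a Haybittle-Peto boundary.
# All numbers are hypothetical.
from math import sqrt, erf

def two_proportion_z(events_a, n_a, events_b, n_b):
    """Z statistic for the difference in event proportions between two arms."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def two_sided_p(z):
    """Two-sided p-value from a standard normal z statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical interim data: 12/150 events on arm A vs. 31/148 on arm B.
z = two_proportion_z(12, 150, 31, 148)
p = two_sided_p(z)
HAYBITTLE_PETO = 0.001  # conservative early-stopping threshold
print(f"z = {z:.2f}, p = {p:.4f}")
print("recommend stopping" if p < HAYBITTLE_PETO else "recommend continuing")
```

In this invented example the interim p-value (about 0.0015) is striking but does not cross the 0.001 boundary, so the sketch recommends continuing; a real DMC would weigh such a result alongside safety data, external evidence and the pre-specified monitoring plan.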
6.4 DMC Members

DMC members are selected by the principal investigator, project manager or an appointed designee at the design stage of a surgical trial. These should include:
• Surgeons
• Physicians
• Statisticians
• Other scientists
• Lay representatives
These individuals are to be selected on the basis of their experience, reputation for objectivity, absence of conflicts of interest and knowledge of clinical trial methodology. Furthermore, a DMC chairperson with an established track record in clinical trials should be chosen. Members can belong to the institution performing the trial, although the majority should be externally appointed [6]. Although there is currently no formal requirement for training to become a member of a DMC, this is now changing, so that those selected for the committee have a taught comprehension of the idiosyncrasies of a trial; thus, for example, their understanding and management of serious, unexpected or unanticipated adverse effects during a trial can be well rehearsed. Training could take the form of study of previous trials, specialist courses in statistics or formal university courses aimed at DMC members. In order to familiarise readers with some examples of the roles of DMCs, we have listed six examples where DMCs played a prominent role in a surgical trial (please see below). As statisticians have the duty of preparing and presenting interim analyses to the DMC, it has been argued that, for an industry-funded trial, these experts may have the potential to display bias in their presentation of results. This has led to the concept of having an independent external statistician prepare the interim analyses, thereby reducing inadvertent bias [8].
6.5 Conclusion

DMCs are now an integral part of any clinical trial. They are independent bodies that objectively review study protocols and results during a trial (interim reviews). They have the further ability to suggest and initiate any required changes to the study, including its termination. Furthermore, they uphold the quality of care given to patients and communicate their findings to the study sponsor and study investigators. The placement of surgeons as DMC members has until now been scarce. However, with the increasing number of surgical clinical trials, this has become a necessary responsibility for academic surgeons, and their involvement will continue to rise in the future. Training for DMC membership is becoming increasingly common, and includes familiarity with the role of DMCs in previous trials. We therefore conclude this chapter by describing six case studies where DMCs have played a prominent role in surgical trials.

Surgical case studies
1. DMC ending trial recruitment due to a difference in two surgical treatment arms.
2. DMC ending trial recruitment due to a difference in two pre-surgical treatment arms.
3. DMC ending trial recruitment due to a difference noted in a subgroup of subjects.
4. DMC ending trial recruitment as a result of subjects dropping out of a trial.
5. DMC requesting re-evaluation of trial subjects.
6. DMC altering a study protocol.
Case study 6.1 DMC ending trial recruitment due to a difference in two surgical treatment arms
Trial name: The Leicester randomised study of carotid angioplasty and stenting vs. carotid endarterectomy [9]
Null hypothesis/objective: Is carotid angioplasty (CA) a safer and more cost-effective alternative to carotid endarterectomy (CEA) in the management of symptomatic severe internal carotid artery (ICA) disease?
Trial methods: RCT comparing carotid angioplasty and stenting vs. carotid endarterectomy for patients with symptomatic severe ipsilateral ICA stenosis (70–99%) in a university teaching hospital
Treatment arms: Carotid angioplasty (CA) and optimal medical therapy vs. carotid endarterectomy (CEA) and optimal medical therapy
Follow-up: Patients were examined by a consultant neurologist 24 h after intervention, and any new neurological deficit was recorded. A stroke was defined as any new neurological deficit persisting for more than 24 h. The neurologist reassessed all patients at 30 days
Endpoints: Death, disabling or non-disabling stroke within 30 days
Results: All ten CEA operations proceeded without complication, but 5 of the 7 patients who underwent CA had a stroke (P = 0.0034), 3 of which were disabling at 30 days
Role of the DMC: After referral and review of these results, the Data Monitoring Committee invoked the stopping rule and the trial was suspended. Re-evaluation with the DMC, ethics committee and the trialists led to the study being restarted, primarily due to an issue of trial methodology arising from problems associated with informed consent
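The stroke difference that triggered the stopping rule above can be checked with a simple contingency-table test. The sketch below applies Fisher's exact test to the published counts (5/7 strokes after CA vs. 0/10 after CEA); the one-sided test reproduces the reported P = 0.0034. This is offered only as a worked check, not as a description of the trial's own pre-specified analysis.

```python
# Fisher's exact test on the Case study 6.1 stroke counts.
from scipy.stats import fisher_exact

table = [[5, 2],    # CA arm: strokes, no strokes
         [0, 10]]   # CEA arm: strokes, no strokes
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"one-sided Fisher exact p = {p:.4f}")  # approximately 0.0034
```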
Case study 6.2 DMC ending trial recruitment due to a difference in two pre-surgical treatment arms
Trial name: GeparDUO of the German Breast Group [10]
Null hypothesis/objective: Is the combination chemotherapy regimen ddAT (combined doxorubicin and docetaxel) capable of obtaining similar rates of pathologic complete remission (pCR) to the AC-DOC regimen (sequential doxorubicin/cyclophosphamide followed by docetaxel) in patients with primary operable breast cancer when used as a neoadjuvant treatment?
Trial methods: RCT enrolling pre-operative patients with primary operable breast cancer (T2–3 N0–2 M0) confirmed histologically by core or true-cut biopsy. Patients would receive either ddAT or AC-DOC and would then undergo breast surgery
Treatment arms: ddAT neoadjuvant chemotherapy vs. AC-DOC neoadjuvant chemotherapy
Follow-up: Breast surgery outpatient follow-up
Endpoints: The primary endpoint was defined as no microscopic evidence of viable tumour (invasive and non-invasive) in all resected breast specimens and axillary lymph nodes. Secondary aims were to determine disease-free survival, overall survival rates, and local tumour and lymph node response
Interim results: The first interim analysis of 208 patients showed a considerable difference in the rate of the primary endpoint (pCR) between the treatment groups
Role of the DMC: The DMC recommended a second interim analysis for the primary endpoint. Based on this second analysis of 395 patients, the DMC recommended stopping enrolment into the study. Owing to this recommendation, only 913 of the 1,000 planned patients were included in the trial, after which recruitment was halted. Treatment was continued in the patients who were already enrolled in the trial

Case study 6.3 DMC ending trial recruitment due to a difference noted in a subgroup of subjects
Trial name: Family Health International (FHI) & EngenderHealth multicentre RCT evaluating fascial interposition (FI) as a component of vas occlusion [11]
Null hypothesis/objective: The hazard ratio (HR) for achieving the primary endpoint of azoospermia in patients undergoing vasectomy with FI vs. the non-FI group is 1.0
Trial methods: RCT enrolling healthy, sexually active men 18 years and older who had chosen vasectomy for contraception. The trial was set up to compare two arms, namely ligation/excision with vs. without fascial interposition (FI), a technique in which a layer of the vas sheath is interposed between the cut ends of the vas
Treatment arms: Vasectomy with and without fascial interposition (FI)
Follow-up: Semen collections by blinded technicians, occurring every 4 weeks post-operatively until week 34 or until azoospermia. All participants were to be evaluated again 12 months after surgery
Endpoints: The primary endpoint was azoospermia in the first of two consecutive semen samples after surgery. Secondary outcomes included surgical difficulties and the occurrence of adverse events
Interim results: Data were analysed for 552 vasectomised men (276 in each technique group); the 414 who had completed at least 10 weeks of follow-up were reviewed. The overall HR was significant (HR = 1.54, P < 0.01). However, the estimate of the HR was significantly greater than 1.0 for the under-35-years subgroup (HR = 2.13, P < 0.01), but not for the 35-years-or-older subgroup (HR = 0.996, P = 0.5087). To address the effect of surgeon experience, an additional nonparametric analysis of the results (a surgeon-stratified log-rank test) revealed results consistent with the overall HR analysis (P < 0.01). Kaplan–Meier estimates for men of 35 years and older showed a slightly higher success rate than for the 34-and-younger group
Role of the DMC: The DMC noted that, given the striking difference observed, continuing the trial implied that younger men would be randomised to a technique that is less effective. However, the DMC concluded that there were no major safety concerns in doing so: these younger men would experience only a delay in reaching vasectomy success, and all men with vasectomy failure could undergo a repeat vasectomy. A meeting between the study senior management and the DMC followed. After much deliberation, trial enrolment was terminated. Follow-up of participants already enrolled in the study continued as planned
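Case study 6.3 turns on estimating the FI effect separately within age subgroups. The sketch below shows that analysis pattern on simulated data: a Cox proportional hazards model for time to azoospermia fitted in each subgroup. Everything here (the data, the effect sizes, the lifelines-based implementation) is invented for illustration; it is not the FHI trial analysis.

```python
# Subgroup hazard-ratio sketch on simulated vasectomy follow-up data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 400
fi = rng.integers(0, 2, n)        # 1 = fascial interposition arm
young = rng.integers(0, 2, n)     # 1 = aged under 35
# Simulate FI speeding up azoospermia, but only in younger men.
rate = 0.08 * np.where((fi == 1) & (young == 1), 2.0, 1.0)
weeks = rng.exponential(1 / rate)
df = pd.DataFrame({"weeks": np.minimum(weeks, 34),     # censor at week 34
                   "event": (weeks <= 34).astype(int),
                   "fi": fi, "young": young})

for flag, label in [(1, "<35 years"), (0, ">=35 years")]:
    sub = df[df["young"] == flag][["weeks", "event", "fi"]]
    cph = CoxPHFitter().fit(sub, duration_col="weeks", event_col="event")
    print(label, "HR for FI:", round(cph.hazard_ratios_["fi"], 2))
```

On this simulation the younger subgroup returns an HR well above 1 while the older subgroup sits near 1, mirroring the qualitative pattern that prompted the DMC's deliberations.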
Case study 6.4 DMC ending trial recruitment as a result of subjects dropping out of a trial
Trial name: Italian Tamoxifen Prevention Study [12]
Null hypothesis/objective: Tamoxifen is chemopreventive for breast cancer
Trial methods: RCT administering tamoxifen to women (mainly in Italy) who did not have breast cancer and who had had a hysterectomy. Women were recruited via national advertising and also through direct contact with gynaecologists
Treatment arms: Tamoxifen vs. no tamoxifen therapy
Follow-up: Five-year follow-up strategy, including a minimum of twice-yearly clinical visits to monitor side effects and compliance
Endpoints: The primary endpoints were the occurrence of, and deaths from, histologically confirmed breast cancer. Secondary endpoints included any changes that the drug could cause in cardiovascular variables, psychological assessment of the participants' lifestyle and assessment of cognitive capacity and its relation to Alzheimer's disease
Interim results: Five years after starting enrolment, the principal investigators were concerned about the large numbers of women withdrawing from the study, the unexpected finding of hypertriglyceridaemia, the findings of vascular events and the number of recovered women complaining about the side effects of tamoxifen
Role of the DMC: The trialists and the data monitoring committee decided to end recruitment primarily because of the number of women dropping out of the study. The study was continued in patients already enrolled

Case study 6.5 DMC requesting re-evaluation of trial subjects
Trial name: ACAS (Asymptomatic Carotid Atherosclerosis Study) [13, 14]
Null hypothesis/objective: Among patients with severe but asymptomatic carotid artery stenosis, does carotid endarterectomy, despite a perioperative risk of any stroke or death from any cause, reduce the overall 5-year risk of fatal and non-fatal ipsilateral carotid stroke?
Trial methods: RCT in patients 40–79 years of age who had a life expectancy of at least 5 years, gave informed consent and had at least 60% carotid stenosis near the bifurcation of the common or internal carotid artery. Patients were randomised into two arms, namely surgical endarterectomy and no surgery. All patients were treated medically and were started on 325 mg of aspirin daily, with aggressive reduction of modifiable risk factors
Treatment arms: Carotid endarterectomy and medical therapy vs. medical therapy only
Follow-up: Patients were interviewed about neurological symptoms and medical status every 3 months, alternating between telephone and in-clinic interviews. During the clinic visit, a neurologist examined the patient, and the ACAS surgeon or his designee made a second assessment if symptoms or signs were found
Endpoints: The primary endpoint for evaluation was any stroke or death following randomisation and within the 30-day perioperative period for patients receiving surgery, a comparable 42-day period from randomisation for those not assigned to surgery, and any ipsilateral stroke or stroke death thereafter. All neurological symptoms and/or signs were evaluated by a neurologist
Results: Using Kaplan–Meier projections in an intention-to-treat analysis, the aggregate risk over 5 years for the primary outcome was 4.8% for patients assigned to receive surgery and 10.6% for patients treated medically. The relative risk reduction conferred by surgery was 55% (95% confidence interval 23–73, P = 0.004). Following endarterectomy, men had a 69% relative risk reduction in the primary endpoint, whilst women had a 16% relative risk reduction
Role of the DMC: As a consequence of the trial reaching statistical significance in favour of endarterectomy, and on the recommendation of the study's data monitoring committee, physicians participating in the study were immediately notified and advised to re-evaluate patients who did not receive surgery
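The headline ACAS figures can be sanity-checked in a line or two: the published 5-year risks of 4.8% (surgery) and 10.6% (medical therapy) imply the reported 55% relative risk reduction. A trivial sketch:

```python
# Checking the ACAS relative risk reduction from the published 5-year risks.
risk_surgery, risk_medical = 0.048, 0.106
rrr = (risk_medical - risk_surgery) / risk_medical
print(f"relative risk reduction = {rrr:.0%}")  # ~55%, as reported
```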
Case study 6.6 DMC altering a study protocol
Trial name: POSCH (Program On the Surgical Control of the Hyperlipidemias) [15]
Null hypothesis/objective: Can maximal lipid reduction, achieved by the partial ileal bypass (PIB) operation, retard, arrest or reverse the atherosclerotic process in individuals with demonstrated coronary atherosclerotic disease?
Trial methods: RCT in patients with a single documented myocardial infarction between the ages of 30 and 59 years who had a plasma cholesterol ≥5.69 mmol/L (220 mg/dL). Patients would undergo either partial ileal bypass (PIB) and an optimised diet (American Heart Association phase 2), or the optimised diet alone. Patients were encouraged not to take hypocholesterolaemic drugs during the trial
Treatment arms: Partial ileal bypass (PIB) with an optimised diet vs. the optimised diet alone
Endpoints: The primary monitored outcome was death, further categorised as atherosclerotic cardiovascular death or non-atherosclerotic death. Secondary criteria were fatal and non-fatal myocardial infarction, other clinical events, electrocardiography, exercise test changes, Doppler evaluation of peripheral pulses and sequential assessment of coronary arteriographic changes
Results: The original projection was that a population of 500,000 individuals would suffice to randomise 12 patients annually; in actuality, a population of 2.4–10.5 million was needed to achieve this goal. Furthermore, the trial was eventually funded for only four trial clinics as opposed to the expected six. These factors all led to difficulties in recruiting adequate numbers of patients for the trial
Role of the DMC: With the advice and cooperation of the DMC, the trialists were able to increase patient recruitment by raising the upper age limit for inclusion from 59 to 64. Other changes included widening the cholesterol inclusion criteria to patients who had either a plasma cholesterol >5.69 mmol/L (220 mg/dL), or a plasma cholesterol >5.17 mmol/L (200 mg/dL) if LDL >3.62 mmol/L (140 mg/dL). These changes were implemented and resulted in the required increase in patient inclusion into the study. As a result, the trial was successfully completed
References
1. United States Food and Drug Administration (2001) Guidance for clinical trial sponsors: on the establishment and operation of Clinical Trial Data Monitoring Committees. Available at: http://www.fda.gov/Cber/gdlns/clindatmon.htm
2. Anon (1988) Organization, review, and administration of cooperative studies (Greenberg Report): a report from the Heart Special Project Committee to the National Advisory Heart Council, May 1967. Control Clin Trials 9:137–148
3. Anon (1973) The coronary drug project. Design, methods, and baseline results. Circulation 47:I1–50
4. Canner PL, Berge KG, Wenger NK et al (1986) Fifteen year mortality in Coronary Drug Project patients: long-term benefit with niacin. J Am Coll Cardiol 8:1245–1255
5. National Institutes of Health (NIH) (1998) NIH policy for data and safety monitoring. Available at: http://grants.nih.gov/grants/guide/notice-files/not98-084.html
6. National Cancer Institute (1999) Policy of the national cancer institute for data and safety monitoring of clinical trials. Available at: http://deainfo.nci.nih.gov/grantspolicies/datasafety.htm
7. National Cancer Institute (2006) Guidelines for monitoring of clinical trials for Cooperative Groups, CCOP research bases, and The Clinical Trials Support Unit (CTSU). Available at: http://ctep.cancer.gov/monitoring/2006_ctmb_guidelines.pdf
8. DeMets DL, Fleming TR (2004) The independent statistician for data monitoring committees. Stat Med 23:1513–1517
9. Naylor AR, Bolia A, Abbott RJ et al (1998) Randomized study of carotid angioplasty and stenting versus carotid endarterectomy: a stopped trial. J Vasc Surg 28:326–334
10. Jackisch C, von Minckwitz G, Eidtmann H et al (2002) Dose-dense biweekly doxorubicin/docetaxel versus sequential neoadjuvant chemotherapy with doxorubicin/cyclophosphamide/docetaxel in operable breast cancer: second interim analysis. Clin Breast Cancer 3:276–280
11. Chen-Mok M, Bangdiwala SI, Dominik R et al (2003) Termination of a randomized controlled trial of two vasectomy techniques. Control Clin Trials 24:78–84
12. Veronesi U, Maisonneuve P, Costa A et al (1998) Prevention of breast cancer with tamoxifen: preliminary findings from the Italian randomised trial among hysterectomised women. Italian Tamoxifen Prevention Study. Lancet 352:93–97
13. Anon (1994) Clinical advisory: carotid endarterectomy for patients with asymptomatic internal carotid artery stenosis. Stroke 25:2523–2524
14. Anon (1995) Carotid endarterectomy for patients with asymptomatic internal carotid artery stenosis. National Institute of Neurological Disorders and Stroke. J Neurol Sci 129:76–77
15. Buchwald H, Matts JP, Hansen BJ et al (1987) Program on surgical control of the hyperlipidemias (POSCH): recruitment experience. Control Clin Trials 8:94S–104S
7 How to Recruit Patients in Surgical Studies

Hutan Ashrafian, Simon Rowland, and Thanos Athanasiou
Contents
7.1 Introduction
7.2 Planning and Organisation
7.3 Timing
7.4 Patients' Point of View
7.5 The Recruitment Team
7.6 Recruitment Skills
7.7 Sources of Recruitment
7.8 Balance Between Inclusion/Exclusion Criteria
7.9 Prerequisites
7.10 Factors to Increase Participation
7.11 Factors to Ensure Continued Participation
7.12 Patient Subgroups
7.13 Practicalities
7.14 Conclusion
References
H. Ashrafian () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, London W2 1NY, UK e-mail: [email protected]
Abstract The process of recruiting patients into any clinical study is fundamentally critical for the implementation, execution and completion of any project. Within this chapter, some of the salient points involved in patient recruitment will be identified and categorised so as to familiarise the reader with the necessary concepts required to recruit patients for a surgical controlled trial.
7.1 Introduction

The process of recruiting patients into any clinical study is fundamentally critical for the implementation, execution and completion of any project [10]. Simply put, if the study does not have the required number of patients to examine, then no adequate conclusion regarding outcomes and results can be attained [1, 4]. Completing this critical process for a randomised clinical trial costs more and consumes more time than any other aspect of the project. Furthermore, the actual task of recruitment can potentially be performed by almost any member of the surgical research team, and recruiters do not necessarily need formal medical or surgical qualifications. As a result, a vast array of organisational input is required in order to successfully enrol patients within a clinical study; this is usually addressed by a specific recruitment sub-committee. Within this chapter, some of the salient points involved in patient recruitment will be identified and categorised so as to familiarise the reader with the concepts required to recruit patients for a surgical controlled trial.
7.2 Planning and Organisation

Many recruitment difficulties occur as a result of insufficient organisation and poor scheduling. Generally, over-estimation of recruitment numbers from one or two limited sources (such as relying heavily upon medical referrals alone) leads to poor yields [2]. Definitive recruitment strategies need to be implemented, and specific management of recruitment staff and subjects needs to be instigated, with database administration and the establishment of a monitoring system. Clear levels of organisational leadership need to be established in the form of a recruitment team, and formal lines of communication should be laid down with the other groups of the research trial team.
7.3 Timing

In many surgical study protocols, a final date of recruitment is set so that follow-up data are complete by the end of the study. If few patients are recruited early in this period, subjects enrolled later would have results with an unsatisfactory follow-up period and would be unsuitable for inclusion in the data analysis. This could leave the study significantly short of the number of subjects required by the power calculation. Furthermore, if too many patients are recruited in a very short time, there is a risk that any unidentified problems (such as long-term surgical complications or cumulative drug toxicity) may present concurrently in a large number of individuals, overwhelming the trialists and preventing adequate time for remedial action. Too many patients entering at the same time can also lead to a deluge of new data that is difficult for a limited number of trialists to process [6]. Enrolment should, therefore, ideally occur at a constant rate to maintain study power and minimise uneven or excessive research effort during the follow-up period.
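To make the constant-accrual point concrete, here is a minimal sketch of the scheduling arithmetic a recruitment sub-committee might run; the target size, dropout rate and recruitment window below are invented for illustration and would come from the trial's own power calculation and pilot data.

```python
# Enrolment scheduling sketch: given a target number of subjects with
# complete follow-up, an anticipated dropout rate and a fixed recruitment
# window, what constant monthly accrual is needed? All figures hypothetical.
from math import ceil

TARGET_COMPLETERS = 400    # subjects needed with full follow-up
DROPOUT_RATE = 0.15        # anticipated in-study loss
RECRUITMENT_MONTHS = 24    # recruitment closes after this window

must_randomise = ceil(TARGET_COMPLETERS / (1 - DROPOUT_RATE))
per_month = ceil(must_randomise / RECRUITMENT_MONTHS)
print(f"randomise {must_randomise} subjects, about {per_month} per month")
```

Tracking actual accrual against this constant target each month gives an early warning of both the slow-recruitment and the burst-recruitment problems described above.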
7.4 Patients' Point of View

Many surgical projects in a randomised trial involve comparing surgery to a non-interventional treatment, or comparing two procedures, one more invasive than the other. A typical example would be comparing an open procedure to a minimally invasive equivalent (laparoscopic or thoracoscopic); patients may thus be more inclined to opt for the less invasive option despite an unknown outcome. This may skew the subject demographic entering a trial, and it is up to the recruitment committee to ensure that all patients being approached are given a clear and concise background and explanation of why the trial is important, and why both arms need adequate case loads in order to achieve an interpretable result.

7.5 The Recruitment Team

Any healthcare professional who understands the research question and the proposed trial can be included in the recruitment team. This can typically involve senior surgeons, junior doctors, specialist nursing staff, physician assistants, specialist physiotherapists and many more. Some recruiters will be formally employed as part of the research team of the trial, whereas others might take part out of interest, with entirely voluntary input. Either way, it is important for these individuals to have a clear understanding of the research aims and protocol, which would usually be achieved at research briefings, pre-trial education of staff and continual peri-trial teaching sessions. Recruiters should be picked on grounds of positive attitude, merit, lasting loyalty and commitment to the project. Pressure of work and time commitments can decrease the momentum of any recruitment process; it is therefore important for recruiters to feel a sense of collective ownership of the project at hand, and regular contact with other recruiters and the research team at set meeting points will reinforce the need for recruitment and sustain encouragement for the project.

7.6 Recruitment Skills

The skills required in the recruitment process are broad, but recruiters must be good communicators and also totally unbiased in their approach to selecting and enrolling patients into each study. Typically, the best recruiters have good interpersonal skills and empathy with their patients. They need to describe to each potential trial subject the background of the research, the importance of the research question and the implications for those taking part. As a result, those recruiting need a good understanding of the underlying science of the study, and this understanding should normally be assessed and approved by the trial organising committee. Importantly, and as with all members of a trial, recruiters need to demonstrate good team skills, and should also be supplied with the space and facilities with which to recruit and contact patients. There is, therefore, a basic requirement for adequate office space, telephone, electronic mail and internet access.

7.7 Sources of Recruitment

The initial selection of candidates for recruitment can be highly multifaceted, though it requires total confidentiality and professional conduct at all times. Sources can include patient medical records or direct recruitment of subjects from inpatient wards and outpatient clinics. Furthermore, once a research trial has been established, publicity to local primary care units and regional centres will ensure that some patients will potentially be referred to the trial directly by physicians in the community or by other referring units and hospitals. Other sources include patient registries, either set up by the clinical trial in question or by a similar clinical trial; these registries are large databases containing lists of active clinical trials or of potential participants. Depending on the nature of the subjects required, national and even international screening for patients can occur via the media (television, radio, internet) and direct mail, wherein patients are invited to come and take part in a trial: for example, in the Italian Breast Cancer Prevention Trial with Tamoxifen, television advertisements by the organising trial committee invited patients to take part in the study [11].
7.8 Balance Between Inclusion/Exclusion Criteria

The number of subjects who finally take part in a clinical trial is the number of baseline candidates remaining after filtering on the basis of inclusion and exclusion criteria (Fig. 7.1). Many studies initially over-estimate the likely number of patients, as it is easily overlooked that the number of patients drops at each stage of inclusion or exclusion. Furthermore, as any study continues, patients may drop out because of lack of commitment or in-trial exclusion criteria.

[Fig. 7.1 Flow chart for a hypothetical recruitment process, demonstrating a decrease in numbers at each stage of selection based on inclusion criteria, exclusion criteria and patient compliance: asked to take part (n = 1,000); unwilling to take part (n = 250); fit inclusion criteria (n = 750); removed by exclusion criteria (n = 250); total subjects starting the study (n = 500); in-study dropout (n = 100); total subjects completing the study (n = 400).]
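The attrition in Fig. 7.1 can also be run backwards as a rough planning aid: starting from the number of completers a trial needs, each attrition rate inflates the number of patients who must initially be approached. The sketch below simply mirrors the hypothetical rates in the figure; a real trial would substitute its own estimates.

```python
# Inverting the Fig. 7.1 recruitment funnel: how many patients must be
# approached to yield a target number of completers? Rates mirror the
# hypothetical flow chart (25% unwilling, one third of the willing
# excluded, 20% in-study dropout) and are illustrative only.
from math import ceil

def required_approaches(target_completers, willing=0.75,
                        pass_exclusion=2 / 3, retained=0.80):
    """Work the recruitment funnel backwards to the number to approach."""
    starters = target_completers / retained    # survive in-study dropout
    eligible = starters / pass_exclusion       # survive exclusion criteria
    return ceil(eligible / willing)            # of those asked, the willing

print(required_approaches(400))  # -> 1000, matching Fig. 7.1
```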
7.9 Prerequisites

Once a trial is set up and the recruiters selected, a number of prerequisites must be satisfied before any study subjects are selected (Fig. 7.2). To begin with, there is a compulsory collective responsibility to ensure that the study has been approved by the data monitoring committee and by local and, if necessary, national ethics groups. Each recruiter needs to be keenly aware of the nature of the research and the ethical implications of placing subjects within the sphere of the trial. This is important, as it ensures that all recruiters know the nature of the trial intimately, including all the pros and cons for the subjects taking part, and it allows all recruiters to accurately describe the implications of taking part in the research project. This in turn equips the recruiters to fulfil another important aspect of their role, which is to obtain informed consent from their subjects. Furthermore, it is important that recruiters are in a position to answer, in as many ways as possible, the questions that may arise during the recruitment process. It is therefore necessary that all recruiters are adequately trained, increasingly through formal training and briefing by the trial organisers. Recruiters need to be familiar with a variety of methods of conveying information about the trial to subjects, which can include one-to-one meetings and the supply of contact details and information sheets for patients.
7.10 Factors to Increase Participation

A number of contributing factors can ensure successful and plentiful recruitment of subjects for each trial (Fig. 7.2). Large numbers of recruiters allow a wide recruitment net to be cast through numerous man-hours of effort, and one way in which this can be achieved is by employing a multitude of multi-disciplinary recruiters. This will not only strengthen the research process, but will also give a wider sense of collective ownership to researchers in general, and will ultimately allow increased integration of resources and possibly wider dissemination of results on completion. As with any research project, enthusiastic team members will bring a dynamism in effort and activity that allows adequate numbers of subjects to be approached and investigated for admittance into the trial. If a variety of recruitment modalities is utilised (such as by telephone, email or in person), patient numbers can be increased geometrically as opposed to arithmetically. These methods can be augmented by wider public advertising, such as presentation of the research and recruitment needs in the public media (radio, television etc.); indeed, general awareness of the project is important, directed not only at patients but also at those in the medical world. The onus thus lies with the recruiters and the research team to broadcast and publicise the need for recruitment, and to converse with medical colleagues at all levels – local, national and international. This can be done through meetings, lectures, conferences and even word of mouth, and will eventually result in the broadest possible referral base from which subjects can be recruited. Subjects being recruited need to be reassured that they will come to no adverse consequence if they do not participate in the study, but should be encouraged to participate on grounds of personal interest and patient bonhomie. Occasionally, a trial offers a totally new treatment to a patient with an otherwise untreatable condition, and the offer of a trial may then be of direct medical benefit to that subject. Furthermore, including an "opt-out" facility [7] in the study will allow patients who drop out of treatment during a trial to remain eligible for follow-up in the trial, so that their participation is not a total loss for the study.
7.11 Factors to Ensure Continued Participation

As alluded to above, once subjects have been entered into a trial, it is still highly important to ensure that they remain within it; otherwise, the entire effort placed into acquiring their data may become unusable or lost. This results in extra effort and time being required, and possibly in the failure of the project; it is therefore up to the recruiters and the research team as a whole to optimise the number of patients who complete the project and its follow-up (Fig. 7.2). Many of these issues are considered common sense, though they have been recurring sources of trial delay or failure. Trial sites should be easy to reach and welcoming, with staff cordiality encouraged at all times. Ideally, subjects would be paid for their travel, and adequately trained and informed staff would be on hand at all times to discuss any queries that subjects may have. Some sources advocate the somewhat controversial idea that trialists and recruiters should receive "payment incentives" [3] to ensure the smooth running of a study; however it is achieved, a standard of commitment and professionalism needs to be attained for subjects to continue returning to participate in a trial. Furthermore, this dedication needs to extend a level further: at least one member of the recruitment team should be available at all times to address out-of-hours and holiday-time queries from subjects, in effect an "on-call" recruitment rota providing patients with a complete umbrella of care during their time with the trial research group.

[Fig. 7.2 Summary of the contributing factors in the successful recruitment of patients for a clinical trial. Prerequisites: local ethics approval; informed consent; Data Monitoring Committee (DMC) approval; funding; legal indemnity; adequately trained recruiters; information sheets for patients and for medical professionals; researcher education. Factors to increase participation: enthusiastic researchers; academic gain; no adverse consequences of non-participation; multimodal recruitment (telephone, e-mail, in person); multilocation recruitment (clinic, ward); large numbers of recruiters; multidisciplinary recruitment (nurses, doctors, physiotherapists, physician assistants, other trained professionals); large numbers of recruiting centres (national, international); increased patient interest in the research study (personal medical benefit, patient bonhomie); advertising (medical, general public); increased awareness of the trial (lectures, conferences, word of mouth – medical and patient); "opt-out" availability (i.e. non-responders can be followed up with further communication). Factors to ensure continued participation: a trial location that is easy to reach, with pleasant staff and a professional-looking site; general care and cordiality; reimbursement of travel expenses; suitably qualified staff on days of study; good timekeeping on days of study; payment incentives (within ethical remit); out-of-hours and daytime contact details in case of participant problems or queries.]
7.12 Patient Subgroups

Recruiters should be aware of the variation in recruiting people of different backgrounds, ages, genders, socioeconomic status and diseases [5, 8]. For example, studies of patients with HIV/AIDS [9] need to be seen to be conducted confidentially in order to reassure subjects in the trial. Some units find it notoriously difficult to recruit from racial minorities and require extensive campaigns to communicate with these populations. Furthermore, a number of studies reveal that elderly populations prefer person-to-person contact to written contact. As a result, each recruitment process needs to account for the special needs of each patient subgroup, and the methods used to encourage patients to enter these trials need to be tailor-made for the subjects in question.

7.13 Practicalities

Consideration of the practicalities of trial recruitment is fundamentally important for successful execution. For example, it is essential to recognise and develop aspects that can increase subject comfort and satisfaction. A good rapport with subjects can be very helpful and may increase their willingness to return for follow-up sessions. Furthermore, it may allow for increased openness in the discussion of symptoms, which may be vital for the documentation and analysis of study results. Salient issues include cordiality to all subjects at all times. A welcoming environment needs to be established, and hospitality to all subjects needs to be universal. This includes the reimbursement of travel expenses and the provision of refreshments and reading material if patients are to wait for long periods. An environment of open communication should be encouraged, allowing patients to feel comfortable in expressing any reservations they have about any aspect of the trial. Such feedback is vital and should be used to improve trial hospitality and practice where possible.

7.14 Conclusion

The successful recruitment of subjects within a surgical trial involves adequate planning, effective teamwork and a multi-modal approach to selection. Specifically trained recruiters should be equipped with a broad knowledge base and the skills with which to effectively inform and select patients. Subjects can be recruited from a wide range of patient groups within a variety of healthcare systems, but recruitment should be targeted to the specific research area of the trial. Recruitment is a fundamental necessity for the ultimate completion of a trial and, if successful, can greatly strengthen the quality of the data eventually collected.
References
1. Allen PA, Waters WE (1982) Development of an ethical committee and its effect on research design. Lancet 1:1233–1236
2. Bell-Syer SE, Moffett JA (2000) Recruiting patients to randomized trials in primary care: principles and case study. Fam Pract 17:187–191
3. Bryant J, Powell J (2005) Payment to healthcare professionals for patient recruitment to trials: a systematic review. BMJ 331:1377–1378
4. Charlson ME, Horwitz RI (1984) Applying results of randomised trials to clinical practice: impact of losses before randomisation. Br Med J (Clin Res Ed) 289:1281–1284
5. Clark MA, Neighbors CJ, Wasserman MR et al (2007) Strategies and cost of recruitment of middle-aged and older unmarried women in a cancer screening study. Cancer Epidemiol Biomarkers Prev 16:2605–2614
6. Johnson L, Ellis P, Bliss JM (2005) Fast recruiting clinical trials – a utopian dream or logistical nightmare? Br J Cancer 92:1679–1683
7. Junghans C, Feder G, Hemingway H et al (2005) Recruiting patients to medical research: double blind randomised trial of "opt-in" versus "opt-out" strategies. BMJ 331:940
8. Keyzer JF, Melnikow J, Kuppermann M et al (2005) Recruitment strategies for minority participation: challenges and cost lessons from the POWER interview. Ethn Dis 15:395–406
9. King WD, Defreitas D, Smith K et al (2007) Attitudes and perceptions of AIDS clinical trials group site coordinators on HIV clinical trial recruitment and retention: a descriptive study. AIDS Patient Care STDS 21:551–563
10. Mapstone J, Elbourne D, Roberts I (2007) Strategies to improve recruitment to research studies. Cochrane Database Syst Rev (2):MR000013
11. Veronesi U, Maisonneuve P, Costa A et al (1998) Prevention of breast cancer with tamoxifen: preliminary findings from the Italian randomised trial among hysterectomised women. Italian Tamoxifen Prevention Study. Lancet 352:93–97
8 Diagnostic Tests and Diagnostic Accuracy in Surgery

Catherine M. Jones, Lord Ara Darzi, and Thanos Athanasiou
Contents
Abbreviations
8.1 Introduction
8.2 What Is a Diagnostic Test?
8.2.1 Criteria for a Useful Diagnostic Test
8.2.2 Choosing Diagnostic Endpoints
8.2.3 Diagnostic Test Data
8.3 Quality Analysis of Diagnostic Studies
8.3.1 Reporting in Diagnostic Tests
8.3.2 Sources of Bias in Diagnostic Studies
8.4 Estimates of Diagnostic Accuracy
8.4.1 True Disease States
8.4.2 Sensitivity, Specificity and Predictive Values
8.4.3 Likelihood Ratios
8.4.4 Diagnostic Odds Ratio
8.4.5 Confidence Intervals and Measures of Variance
8.4.6 Concluding Remarks About Estimates of Diagnostic Accuracy
8.5 Receiver Operating Characteristic Analysis
8.5.1 The ROC Curve
8.5.2 Area Under the ROC Curve
8.6 Combining Studies: Diagnostic Meta-Analysis
8.6.1 Goals and Guidelines
8.6.2 Heterogeneity Assessment
8.6.3 Diagnostic Meta-Analysis Techniques
8.7 Conclusions
References
List of Useful Websites

C. M. Jones () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]

Abbreviations
AUC Area under the curve
DOR Diagnostic odds ratio
FN False negative
FP False positive
FPR False positive rate
HSROC Hierarchical summary receiver operating characteristic
LR Likelihood ratio
NPV Negative predictive value
PPV Positive predictive value
QUADAS Quality assessment of diagnostic accuracy studies
RDOR Relative diagnostic odds ratio
ROC Receiver operating characteristic
SROC Summary receiver operating characteristic
STARD Standards for reporting of diagnostic accuracy
TN True negative
TP True positive
TPR True positive rate
Abstract This chapter outlines the principles of designing, performing and interpreting high quality studies of diagnostic test accuracy. The basic concepts of diagnostic test accuracy, including sensitivity, specificity, diagnostic odds ratio (DOR), likelihood ratios (LRs) and predictive values, are explained. Sources of bias in diagnostic test accuracy studies are explained with practical examples. The graphical ways to represent the accuracy of a diagnostic test are demonstrated. Issues to consider when interpreting the quality of a published study of test accuracy are outlined using
guidelines advocated in the medical literature. Finally, the principles of diagnostic meta-analysis, including bivariate and hierarchical summary receiver operating characteristic methods, are explained.
8.1 Introduction

Accurate diagnosis is the first step towards effective treatment of a disease. In surgical practice, prudent use of appropriate tests to reach an accurate, fast diagnosis can be especially important in maintaining a high quality of patient care. From a general perspective, any investigation that provides information to direct management is a form of diagnostic test. Screening programmes and prognostic indicators can be thought of as special cases of diagnostic tests. The need for diagnostic test guidelines is increasing across medicine, as the amount of medical literature expands yearly. The cost of diagnostic tests contributes significantly to healthcare costs, and must be a practical consideration when choosing a test. The cost of a test includes not only the immediate expense of the test materials and staff time, but also the subsequent costs of further tests performed on the basis of equivocal or inaccurate results. Decisions on large-scale screening and other diagnostic test policies must take these costs into account, but the clinical benefit to the target patient population is necessarily paramount. Summary measures of diagnostic test performance are often quoted in the literature, and an understanding of these is important for clinicians. This chapter explains the different methods for quantifying the accuracy of a diagnostic test, and aids the understanding of the principles of diagnostic research.
8.2 What Is a Diagnostic Test?

In its broadest context, a medical diagnostic test is any discriminating question that, once answered, provides information about the status of the patient. While this classically means diagnosis of a medical condition, any outcome of interest is potentially "diagnosable", including future clinical events. This chapter deals with the principles of medical diagnosis; although prognostic analysis uses statistical methodology similar to that discussed here, it is covered elsewhere in this book. Diagnostic tests in surgical practice include history and examination findings, laboratory investigations, radiological imaging, clinical scores derived from questionnaires and operative findings. The underlying principles of use and applicability apply equally to all of these.
8.2.1 Criteria for a Useful Diagnostic Test

A diagnostic test should only be performed if there is an expectation of benefit for the patient. The individual situation of each patient should be considered before choosing a diagnostic pathway that maximises benefit and minimises risk. Pepe [1] summarises the criteria for a useful diagnostic test: the disease should be serious enough to warrant investigation, have a reasonable prevalence in the target population and change the treatment plan if present. The test should also accurately distinguish diseased from non-diseased patients, and the risks of performing the test should not outweigh its diagnostic benefits. In practice, the benefit-to-risk assessment is based on patient factors, test availability, test performance in the medical literature and the experience and personal preference of the managing clinician. Choosing a diagnostic test requires knowledge of its performance across different patient populations; there is no benefit in applying a test to a population in whom there is no evidence of diagnostic accuracy. Clinical experience and the medical literature provide information on reliable interpretation of results. In summary, a diagnostic test is indicated if the suspected disease warrants treatment or monitoring, or significantly changes the prognosis. The test result should change patient care, according to patient and clinician preferences. Finally, the accuracy of the test for the desired diagnosis, in that clinical setting, should be acceptably high to warrant performing the test.
8.2.2 Choosing Diagnostic Endpoints

In some circumstances, a diagnostic test will be performed solely to exclude (or confirm) a medical condition; in these cases, the most appropriate test is one that reliably excludes (or confirms) its presence. In other circumstances, confirmation and exclusion of disease are equally important. The decision to perform a test will, therefore, depend not only on its performance in the target population, but also on the endpoint(s) of interest. Several examples of targeted tests are given in Table 8.1.

Table 8.1 Examples of targeted diagnostic tests
• Blood product screening: exclusion of transmittable disease
• Operative cholangiogram: exclusion of common duct stone
• Carotid angiogram: confirmation of artery stenosis seen on ultrasound, prior to surgery
• Preoperative chest X-ray: exclusion of major pulmonary disease
• Cervical smear testing: exclusion of cervical cell atypia

In particular, screening tests are targeted towards exclusion of disease, as they are applied on a grand scale to detect occult disease. If positive, the screening test is often followed by other investigations with higher overall accuracy. The requirements that a screening test be safe, inexpensive and convenient mean that the accepted levels of diagnostic accuracy can be less than ideal.
8.2.3 Diagnostic Test Data

Test results can be expressed in a variety of ways, depending on the method of testing and the clinical question. Table 8.2 provides a summary of the types of data generated by diagnostic tests, with common examples.

Table 8.2 Types and examples of result variables
• Dichotomous: viral testing, bacterial cultures
• Ordinal: V/Q scanning, Gleason score, Glasgow Coma Scale
• Categorical: genotype testing, personality disorders
• Continuous: biochemical tests, height, weight

A dichotomous result is one in which there is a "yes/no" answer. The desired outcome is either present, or it is not. An example of this is microbiological testing for a virus; the patient is either positive or negative for the virus. This is the simplest result to interpret. An ordinal result is one in which there are a number of possible outcomes, in an ordered sequence of probability
or severity, but with no quantitative relationship between the outcomes. An example of this is lung ventilation/perfusion scanning for pulmonary embolus: the result may be low, intermediate or high probability of pulmonary embolus. Another example of an ordinal test result is the Gleason histopathology score for prostate cancer, which influences prognosis and treatment. The result of an ordinal test should be interpreted in accordance with the available literature before subsequent decisions are made about the need for further investigative tests or treatments. A categorical variable has a number of possible outcomes that are not ordered in terms of severity or probability. An example of this is genetic testing, which may have two or more possible results for a gene code in an individual; there is no implicit order in the outcomes per se. Interpretation of this type of test requires knowledge of the implications of each possible outcome. Many investigations yield a numerical result, which lies on a continuous spectrum. Examples include body weight, serum creatinine, body mass index and blood haemoglobin testing. A test that gives a continuous variable requires a pre-determined threshold for disease in order for the result to be meaningful. For example, one pre-determined value may be the threshold for anaemia requiring blood transfusion, while a different value may be the threshold for iron supplementation. Another example is prostate-specific antigen in screening for prostate cancer. In all diagnostic tests, a threshold for changing the patient's management is required. For a dichotomous result, this threshold is incorporated within the test to produce a positive or negative result. Ordinal, categorical and continuous tests require a suitable threshold to be chosen, which will depend on the patient, the clinician and local guidelines. The diagnostic accuracy of the test may depend on the chosen threshold, as explained later in the chapter.
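The threshold dependence is easy to demonstrate numerically. The sketch below applies three candidate cut-offs to a small set of invented marker values and reports the resulting sensitivity and specificity (formally defined later in the chapter); raising the cut-off trades sensitivity for specificity. All values are fabricated for illustration.

```python
# How the apparent accuracy of a continuous test shifts with the threshold.
# Marker values and disease labels are invented for illustration.
diseased = [7.2, 8.1, 6.5, 9.0, 7.8, 6.9]   # marker values, diseased patients
healthy  = [5.1, 6.0, 4.8, 6.7, 5.5, 6.2]   # marker values, healthy patients

def positives(values, threshold):
    """Count values called 'positive' (at or above the threshold)."""
    return sum(v >= threshold for v in values)

for threshold in (6.0, 6.5, 7.0):
    sens = positives(diseased, threshold) / len(diseased)
    spec = 1 - positives(healthy, threshold) / len(healthy)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```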
8.3 Quality Analysis of Diagnostic Studies Studies of diagnostic test accuracy compare one or more index tests against a “gold standard” reference test in the same population. These studies provide guidance for clinical practice and impetus for the development of
new diagnostic technologies and applications. However, the publication of exaggerated results from poorly designed, implemented or reported studies may cause suboptimal application and interpretation of the diagnostic test. The importance of accurate and thorough reporting is now widely accepted.
8.3.1 Reporting in Diagnostic Tests

Published guidelines are designed to promote diagnostic test study quality, focussing on accurate and thorough reporting of study design, patient population, inclusion and exclusion criteria and follow up [2]. The STARD initiative (Table 8.3) was published in several leading journals in 2003 and resulted from collaboration between journal editors and epidemiologists. It presents a checklist of 25 items for authors and reviewers to promote acceptable standards in study design, conduct and analysis. STARD is now an accepted guideline for authors and reviewers of single articles of diagnostic accuracy. In 2003, the QUADAS tool for quality assessment of studies included in diagnostic test systematic reviews was also published (Table 8.4). This tool consists of fourteen items, each of which should be used in diagnostic meta-analyses to evaluate the impact of study reporting on the overall accuracy results. The role of QUADAS in meta-analysis is discussed further in a later section of this chapter.
8.3.2 Sources of Bias in Diagnostic Studies As with other types of studies, diagnostic research is vulnerable to bias in study design, implementation, reporting and analysis. Bias refers to differences between the observed and true (unknown) values of the study endpoints. Generally, bias cannot be measured directly, but robust methodology and endeavours to minimise known sources of bias increase the credibility of observed results. Biased results may occur in studies where there are flaws in study design. Begg and McNeil [3] provide a thorough overview of the sources of bias in diagnostic tests. A summary of the more common and important types of bias is provided here.
8.3.2.1 Reference Standard Bias

The choice of reference test is central to reaching robust conclusions about the usefulness of a diagnostic test. The reference test should correctly classify the target condition, and is ideally the definitive investigation. In surgical practice, this may mean surgical intervention to obtain histology or to allow visual inspection of a tumour. Alternatively, the reference test may incorporate imaging, biochemistry or other testing. An inaccurate reference test leads to misclassification of true disease status and makes assessment of the index test inaccurate (reference standard error bias).

The entire study population, or a randomly selected subgroup, should undergo the same index and reference tests. Partial verification bias, also known as workup bias, occurs when a non-random selection of the study population undergoes both the index and reference tests. If the results of the index test influence the decision to perform the reference standard, bias arises; this occurs mostly in cohort studies where the index test is performed before the reference standard. Differential verification bias occurs when the reference standard changes with the index test result. This is particularly common in surgical practice, wherein the result of a laboratory test may determine whether the patient undergoes surgery or is treated conservatively. Finally, incorporation bias occurs when the index test result contributes to the final reference test result, making the concordance rate between the two tests artificially high.

The diagnostic and reference tests should be performed within a sufficiently short time period to prevent significant disease progression (disease progression bias). Within a diagnostic study, the decision to perform both tests should be made prior to either test being performed. The method used to perform the index and reference tests should be clearly described so that the study can be reproduced elsewhere.
8.3.2.2 Investigator-Related Factors in Bias

In studies where one test is interpreted with awareness of the other test result (review bias), there is a tendency to over-estimate accuracy. For example, equivocal liver lesions on ultrasound may be diagnosed as metastatic lesions if a recent CT of the abdomen mentions liver metastases. This may be acceptable in clinical practice, but is undesirable in an impartial assessment of the accuracy of a diagnostic test.
Table 8.3 The STARD checklist (reproduced)

Title/abstract/keywords
1. Identify the article as a study of diagnostic accuracy (recommend MeSH heading "sensitivity and specificity")

Introduction
2. State the research question or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups

Methods – Participants (describe)
3. The study population: the inclusion and exclusion criteria, setting and locations where the data were collected
4. Participant recruitment: was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the index tests or the reference standard?
5. Participant sampling: was the study population a consecutive series of participants defined by the selection criteria in items 3 and 4? If not, specify how participants were further selected
6. Data collection: was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?

Methods – Test methods (describe)
7. The reference standard and its rationale
8. Technical specifications of material and methods involved, including how and when measurements were taken, and/or cite references for index tests and reference standard
9. Definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard
10. The number, training and expertise of the people executing and reading the index tests and the reference standard
11. Whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test, and describe any other clinical information available to the readers

Methods – Statistical methods (describe)
12. Methods for calculating or comparing measures of diagnostic accuracy, and the statistical measures used to quantify uncertainty (e.g. 95% confidence intervals)
13. Methods for calculating test reproducibility, if done

Results – Participants (report)
14. When the study was done, including beginning and end dates of recruitment
15. Clinical and demographic characteristics of the study population (e.g. age, sex, spectrum of presenting symptoms, co-morbidity, current treatments, recruitment centres)
16. The number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or reference standard; describe why participants failed to receive either test (flow chart recommended)

Results – Test results (report)
17. Time interval from the index tests to the reference standard, and any treatment administered in between
18. Distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in those without the target condition
19. A cross tabulation of the results of the index test (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard
20. Any adverse events from performing the index tests or the reference standard

Results – Estimates (report)
21. Estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals)
22. How indeterminate results, missing responses and outliers of the index tests were handled
23. Estimates of variability of diagnostic accuracy between subgroups of participants, readers or centres, if done
24. Estimates of test reproducibility, if done

Discussion
25. Discuss the clinical applicability of the study findings
Many diagnostic tests require no subjective assessment and give dichotomous results; in such cases, the people performing the tests need only record the outcome. Biochemical and bacterial culture testing are examples of such tests. Clinical review bias is a similar concept. When there is a subjective or interpretive component to the test result, clinical information may contribute to the final diagnosis. If this occurs, it should be made clear in the methodology so that the technique can subsequently be adopted elsewhere with similar accuracy.
Table 8.4 QUADAS tool for diagnostic meta-analysis quality assessment (reproduced); each item is answered "Yes", "No" or "Unclear"

1. Was the spectrum of patients representative of the patients who will receive the test in practice?
2. Were selection criteria clearly described?
3. Is the reference standard likely to correctly classify the target condition?
4. Is the time period between index test and reference standard short enough to be reasonably sure that the target condition did not change between the two tests?
5. Did the whole sample, or a random selection of the sample, receive verification using a reference standard of diagnosis?
6. Did patients receive the same reference standard regardless of the index test result?
7. Was the reference standard independent of the index test (i.e. the index test did not form part of the reference standard)?
8. Was the execution of the index test described in sufficient detail to permit replication of the test?
9. Was the execution of the reference standard described in sufficient detail to permit its replication?
10. Were the index test results interpreted without knowledge of the results of the reference standard?
11. Were the reference standard results interpreted without knowledge of the results of the index test?
12. Were the same clinical data available when test results were interpreted as would be available when the test is used in practice?
13. Were uninterpretable/intermediate test results reported?
14. Were withdrawals from the study explained?
An example of this is knowledge of the patient's age when assessing bony lesions on plain radiographs, as age is highly discriminating. Equivocal or unavailable results from either test pose a dilemma for study coordinators. If unavailable results are excluded from the analysis, biased results can occur if the remaining study population differs significantly from the initial group; whether bias arises depends on the reasons for the non-interpretable results.
8.3.2.3 Population Factors in Bias

Spectrum bias refers to the lack of generalisability of results when differences in patient demographics or clinical features lead to differences in diagnostic test accuracy. Reported test accuracy may not be applicable to patient populations differing from the original study population in severity of disease, co-morbidities or specific demographics. Relevant patient characteristics should, therefore, be clearly reported. Control groups of "normal" patients tend to be healthier than the average non-diseased subject, and known cases tend to have severe disease, overestimating test accuracy as extremes
of health are easier to detect. Patients should, therefore, be selected to represent the target population, which should be clearly described. Patients of special interest can be chosen to demonstrate the test accuracy in a particular subgroup, but this should be clearly explained in the paper to prevent erroneous application to other populations. The selection criteria for inclusion and exclusion from the study should be clearly stated. Similarly, excluding patients with diseases known to adversely affect the diagnostic performance of the test leads to limited challenge bias, as in the example of patients with chronic obstructive pulmonary disease excluded from a study of ventilation/perfusion scintillation scanning.
8.4 Estimates of Diagnostic Accuracy

8.4.1 True Disease States

For each patient, the reference test is assumed to indicate true disease status. The index test result is classified as true positive (TP), true negative (TN), false positive (FP) or false negative (FN), depending on its agreement with the reference standard (Fig. 8.1).
Fig. 8.1 The 2 × 2 table relating index test and reference test results:

                   Reference test +   Reference test −
Index test +             TP                 FP             TP + FP
Index test −             FN                 TN             FN + TN
                      TP + FN            FP + TN       TP + FN + FP + TN

Fig. 8.2 Sensitivity and specificity calculations (same 2 × 2 layout): sensitivity = TP/(TP + FN); specificity = TN/(TN + FP)
True disease status is a binary outcome: the patient either has the disease or does not. The disease prevalence within the population will also impact upon the overall test accuracy. For example, if the prevalence of the disease is very high, a test which always gives a positive result will be correct in the majority of cases, but is clearly useless as a diagnostic test. If a spectrum of disease is being evaluated, a threshold for positivity is selected. For example, carotid artery stenosis may be considered significant at 70% luminal narrowing, with milder degrees of narrowing classified as a negative disease status. TP, TN, FP and FN outcomes are classified according to Fig. 8.1. "True" outcomes on the index test are those which agree with the reference result for that subject; "false" outcomes disagree with the reference result and are considered inaccurate. An ideal diagnostic test produces no false outcomes.
8.4.2 Sensitivity, Specificity and Predictive Values Sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) are commonly encountered in clinical practice. Despite their frequent
usage, the terms are often difficult to conceptualise and apply. This section aims to give an intuitive understanding of, and a statistical basis for, these terms.
8.4.2.1 Sensitivity and Specificity

Sensitivity measures the ability of a test to identify diseased patients. A test with 75% sensitivity correctly identifies 75 out of every 100 patients who are positive on the reference test. Sensitivity gives no indication of the test's performance in healthy patients. Statistically, sensitivity is the ratio of TPs to all positive results on the reference test (TP + FN); the numbers in the left column of Fig. 8.2 are used in the calculation of sensitivity.

Specificity measures how well a test identifies healthy patients. It is the proportion of healthy people who have a negative test result. As with sensitivity, the "true" status is determined by the reference standard. Specificity is complementary to sensitivity, as it gives information only about healthy subjects. Statistically, specificity is the ratio of TNs to all negative results on the reference test (FP + TN) (Fig. 8.2).

Sensitivity is also known as the true positive rate (TPR), whilst (1 − specificity) is known as the false positive rate (FPR). These terms are used in receiver operating characteristic (ROC) analysis, as discussed later in the chapter.
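As a minimal sketch, the two calculations can be expressed directly from the cells of the 2 × 2 table (Fig. 8.2); the counts below are invented for illustration.

    # Sensitivity and specificity from the cells of a 2 x 2 table
    # (Fig. 8.2); the counts below are invented for illustration.

    def sensitivity(tp, fn):
        return tp / (tp + fn)      # TP / (TP + FN)

    def specificity(tn, fp):
        return tn / (tn + fp)      # TN / (TN + FP)

    tp, fp, fn, tn = 90, 20, 10, 80
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")   # 0.90
    print(f"specificity = {specificity(tn, fp):.2f}")   # 0.80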
Fig. 8.3 Predictive value calculations (same 2 × 2 layout as Fig. 8.1): PPV = TP/(TP + FP); NPV = TN/(TN + FN)
Sensitivity and specificity are useful when deciding whether to perform the test. Depending on the clinical scenario, high sensitivity or high specificity may be more important, and the best test can be chosen for the clinical situation.
8.4.2.2 Positive and Negative Predictive Values

PPV and NPV are also used to measure the usefulness of a diagnostic test. They assess the probability of true disease status once the test result is known. For example, a PPV of 80% indicates that 80% of patients with a positive test result actually have the disease; an NPV of 40% indicates that only 40% of patients testing negative are truly healthy. PPV and NPV are particularly useful in interpreting a known test result. Statistically, PPV is the ratio of TPs to all positive test results (TP + FP), and NPV is the ratio of TNs to all negative test results (TN + FN) (Fig. 8.3). These ratios are calculated horizontally across the 2 × 2 table.

8.4.2.3 Comparison of Terms

Sensitivity and specificity measure the inherent accuracy of the index test compared to the reference test. They are used to compare different tests for the same disease and population, and help clinicians choose the most appropriate test. On the other hand, PPV and NPV are measures of clinical accuracy and provide the probability of a given result being correct. In practice, both types of summary measure are reported in the literature, and the difference between them should be clearly understood. There is often a need for a high sensitivity or specificity, depending on the consequences of the two types of error (FN or FP). The accuracy of a test is, therefore, usually given as a pair of (sensitivity, specificity) values. Confidence intervals for sensitivity and specificity can be calculated using binomial methods, although if sensitivity and specificity are dependent on additional test characteristics, other methods are preferred (described later in the chapter).
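A hedged sketch of the predictive value calculations, together with a simple binomial confidence interval for a proportion such as sensitivity; the counts, and the use of the normal approximation rather than an exact method, are assumptions made for illustration.

    # Predictive values and a simple normal-approximation binomial
    # confidence interval; counts are invented for illustration.
    import math

    def ppv(tp, fp):
        return tp / (tp + fp)   # proportion of index-test positives that are TP

    def npv(tn, fn):
        return tn / (tn + fn)   # proportion of index-test negatives that are TN

    def binomial_ci(successes, n, z=1.96):
        """95% CI for a proportion using the normal approximation."""
        p = successes / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half_width), min(1.0, p + half_width)

    tp, fp, fn, tn = 90, 20, 10, 80
    print(f"PPV = {ppv(tp, fp):.2f}, NPV = {npv(tn, fn):.2f}")   # 0.82, 0.89
    low, high = binomial_ci(tp, tp + fn)                          # CI for sensitivity
    print(f"sensitivity 95% CI: {low:.2f} to {high:.2f}")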
8.4.3 Likelihood Ratios

Likelihood ratios (LRs) quantify how much a test result changes the odds of having a disease. Many clinicians are more comfortable with probabilities than odds, but when used appropriately, LRs are useful clinical tools. The conversion of a probability into odds and back again is simple:

Odds = probability/(1 − probability)
Probability = odds/(1 + odds)

For example, a pre-test probability of 75% converts to odds of 0.75/0.25 = 3, i.e. odds of 3 to 1. The odds of having the disease after the test result is known (post-test odds) depend on both the pre-test odds and the LR. The positive likelihood ratio (LR+) indicates how the pre-test odds of the disease change when the test result is positive; the negative likelihood ratio (LR−) indicates how the odds change with a negative test result. Pre-test odds usually depend on the prevalence of the disease in the population and on individual patient characteristics, and must be estimated by the clinician before the post-test odds can be calculated. The LRs are calculated as:

LR+ = sensitivity/(1 − specificity)
LR− = (1 − sensitivity)/specificity
Post-test odds = pre-test odds × likelihood ratio
LRs per se do not vary with disease prevalence, although they are vulnerable to the same factors that affect sensitivity and specificity. The LRs may, therefore, vary between populations. The post-test odds are calculated as pre-test odds multiplied by the relevant LR.
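A worked example of these formulas in Python, using illustrative values for sensitivity, specificity and pre-test probability:

    # Worked likelihood-ratio example; sensitivity, specificity and
    # pre-test probability are illustrative values.

    sens, spec = 0.90, 0.80
    lr_pos = sens / (1 - spec)            # LR+ = 4.5
    lr_neg = (1 - sens) / spec            # LR- = 0.125

    pre_test_prob = 0.75
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)     # odds = 3

    post_test_odds = pre_test_odds * lr_pos                 # after a positive result
    post_test_prob = post_test_odds / (1 + post_test_odds)
    print(f"post-test probability = {post_test_prob:.2f}")  # 0.93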
8.4.4 Diagnostic Odds Ratio

The diagnostic odds ratio (DOR) is another summary measure of diagnostic accuracy. It is defined as

DOR = LR+/LR− = [sensitivity/(1 − specificity)] / [(1 − sensitivity)/specificity]
If the 2 × 2 table is used to gather data (as in Fig. 8.1), the formula for DOR simplifies to DOR = (TP × TN)/(FP × FN). DOR is a summary measure of the odds of positive test results in patients with disease, compared with the odds of positive test results in those without disease. It is a measure of test discrimination, and variation of threshold makes this discriminating power favour either sensitivity or specificity. In clinical practice, DOR is not commonly used for deciding on the best test on an individual basis; however, it is a useful measure for comparing different tests, and is often estimated in diagnostic test studies and meta-analyses.
8.4.5 Confidence Intervals and Measures of Variance Estimating confidence intervals for ratios is more complex than for proportions. Natural logarithmic transformation of the ratio is performed, and variance equations are used to calculate the confidence intervals. Once the limits for the log transformation are known, the antilogarithmic transformation is performed to get the confidence interval limits for the original ratio variable. The equations for log (LR) variances are given in detail in Pepe [1] and other texts. The variance of log DOR, which is useful in calculating confidence intervals for overall accuracy, is simplified down to:
var{log(DOR)} = 1/TP + 1/FP + 1/FN + 1/TN,

where TP, FP, FN and TN are the entries in the 2 × 2 table (Fig. 8.1).
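Putting these pieces together, a minimal sketch of the DOR and its 95% confidence interval via the log transformation described above; the 2 × 2 counts are invented.

    # DOR with a 95% confidence interval via the log transformation;
    # the 2 x 2 counts are invented for illustration.
    import math

    tp, fp, fn, tn = 90, 20, 10, 80
    dor = (tp * tn) / (fp * fn)                        # (TP x TN) / (FP x FN) = 36.0
    var_log_dor = 1/tp + 1/fp + 1/fn + 1/tn            # variance of log(DOR)
    half_width = 1.96 * math.sqrt(var_log_dor)

    ci_low = math.exp(math.log(dor) - half_width)      # back-transform to DOR scale
    ci_high = math.exp(math.log(dor) + half_width)
    print(f"DOR = {dor:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")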
8.4.6 Concluding Remarks About Estimates of Diagnostic Accuracy

There are multiple ways to quantify the performance of a diagnostic test. Sensitivity and specificity measure the inherent accuracy of the test, without consideration of clinical needs. Predictive values address the clinical question of how to interpret a specific test finding. LRs relate to inherent accuracy and can be used to estimate post-test probabilities of true disease status. Two-by-two tables are simple, straightforward tools for calculating these estimates and their confidence intervals; after a few worked examples, the calculations become routine and their meaning straightforward.
8.5 Receiver Operating Characteristic Analysis

Some diagnostic tests do not produce a dichotomous (yes/no) answer, but measure a continuous or ordinal scale. Continuous variables are commonly encountered in surgical practice; for example, serum haemoglobin, body temperature, prostate-specific antigen and creatinine are all continuous variables. These are usually interpreted, either formally or informally, against a preconceived threshold for normality or a threshold to change management. Ordinal variables are less common, but include histopathological determination of the grade of tumour differentiation. Ultimately, the result of a diagnostic test is transformed into a dichotomous result – the probability of disease has reached a certain threshold or it has not. The selection of the appropriate threshold is usually based on local guidelines or the medical literature. For each test, the choice of threshold will depend on the population, the disease, the available resources for follow up and the consequences of inaccurate diagnosis.
Fig. 8.4 Schematic ROC curve showing sensitivity vs. (1 − specificity) plotted over the unit square. The diagonal represents the performance of the uninformative test
8.5.1 The ROC Curve The most commonly used measure to quantify the performance of such tests is the ROC curve, which measures test accuracy over the spectrum of possible results. ROC curve analysis is based on the selection of a threshold for positive results. By convention, we will assume that a larger result indicates the presence of disease, but in practice, a threshold may represent the lower limit of normality. For each threshold, a test has a given sensitivity and specificity. An ROC curve maps the possible pairs of (sensitivity, 1-specificity) that are produced when the threshold is shifted throughout its spectrum. In ROC analysis, sensitivity is also known as TPR, and (1-specificity) is known as FPR. Once the TPR and FPR are calculated for a given threshold, another threshold is chosen, and calculations performed again. Once all the (TPR, FPR) pairs for the test are calculated, TPR is plotted on the vertical axis, and FPR is plotted on the horizontal (Fig. 8.4). The range for TPR and FPR is zero to one, mapping the ROC curve over the unit square, (0,0) to (1,1). Less strict thresholds produce more positives, both true and false, leading to a high TPR and FPR. Stricter criteria for a positive result produce fewer positives, lowering TPR and FPR. The choice of ideal sensitivity and specificity will depend on the clinical situation. The ideal test threshold will discriminate diseased from nondiseased subjects all the time, and will have a TPR of 1
and an FPR of zero (corresponding to the top left corner of the ROC graph). Intuitively, a test that gives the incorrect diagnosis every time is simply the reverse of the ideal test, and while it is not clinically useful, it is actually a highly diagnostic test (once its flaws are known). The uninformative test makes a positive result equally likely for all subjects, irrespective of true disease status. In this case, the TPR equals the FPR, and the resulting ROC curve is a straight diagonal line from (0,0) to (1,1) (Fig. 8.4). As the accuracy of the test improves, the curve moves closer to the top left hand corner of the graph. In practice, the sensitivity and specificity for a test are often reported for a given diagnostic threshold. By demonstrating the accuracy over a spectrum of thresholds, the overall accuracy of the test can be summarised. This allows different tests to be compared through summary measures of the ROC curves.
8.5.2 Area Under the ROC Curve

The area under the ROC curve (AUC) is a summary measure of test performance which allows comparison between different tests. TPR and FPR values lie between zero and one inclusive, making the AUC for a perfect test equal to one. A test which produces a false result every time (and whose inverted results would therefore identify disease perfectly) traces a flat line along the horizontal axis, with an AUC of zero. The uninformative, random test, which allocates positive results irrespective of true disease status, has an AUC of 0.5.
AUC can be calculated for different diagnostic tests, and then compared. An AUC closer to one indicates a better test.
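As a sketch of how an ROC curve and its AUC might be computed empirically, the following Python fragment sweeps the threshold across a set of invented continuous scores and applies the trapezoidal rule; real analyses would normally use dedicated statistical software.

    # Empirical ROC curve and trapezoidal AUC for a continuous test;
    # the data are invented (higher scores indicate disease).

    def roc_points(scores, labels):
        """Return (FPR, TPR) pairs for every candidate threshold."""
        thresholds = sorted(set(scores), reverse=True)
        pos = sum(labels)
        neg = len(labels) - pos
        points = [(0.0, 0.0)]
        for t in thresholds:
            tp = sum(1 for s, d in zip(scores, labels) if s >= t and d)
            fp = sum(1 for s, d in zip(scores, labels) if s >= t and not d)
            points.append((fp / neg, tp / pos))
        return points

    def auc(points):
        """Trapezoidal area under the (FPR, TPR) curve."""
        return sum((x2 - x1) * (y1 + y2) / 2
                   for (x1, y1), (x2, y2) in zip(points, points[1:]))

    scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9]
    labels = [0, 0, 1, 0, 1, 1]        # 1 = diseased on the reference test
    print(f"AUC = {auc(roc_points(scores, labels)):.2f}")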
8.6 Combining Studies: Diagnostic Meta-Analysis This chapter has so far discussed diagnostic accuracy of a single test in a single population. Whilst this can be performed multiple times using different tests, populations or gold standards within the same published article, the process of combining these results to form robust statistical conclusions is most commonly seen in meta-analytical papers. For many diagnostic tests, there will be multiple individual studies available in the literature, with varying results. This section explains the principles of robust diagnostic meta-analytical methodology.
8.6.1 Goals and Guidelines

Meta-analysis aims to identify a clinical question and use the available data to produce a comprehensive answer. The collective data provide more reliable and credible conclusions than individual studies, provided that the methodology of the primary studies and of the meta-analysis is sound. Diagnostic meta-analysis focuses on the performance of a test across the range of studies in the literature. The first steps are to identify an index test, an appropriate reference standard and the diagnostic outcome(s) to be measured. For example, it is important to choose an appropriate comparison in carotid artery stenosis measurement – the reference standard and threshold for significant stenosis may vary between studies. It is also vital for the meta-analysts to define the clinical setting of the test, including test characteristics, target population and threshold for positivity. As far as is practical, it is best to include studies which are similar in design, population, test performance and threshold. As this is not feasible in many meta-analytical projects, guidelines are available to help reduce the impact of heterogeneity arising from different studies. One method of dealing with differences between studies is to isolate a subset of papers and perform subgroup analysis for more targeted
conclusions. A more robust method is to analyse the effect of certain characteristics on the outcomes of the test and assess the effect of study quality on the final results (see Sect. 8.6.2). For example, the accuracy of cardiac stress testing will vary across studies with different target populations or reference standards. Studies comparing thallium studies to angiography are not directly comparable to those comparing thallium studies with coronary CT. More complex statistical models are capable of directly comparing multiple tests within the same analysis, but this is discussed later in the chapter.
8.6.2 Heterogeneity Assessment The variation in results across the studies included in a meta-analysis is termed heterogeneity. The reasons for different results are numerous – chance, errors in analytical methodology, differences in study design, protocol, inclusion and exclusion criteria, threshold for calling a result positive and many others. Compared to studies of therapeutic strategies, the insistence on high quality reporting of study protocols for diagnostic strategies has been lax, leading to sometimes marked differences in published results for the same technique. The earlier discussion on sources of bias within diagnostic testing is useful in this context to explain sources of heterogeneity in diagnostic meta-analysis. In addition, if there are differences in the spectrum of disease found in the studies in the meta-analysis, there may be heterogeneity due to the effect of disease severity on the performance of the test. A test may well be more sensitive to a severe case of a particular disease than a subtle case. The points listed in the STARD and QUADAS tools are particularly important sources of heterogeneity to consider.
8.6.2.1 Quality Assessment (Univariate Analysis) The tools shown in Sect. 8.3 (STARD and QUADAS) are designed to help identify aspects of primary study design that may affect the accuracy results. Whiting et al. [4] demonstrate that summary quality scores like those generated by the STARD tool are not appropriate to assess the effect of study quality on meta-analysis
results, as a summary score may mask an important source of bias. Analysing each item in the QUADAS tool separately for its effect on overall accuracy is a more robust approach. Westwood et al. [5] suggest separate univariate regression analysis of each item, with subsequent multivariate modelling of influential items (those with P < 0.10 in the univariate model) to assess their combined influence on the results. The influence of an allocated variable value (or covariate) is expressed as the relative diagnostic odds ratio (RDOR): the ratio of the DOR when the covariate is present to the DOR when it is absent. Westwood et al. [5] provide a thorough discussion and worked examples of univariate and selective multivariate analysis of QUADAS tool items, as well as of carefully chosen analysis-specific covariates. The identification of design flaws, population characteristics, test characteristics and other case-specific parameters which affect the reported accuracy can explain the heterogeneity of results and sharpen the conclusions of the meta-analysis.
8.6.2.2 Random and Fixed Effects

The variation in results across the group of studies included in a meta-analysis can be approached in two different ways. Part of the variation is assumed to be due to the studies themselves – patient, protocol or test factors, for example. In fixed effects models, the influence of a given study characteristic on the heterogeneity of the results is assumed to be constant, or fixed, across the studies. However, the distribution of the variable across the studies is not usually known to be constant, and more conservative analyses use random effects models. Random effects models consider the available studies as a sample of the "population" of all possible studies, and the mean result from the
sample is used to estimate the overall performance of the diagnostic test. The study characteristic proposed to contribute to heterogeneity is, therefore, not assumed to be fixed across the studies. The level of certainty is accordingly lower, and the confidence intervals tend to be wider than with fixed effects methods.

The variation between studies can be expressed in a variety of ways. A forest plot (Fig. 8.5) is often used to demonstrate graphically the amount of overlap in sensitivity, specificity, DOR or LRs between studies. A summary ROC curve can also be used to demonstrate graphically the variety of sensitivity and specificity pairs, as described below. Cochran's Q statistic is a chi-squared test used to estimate the heterogeneity of a summary estimate such as the DOR or LR; the smaller the value of Q, the smaller the variation across the studies. Analysis of subgroups of studies with similar characteristics allows heterogeneity to be calculated for factors which do vary across the entire group of studies. For example, if studies including adults only are analysed separately, any heterogeneity that remains cannot be explained by adult–child differences in test accuracy. A weighting can be applied to each study's result, depending on the choice of weighting felt to be most appropriate. Inverse variance and study size are commonly used weights, whereby large studies and studies with little variation in results are given greater weight in the heterogeneity calculations.
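A minimal sketch of inverse-variance pooling of the log DOR and Cochran's Q across a dummy set of studies; the per-study 2 × 2 counts are invented, and comparison of Q against a chi-squared distribution with k − 1 degrees of freedom is left to statistical tables or software.

    # Inverse-variance pooling and Cochran's Q for the log DOR across
    # studies; the per-study 2 x 2 counts are dummy data.
    import math

    studies = [(45, 5, 5, 45), (30, 10, 8, 52), (60, 12, 6, 40)]  # (TP, FP, FN, TN)

    log_dors, weights = [], []
    for tp, fp, fn, tn in studies:
        log_dors.append(math.log((tp * tn) / (fp * fn)))
        weights.append(1 / (1/tp + 1/fp + 1/fn + 1/tn))   # inverse variance

    pooled = sum(w * t for w, t in zip(weights, log_dors)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, log_dors))

    print(f"pooled DOR = {math.exp(pooled):.1f}")
    print(f"Cochran's Q = {q:.2f} on {len(studies) - 1} degrees of freedom")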
8.6.3 Diagnostic Meta-Analysis Techniques

Whether meta-analysis of pooled data can be conducted depends on both the number and the methodological quality of the primary studies. The simplest method for analysing pooled data from multiple studies is averaging the sensitivities and specificities.
Fig. 8.5 Forest plot showing heterogeneity of sensitivity results across a dummy set of six studies, with sensitivities (95% CI) ranging from 0.43 (0.10–0.82) to 0.91 (0.76–0.98). The overall pooled sensitivity, 0.84 (0.77–0.89), is shown as a diamond representing the confidence interval
This is valid only when the same test criteria, population and clinical setting have been used in each study, and each study is of similar size and quality. If different criteria, or thresholds, have been used, there will be a relationship between sensitivity and specificity across the studies: as sensitivity increases, specificity will generally drop (the threshold effect). In these cases, weighted averages will not reflect the overall accuracy of the test, as extremes of threshold criteria can skew the distribution. Separate calculations of overall sensitivity and specificity tend to underestimate diagnostic test accuracy, as there is always interaction between the two outcomes. The DOR is the statistic of choice to measure the overall performance of a diagnostic test.
8.6.3.1 Summary Receiver Operating Characteristic (SROC) Analysis

If the test results are available in binary form, 2 × 2 tables with TP, FP, TN and FN can be formed for each primary study. The (TPR, FPR) pair from each study can then be plotted on a pair of axes similar to those of a ROC curve. In a ROC curve, the data points are formed by variation of the diagnostic threshold within the same
population. In meta-analytical curves, each data point is formed by the (TPR, FPR) result from a primary study. The scatterplot of data points formed from these results is termed a summary ROC (SROC) curve (Fig. 8.6). The curve which is mapped onto the graph is calculated through regression methods. The principle is that logit(TPR) and logit(FPR) have a linear relationship, which can be exploited with a line of best fit. Logit(TPR) and logit(FPR) are defined as

logit(TPR) = log{TPR/(1 − TPR)}
logit(FPR) = log{FPR/(1 − FPR)}

(where log represents the natural logarithm). As there is no logical reason to favour logit(TPR) or logit(FPR), Moses et al. proposed using linear combinations of the two variables as the dependent and independent variables in the regression equation. This also neatly solves the dilemma of the different "least squares" solutions which would result from choosing one or the other logit:

D = a + bS, where
D = logit(TPR) − logit(FPR)
S = logit(TPR) + logit(FPR)

Here a is the intercept, and b represents the dependence of test accuracy on threshold. D is equivalent to log(DOR), the diagnostic log odds ratio, and S inversely measures the diagnostic threshold: high S values correspond to low diagnostic thresholds. The model plots D against S on linear axes, and the least squares line of best fit is fitted to the data. Once a and b are calculated from the intercept and slope of the D–S line, the model can be transformed back into the (TPR, FPR) plane, according to the equation
TPR = exp(a/(1 − b)) · [FPR/(1 − FPR)]^((1 + b)/(1 − b)) / {1 + exp(a/(1 − b)) · [FPR/(1 − FPR)]^((1 + b)/(1 − b))}

Fig. 8.6 Example of an SROC curve showing sensitivity (TPR) vs. (1 − specificity) (FPR) plotted over the unit square. The diagonal represents the random test curve. The antidiagonal, running from the top left corner to the bottom right corner, intersects the curve at the value of Q*. The area under the curved line is the AUC
The curve of TPR against FPR can now be plotted over the data points, as both a and b are known from the line of best fit in the logit plane. Calculation of the area under the curve (AUC) is performed by integration of the above equation over the range (0,1). If the range of raw FPR data points is small, it may be necessary to perform a partial AUC over the range of data. This is acceptable if the specificity of the test can be assumed
to be similar in the target population which will undergo the test in practice. Further information regarding calculation of the SROC curve can be found in Moses et al. [6] or Walter [7]; examples of SROC analysis are provided in Jones and Athanasiou [8].

The statistic Q* is another summary measure of accuracy derived from the SROC curve. Q* is the point on the curve which intercepts the anti-diagonal and corresponds to the point where sensitivity and specificity are equal. Numerically, Q* is defined as

Q* = exp(a/2)/[1 + exp(a/2)],

where a is the intercept value. The Q* value lies between 0 and 1 and represents a point of indifference – the probability of correct diagnosis is the same for all subjects. Q* is an appropriate statistic provided that the clinical importance of sensitivity and specificity is approximately equal; if, after weighing up the two outcomes, they are judged not to be of equal importance, Q* is no longer an appropriate summary statistic.

There are several disadvantages to traditional SROC analysis. Firstly, it is impossible to give summary estimates of sensitivity and specificity, as they form the independent and dependent variables in the exponential curve model. This means that AUC and Q* are used as summary values of overall accuracy, and these are less well understood and need to be explained when used in the literature. It is the estimated sensitivity and/or specificity which often help the clinician choose a diagnostic test, and the clinical scenario may favour a test with high sensitivity or high specificity, but not necessarily both; SROC does not facilitate overall summary estimates based on one or the other. Similarly, LRs cannot be produced by SROC analysis. Secondly, the effect of diagnostic threshold is not modelled, so if accuracy is dependent on threshold, the model is biased and the curve appears asymmetrical. Additionally, threshold is not allowed to vary in the interpretation and application of the results to particular populations.
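To make the regression concrete, a minimal sketch of the Moses least-squares fit of D on S across a dummy set of studies, recovering a, b and Q*; the 0.5 continuity correction added to each cell is a common convention assumed here, not part of the text above.

    # Moses least-squares SROC fit: regress D on S across studies,
    # then recover Q*. Dummy study counts; 0.5 is added to each cell
    # as a conventional continuity correction (an assumption).
    import math

    def logit(p):
        return math.log(p / (1 - p))

    studies = [(45, 5, 5, 45), (30, 10, 8, 52), (60, 12, 6, 40)]  # (TP, FP, FN, TN)

    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        d_vals.append(logit(tpr) - logit(fpr))   # D = log DOR
        s_vals.append(logit(tpr) + logit(fpr))   # S = (inverse) threshold

    # Ordinary least squares for D = a + bS.
    n = len(studies)
    s_mean, d_mean = sum(s_vals) / n, sum(d_vals) / n
    b = ((sum(s * d for s, d in zip(s_vals, d_vals)) - n * s_mean * d_mean)
         / (sum(s * s for s in s_vals) - n * s_mean ** 2))
    a = d_mean - b * s_mean

    q_star = math.exp(a / 2) / (1 + math.exp(a / 2))
    print(f"a = {a:.2f}, b = {b:.2f}, Q* = {q_star:.2f}")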
8.6.3.2 Bivariate Approach to SROC

The bivariate approach was applied to diagnostic tests by Reitsma et al. [9]. The bivariate model rests on the supposition that the outcomes of interest may be multiple and co-dependent, as is the case for the sensitivity and
specificity of a diagnostic test. As well as keeping the two-dimensional nature of sensitivity and specificity intact, this approach allows a single covariate to be factored into the models of both outcomes. Statistically, the bivariate model is a random-effects model in which each of logit(sensitivity) and logit(specificity) is assumed to follow a normal distribution across the studies included in the meta-analysis, with a correlation between the two logit-transformed variables. The presence of two correlated, normally distributed variables leads to a bivariate normal distribution, and the variance of the two outcomes is expressed as a matrix of the individual variances and the covariance between the two. Unlike SROC analysis, the results of bivariate models are expressed as summary estimates of sensitivity and specificity with their confidence intervals. As the underlying distributions are assumed to be normal, a linear function can be used to perform the analysis, allowing readily available software to be used. In addition, the use of random effects makes estimating inter-study heterogeneity in either sensitivity or specificity straightforward. The ability to examine the effect of a covariate, for example study design or inclusion criteria, on sensitivity or specificity, rather than on the overall DOR, is a further advantage. The bivariate model is, however, more complex than the SROC model, and for further reading the papers by Reitsma et al. [9] and Harbord et al. [10] are recommended.
8.6.3.3 Hierarchical SROC Analysis

Hierarchical SROC (HSROC) analysis performs multiple layers of modelling to account for heterogeneity both within and between primary studies, by modelling both accuracy and threshold as random effects. At the within-study level, the number of positive results is modelled on a binomial distribution, incorporating threshold and accuracy as random effects and the interaction between them as a fixed effect; this is identical to the traditional SROC model if there is no dependence of accuracy upon threshold. A second level of modelling uses the estimates of accuracy and threshold (assuming a normal distribution) to calculate the SROC curve, as well as expected sensitivity, specificity, LRs and other desirable endpoints such as the AUC. Like the bivariate model, the HSROC model assumes normal distributions of the underlying variables;
however, the focus of HSROC analysis is curve generation, with emphasis on the shape of the curve. The complexity of layered non-linear modelling has prevented widespread use of HSROC. A Bayesian approach using Markov chain Monte Carlo methods required a further level of modelling just to run the analysis [11]; it proved the validity of the model without making it accessible. Newer software procedures, such as NLMIXED in SAS, have enabled easier model application. The model requires individualised syntax and a grasp of both SAS and the underlying statistical models; however, useful summary estimates are produced for sensitivity, specificity and LRs, as well as the HSROC curve, and the AUC and Q* estimates can be easily calculated using SAS macros. The Bayesian estimates have been shown to closely approximate the results produced with SAS [12].

The HSROC technique is able to compare multiple tests from the same studies in the same analysis, meaning that the differences between the study characteristics are present for both tests. Threshold, accuracy and SROC curve shape are estimated separately for each test, increasing the number of variables in the model. The limiting factor in HSROC analysis is often the ability of the model to converge on a result; the SROC shape variable may be considered the same for each test type if difficulties in convergence arise. Covariates allow heterogeneity to be investigated and explained by different aspects of study design or patient spectrum. There is far more flexibility in HSROC than in traditional SROC in terms of explaining different results, and both intra- and inter-study variables can be included as covariates. The HSROC curve is plotted on the same axes and has the same layout as the SROC curve. Curve asymmetry can be quantitatively assessed through shape variables to determine the influence of threshold on test accuracy.
8.6.3.4 Comparing the Bivariate and HSROC Models

The results of HSROC and bivariate analyses are often very similar, and in the case where there are no study-level covariates, they are identical. The bivariate model allows covariates that influence sensitivity or specificity (or both), whilst the HSROC model allows covariates that influence threshold or accuracy (or both). If a covariate is assumed to affect both variables in either model,
then the two models are the same. The HSROC method, however, offers greater flexibility in dropping variables from the model, giving greater control over the choice of included covariates than the fairly standard framework of the bivariate approach. The bivariate model, on the other hand, can be fitted using a variety of available software without the need for the NLMIXED procedure in SAS or the cumbersome WinBUGS software. Further information is available in Harbord et al. [10].
8.7 Conclusions

Many papers in the medical literature draw conclusions about diagnostic test accuracy. Appropriate application to clinical practice depends on the clinician's ability to identify the strengths and weaknesses of a study and to decide whether the results apply to their own practice. Diagnostic accuracy is not measured in the same way in all studies: most will quote sensitivity and specificity, LRs or the DOR, and these concepts should be familiar to every clinician reading papers on diagnostic accuracy. Meta-analytical papers are becoming more stringently reviewed with the increasing acceptance of the QUADAS tool and of published guidelines for the performance of diagnostic meta-analysis. Nevertheless, the clinician should remain vigilant for papers which do not meet these guidelines, and should at least have the understanding (or this book as a reference tool!) to make the most appropriate decisions in clinical practice.
References

1. Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction (Oxford Statistical Science Series). Oxford University Press, New York
2. Irwig L, Tosteson AN, Gatsonis C et al (1994) Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 120:667–676
3. Begg CB, McNeil BJ (1988) Assessment of radiologic tests: control of bias and other design considerations. Radiology 167:565–569
4. Whiting P, Rutjes AW, Reitsma JB et al (2003) The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 3:25
5. Westwood ME, Whiting PF, Kleijnen J (2005) How does study quality affect the results of a diagnostic meta-analysis? BMC Med Res Methodol 5:20
6. Moses LE, Shapiro D, Littenberg B (1993) Combining independent studies of a diagnostic test into a summary ROC
curve: data-analytic approaches and some additional considerations. Stat Med 12:1293–1316
7. Walter SD (2002) Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med 21:1237–1256
8. Jones CM, Athanasiou T (2005) Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg 79:16–20
9. Reitsma JB, Glas AS, Rutjes AW et al (2005) Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 58:982–990
10. Harbord RM, Deeks JJ, Egger M et al (2007) A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8:239–251
11. Rutter CM, Gatsonis CA (2001) A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 20:2865–2884
12. Macaskill P (2004) Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol 57:925–932
List of Useful Websites

1. Loong T (2003) Understanding sensitivity and specificity with the right side of the brain. BMJ 327:716–719. http://www.bmj.com/cgi/content/full/327/7417/716
2. Altman DG, Bland JM (1994) Statistics notes: diagnostic tests 1: sensitivity and specificity. BMJ 308:1552. http://www.bmj.com/cgi/content/full/308/6943/1552
3. Altman DG, Bland JM (1994) Statistics notes: diagnostic tests 2: predictive values. BMJ 309:102. http://www.bmj.com/cgi/content/full/309/6947/102
9 Research in Surgical Education: A Primer

Adam Dubrowski, Heather Carnahan, and Richard Reznick
Contents

9.1 Introduction
9.2 Qualitative vs. Quantitative Research
  9.2.1 Generating Questions
  9.2.2 Qualitative Research
  9.2.3 Quantitative Research
9.3 Research Design
  9.3.1 Minimizing Threats to Validity
  9.3.2 Design Construction
  9.3.3 The Nature of Good Design
9.4 Measures (Experimental Research)
  9.4.1 Developing an Instrument
  9.4.2 Feasibility
  9.4.3 Validity
  9.4.4 Reliability
9.5 Acquisition and Analysis of Data (Experimental Research)
  9.5.1 Data Collection
  9.5.2 Tests of Normality
  9.5.3 Three Normality Tests
  9.5.4 Categories of Statistical Techniques
  9.5.5 Nonparametric Analyses
  9.5.6 Relationships Between Variables
  9.5.7 Differences Between Independent Groups
  9.5.8 Differences Between Dependent Groups
9.6 Funding, Dissemination, and Promotion
References
A. Dubrowski, Centre for Nursing Education Research, University of Toronto, 155 College Street, Toronto, ON, Canada M5T 1P8. e-mail: [email protected]
Abstract The field of surgical education is young, and opportunities are not on the same scale as they are in fields of fundamental biology, clinical epidemiology, or health care outcomes research; however, the trajectory is steep, and educational work is improving in its sophistication and adherence to methodological principles. There is also an excitement and desire among young academic surgeons to work in an area that has obvious and direct relevance to their “mainstream job” as surgeons. This chapter explores approaches that can bring such evidence to bear upon educational questions, since education, like any other discipline, cannot be subject to practice by anecdote. Changes must be made on the basis of methodologically rigorous and scientifically sound research.
9.1 Introduction

Surgical education has come of age. The past 30 years have seen efforts aimed at a scientific understanding of education in surgery mature from minor curiosity to bona fide academic focus. Convincing evidence supports this observation: accomplishments in the field of surgical education are regularly considered a criterion for promotion, graduate training in education is becoming more common for aspiring academic surgeons, and an increasing number of surgery departments are hiring surgeons who will principally function in the educational arena. Of course, interest in educational theory and research is not new. For over a century, the field of psychology was the "academic home" of educational research. The last three or four decades, however, have witnessed an explosion in the quantity, diversity, and quality of efforts in education. Disciplines such as cognitive science,
kinesiology, engineering, and the health professions have invested deeply in educational research as a fundamental expression of their discipline. Such activity has translated into increased opportunities for individual academics wishing to focus their work in education. In the health professions alone, approximately twenty institutions now offer graduate degrees tailored to the needs of academic professionals. Hundreds of journals focus on education, providing a forum for scientific writing in the field, and conferences concentrating on issues of medical and surgical education abound in the Americas, Europe, and Australasia. To be sure, the field of medical and surgical education is young, and opportunities for scholarship are not on the same scale as they are in fields of fundamental biology, clinical epidemiology, or health care outcomes research; however, the trajectory is steep, and educational work is improving in its sophistication and adherence to methodological principles. There is also an excitement and desire among young academic surgeons to work in an area that has obvious and direct relevance to their "mainstream job" as surgeons. This burgeoning interest in surgical education has paralleled the skyrocketing attention being paid to the more general field of medical education, fueled by a number of seminal issues. Of all these issues, the advent of problem-based learning [1], a curricular methodology which swept North America and Europe, has probably had the greatest impact on the delivery of surgical education in the last 30 years. The use of computer-assisted instruction, which has changed our thinking about information transfer, has also had a significant effect, and we are most likely now on the cusp of further revolutionary changes effected by the digital delivery of knowledge [2]. A further important wave of change has come in the form of a formal focus on communication skills; this focus on "noncognitive" competencies has, to a large extent, been responsible for the birth of the entire discipline of performance-based testing, with instruments such as the Objective Structured Clinical Examination (OSCE) [3]. Lastly, a transformative change is now occurring in the direction of interprofessional education. With the breaking down of traditional hierarchies, medicine of the future will be delivered by teams of health professionals; and health professional learning will, likewise, be accomplished through an interprofessional effort.
Concurrent with these more general issues in medical education, some sweeping changes have permeated the domain of surgical education. The first global change witnessed has been a decrease in the luster of the surgical specialties. Today’s medical students, voting with their “choice of residencies,” have declared an overwhelming interest in the so-called “controlled lifestyle specialties,” an interest that has led to a decrease in surgery as a career choice. Also, concern about “over-work” in the surgical workplace has escalated with the notoriety of the Libby Zion case [4] and with subsequent dramatic changes in the realm of workhour restrictions for house staff. This issue has become entrenched in practice through mechanisms such as the ACGME 80-h work week in the United States and the European working-time directive [5–7]. Another major change in surgical education has been the increasing popularity of adjunctive environments for surgical learning, most notably the skills laboratories, which provide instruction in “the technical” fundamentals of surgery. For surgeons interested in education, this has been an awakening; and the introduction of serious academic research venues congruent with their skills, talents, and passions has promoted a huge amount of research activity. The interest generated by these various issues must be viewed in conjunction with another emphasis now found in surgical inquiry, namely, that change needs to be guided by evidence. This chapter explores approaches that can bring such evidence to bear upon educational questions, since education, like any other discipline, cannot be subject to practice by anecdote. Changes must be made on the basis of methodologically rigorous and scientifically sound research.
9.2 Qualitative vs. Quantitative Research

9.2.1 Generating Questions

Framing a good research question is the first step and one of the most important factors in designing a good research project and research program – a well-formulated question that is searchable and testable will most likely lead to a successful exploration of the literature. In the question formulation stage, two pitfalls need to
be avoided. One of the most common pitfalls encountered by a researcher immersed in a specific clinical context is that his or her research question often springs from anecdotal evidence and observations rather than from existing literature and evidence; the researcher can avoid this pitfall by framing the question carefully, ensuring that any personal bias and experiential contextualization are removed. The second pitfall is that the question generated may lack focus. By dividing the question into its component parts at the very beginning, the researcher can avoid this situation and will formulate a highly focused question, which not only avoids unnecessary, complicated, and time-consuming searches of irrelevant material, but also facilitates the discovery of the best evidence in the literature and the generation of a set of testable hypotheses. Ultimately, the structuring of the question is a function of both personal style and the nature of the research to be conducted. For example, a hypothesis-driven question will be highly structured, whereas a question leading to exploratory research will be less focused to allow the researcher to explore a wider scope. Qualitative research questions should be structured to address how people “feel” or “experience” certain situations and conditions; quantitative research questions should be structured to answer questions of “how many” or “how much” [8–10].
9.2.2 Qualitative Research

Qualitative research addresses questions dealing with perceptions of situations and scenarios. The most common qualitative approaches in medical education involve grounded theory, ethnography, and phenomenology. However, since qualitative research is not the focus of this chapter, we only briefly describe the aims and methodologies of these three approaches. The grounded theory approach aims to discover the meanings that humans assign to the people and objects with which they interact, in order to link these meanings with observed behaviors [11]. Using interviews and observations to collect data, this is one of the most common approaches in medical education. Researchers with backgrounds in rhetoric, sociology, and psychology who engage in qualitative research using the grounded theory approach have made tremendous contributions to our understanding of meanings and behaviors within the
context of medical education. Lorelei Lingard [12] and her team, for instance, have studied interprofessional communication patterns within various clinical settings such as the operating room, the intensive care unit, and the pediatric medicine inpatient ward. In particular, their work explores the intersection between team communication patterns and the issues of novice socialization and patient safety. On the basis of this research, we are now beginning to understand how certain communication patterns lead to more effective team performance within a clinical setting, and it is hoped that we can structure our future educational interventions to stimulate favorable patterns of communication. Ethnography aims to study a culture from the perspective of the people who actually live within it [13]. In the case of medical and surgical education, ethnography is a study of the medical students and residents who live in the culture of medical education. This approach can be described as a very intensive, face-to-face encounter between the researcher and the study culture; during fieldwork, data are generally gathered by participation in social and cultural engagements. The third approach, phenomenology, aims to achieve a deeper understanding of the nature and meaning of the everyday lived experience of people. This approach demands an in-depth immersion of the researcher in the world of the study population. Asking a well-structured qualitative research question is a very difficult and demanding task. Two essential elements are necessary in the formulation of a qualitative research question: the population to be studied and the situation within which the population is being studied (see Table 9.1). The question should address the characteristics of the specific population that will be tested or assessed: Are they part of a larger group of individuals who may be different from the rest of the population? Is there a specific grouping, such as sex, age, or level of training? What is the specific environment in which the problem is being studied? The question should also address the situation or context in which the population will be observed: What are the circumstances, conditions, or experiences that are known to affect the population, and which of them still need to be explored? Working example: "Do communication patterns affect the atmosphere of the operating room?" Although on the surface this question appears well structured, it does not adhere to the principles of generating the optimal qualitative research question. Specifically, this
Table 9.1 Key features of testable and searchable research questions

Key characteristic | Qualitative | Quantitative
Population | Who are the participants? From where? What are their preexperimental features? | Who are the participants? From where? What are their preexperimental features?
Interventions | – | What are the experimental interventions and what are the control interventions?
Outcomes | – | What are the expected outcomes? Who will assess these, and how?
Situation | What circumstances, conditions, or experiences is the researcher interested in learning more about? | –
question does not define the participants or the situation. If we assume that the researchers are interested in the assessment of communication patterns between the surgeon, the senior resident, and the nurse, as well as their perceptions of operating room atmosphere, that the observations are made in a virtual operating room, and finally that all parties involved went through a training session on how to improve their communication patterns, a more defined question would be: "Does interprofessional training in communication patterns lead to perceptions of improved cohesion within the operating room? A simulation study."
9.2.3 Quantitative Research

Arguably, the majority of the research conducted in the field of medical and surgical education is quantitative in nature, typically answering questions of "how many" and "how much." For example, when comparing two different educational approaches, the researcher can ask questions about the difference in the amount of learning that occurred when the participants were exposed to one of the two educational interventions. The quantitative research approach allows one to generate precise and focused research questions and highly testable hypotheses. Several approaches to formulating quantitative research studies have been proposed; they include "true experimental designs" such as the randomized controlled trial, as well as observational studies. The randomized controlled trial consists of a random allocation of study participants into various experimental groups. The purpose of the randomization
procedure is to ensure that any confounding variables that are not anticipated or controlled by the researcher will be equally distributed among all the experimental groups. In some cases, the researcher may decide to conduct a pretest after the initial randomization. The pretest confirms that all participants show similar performance characteristics on the skill of interest; therefore, any improvements noted after the experimental intervention are likely due to the intervention rather than chance. Once participants are allocated to the various experimental groups, their performances are assessed forward in time to determine whether the educational experience has the hypothesized effect. Typically, this is assessed with an immediate posttest. A common mistake among educational researchers is the assumption that the posttest indicates the amount of learning that has occurred because of the educational intervention. Based on the learning literature and theories, the immediate posttest should be treated as an indication of an improvement in performance, rather than of learning [14]. More specifically, there are sets of variables that may influence performance on this immediate posttest: boredom, fatigue, and diminishing interest in learning may have a negative effect on the immediate performance. In contrast, excitement, recency, and group cohesion may be variables that have a positive, facilitating impact on the performance. To circumvent the negative as well as the positive transient effects of practice, it has been hypothesized that the true measure of learning should be assessed by a delayed posttest. Introducing a retention period between the end of practice and the assessment of knowledge allows the negative and positive variables to dissipate, revealing the true amount of change in skill performance due to the educational intervention.
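To make the allocation step concrete, the following minimal sketch (with hypothetical participant identifiers and group labels) shows one way the random assignment described above might be performed:

```python
# Minimal sketch of random allocation to experimental groups.
# Participant identifiers and group labels are hypothetical.
import random

participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]
random.seed(7)  # fixed seed so the allocation can be reproduced
random.shuffle(participants)

half = len(participants) // 2
groups = {
    "intervention": participants[:half],
    "control": participants[half:],
}
print(groups)
```

Any unmeasured confounders should, on average, be distributed evenly between the two lists produced this way.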
The strength of randomized controlled trials, when applied to medical and surgical education, is that they control for many unanticipated factors. However, this approach does not allow the researcher to study larger-scale education interventions, such as the effectiveness of various educational curricula. Observational studies may provide better evidence than randomized controlled trials when addressing this type of question. In particular, cohort studies are a type of observational study that may be used to compare the performance of students undergoing a specific curriculum to the performances of other students undergoing a different curriculum. This comparison could be made across various educational institutions or various historical cohorts. Another, less common approach to observational studies used in research in medical and surgical education is the case–control study. This approach requires that the researcher identify outcomes or performance characteristics in a specific population of trainees and track back in time their educational exposures in order to find an indicator of this outcome. One possible example is the observation that a group of residents who perform well on a set of basic surgical skills were members of a surgical skills interest group in the first and second years of their undergraduate medical education, which allowed them to participate in many research studies as well as in numerous surgical skills workshops. Overall, observational studies generate weaker evidence than randomized controlled trials because of their inability to control intervening variables; sometimes, however, this is the only systematic experimental approach for answering certain questions of interest. The generation of a quantitative research question for searching the literature and conducting research needs different structuring than does the generation of a qualitative question (refer to Table 9.1). First, although the nature of the population that one wants to study still needs to be addressed, the additional components necessary for the generation of searchable and testable research questions are the specification of the interventions and the expected outcomes. Second, a crucial step in formulating questions addressing the intervention is the inclusion of appropriate control groups. For example, when investigating the effectiveness of a specific type of simulation-based training, one control group may receive didactic training. However, the inclusion of this control group does not address the question of the effectiveness of the particular simulation-based training, but rather the effectiveness of
hands-on practice in general. A more effective control group would practice the same skill in an alternative simulated environment. The inclusion of this control group not only addresses the question of whether hands-on practice within the simulated environment improves technical skills performance, but also assesses the effectiveness of the particular simulated approach when it is compared to other simulated approaches. Third, a key feature in formulating a quantitative research question is addressing the outcomes. Whether performing a literature search or generating a testable hypothesis, the researcher needs to know which aspects of performance should be assessed, who should make these assessments, and when they should be made. Working example: "Does hands-on practice improve technical performance of anastomotic skills?" Although on the surface this question appears well structured, it does not adhere to the outlined principles for generating a quantitative research question. More specifically, the elements of the question do not define the population of participants, the specifics of the experimental intervention, or details about the outcomes. A question that is searchable and testable would include all three elements. If one assumes that the researchers were interested in comparing the learning of bowel anastomotic skills in junior and senior residents, that they wanted to compare the effectiveness of high-fidelity and low-fidelity training, and finally that they used expert-based rating systems to assess clinical outcomes such as leakage of the anastomosis, a more defined question would be: "Does practice on high- and low-fidelity models differentially affect the final products of junior and senior general surgery residents?"
9.3 Research Design

Today's educational research in medicine and surgery is devoted to examining whether specific educational programs or manipulations improve clinical performance. For example, researchers may wish to examine whether a new educational program leads to more proficient performance in the clinical setting. The existence of such a cause–effect relationship requires that two conditions be met [15]. First, the researcher must observe changes in the outcome after, rather than before, the institution of the program. Second, the researcher must ensure that the program is the only reasonable explanation for the
changes in the outcome measures observed. If there are any other alternative explanations for the observed changes in outcomes, the researcher cannot be confident that the presumed cause–effect relationship is correct. Undeniably, in most educational research, showing that no alternative explanations can be applied to the findings is very difficult. There are many factors outside the researchers' influence which may explain changes in outcomes [15, 16]. Some examples include other historical or ongoing events occurring at the same time as the program, which may have a direct or indirect impact on learning, improvement of skills, or development of a knowledge base. Cumulatively, these perceived or unperceived alternative explanations for the findings, beyond the interpretation favored by the investigators, are known as threats to internal validity. One possible way to eliminate or mitigate these threats is to maintain a rigorous approach to research design and methodology [15, 17].
Design is by far the most powerful method to rule out alternative explanations; this approach, therefore, warrants a more detailed formal expansion in the following section. Still, a number of alternative explanations cannot be eliminated by implementing research design strategies. To deal with these, the researcher can use various statistical analyses performed on the collected data. For example, a recent study by Brydges et al. [15] investigated the relationship between postgraduate year of training and proficiency in knot-tying skills. It was possible that some of the senior residents spent more time in the operating room than did junior residents, which was considered an uncontrolled variable; the actual number of hours spent in the operating room by every participant, therefore, was collected from their logbooks and used as a covariate in subsequent analyses. The results showed that the year of training, not the actual number of hours spent in the operating room, explained the differences in skill proficiency.
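As an illustration of this covariate strategy, a hedged sketch of an ANCOVA-style analysis is shown below; the data, column names, and model are hypothetical and are not taken from the Brydges et al. study itself:

```python
# Sketch: does training year explain knot-tying proficiency once
# operating-room hours (the covariate) are accounted for?
# All data and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "knot_score": [52, 58, 55, 70, 74, 69, 85, 88, 83],
    "pgy":        [1, 1, 1, 3, 3, 3, 5, 5, 5],               # postgraduate year
    "or_hours":   [40, 55, 48, 120, 150, 135, 300, 340, 310],  # from logbooks
})

model = ols("knot_score ~ C(pgy) + or_hours", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # effect of year, adjusted for hours
```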
9.3.1 Minimizing Threats to Validity

Minimizing threats to internal validity is an essential feature of any well-constructed experiment. In this section, we first discuss three approaches: argument, analysis, and design. These three approaches are not mutually exclusive; indeed, a good research plan should make use of multiple methods for reducing threats to validity. Ultimately, the better the research design, the less the threat to validity; therefore, we will expand on this issue with an outline of some common experimental designs, along with their strengths and weaknesses. It is common to rationalize or argue why a specific threat to internal validity may not merit serious consideration, but this is the least effective way to deal with threats to internal validity. This approach should only be used when the particular threats to validity have been formally investigated in prior research. For example, if an improvement in skill proficiency due to a specific educational program is revealed by assessments made by a single expert using a standardized set of assessment instruments, one may argue that a threat to the internal validity of the findings due to the single assessor is not likely, because previous research shows that when performance is evaluated with the same set of assessment instruments by a number of experts, the assessments tend to be very similar.
9.3.2 Design Construction

As already mentioned, proper research design is by far the most powerful method to prevent threats to internal validity. Most research designs can be conceptualized and represented graphically with four basic elements: time, intervention(s), observation(s), and groups. In design notation, time is represented horizontally, an intervention is depicted with the symbol "X," assessments and observations are depicted by the symbol "O," and each group is indicated on a separate line. Most importantly, the manner in which groups are assigned to the conditions can be indicated by a letter: "R" represents random assignment, "N" represents nonrandom assignment (i.e., a nonequivalent group or cohort), and "C" represents an assignment based on a cutoff score. The most basic causal relationship between an education intervention and an outcome can be measured by assessing the skill level of a particular group of trainees after the implementation of the educational intervention. Using the outlined notation, the research design would be the following:

X   O
This is the simplest design in causal research and serves as a starting point for the development of better strategies. This design does not control well for threats to the internal validity of the study. Specifically, one cannot confidently say that the observed skill level at the end of the study is any different from the skill level of the participants before the study. One also cannot rule out any historical events that may lead to changes in performance. When it is possible to deliver the educational intervention to all participants, a number of strategies are available to control for threats to the internal validity of the study. One can include additional observations either before or after the program, add to or remove the program, or introduce different programs. For example, the researcher might add one or more preprogram measurements:

O   O   X   O
The addition of such pretests provides a "baseline," which, for instance, helps to assess the potential of a maturation or testing threat. Specifically, the researcher can assess the differences in the amount of improvement on a skill between the first two tests and between the second pretest and the posttest. Similarly, additional posttest assessments could be added, which would be useful for determining whether an immediate program effect decays over time or whether there is a lag in time between the initiation of the program and the occurrence of an effect. However, repeated exposures to testing can potentially influence the amount of learning that occurs due to the actual educational intervention. Therefore, conclusions related to these assessments may overestimate the educational impact of the intervention. For this reason, it is suggested that a control group be included in the study design if possible. One of the simplest designs in education research is the two-group pretest/posttest design:

N   O   X   O
N   O       O

The interpretation of this diagram is as follows. The study was composed of two groups, with nonrandom assignment of participants to each of the groups. There was an initial assessment of skill level before the program implementation (indicated by the first "O"). Because the participants were not randomly assigned to the two groups, the initial test of the ability to perform the skill was crucial to ensure that any differences found due to the educational program were not present before the study. Subsequently, participants in the first group received the educational program (indicated by "X"), while participants in the second group did not. Finally, all participants' skill levels were measured after the completion of the program. Sometimes, the initial pretests of the ability to perform the skills in question may be viewed as a contaminating factor. That is, the exposure to the test may be enough practice for the group which did not receive the educational intervention to learn the skills. Therefore, whenever possible, one should avoid this initial pretest. The posttest-only randomized experimental design depicted below allows the researcher to assume that the randomized assignment of the participants to the two groups ensures an equal level of knowledge across the two groups; therefore, no pretests are necessary, and all the differences observed at the end of the study should be attributed only to the educational intervention:

R   X   O
R       O

Educational researchers argue that neither of these designs adequately describes the amount of learning that occurs during the educational intervention [14]. The immediate posttest results, they argue, may be influenced by many transient factors such as fatigue, boredom, or excitement. One strategy to circumvent any influences on the results due to these transient factors is the introduction of a delayed posttest, in which case the researcher allows participants to take a rest from the study, typically termed a retention interval (this interval may vary, but it should definitely be longer than the educational intervention):

R   X   O   O
R       O   O
While these approaches are a valid and functional first approximation of the effectiveness of educational programs, they are limited. Specifically, they assess the effectiveness of an educational intervention or program only against the absence of any alternative educational intervention. The much more challenging and more informative approach is to assess the effectiveness of an educational intervention when compared to a different intervention:

O   X1   O
O   X2   O
For example, the two groups shown in this diagram are assessed prior to the intervention and after the intervention, and each group undergoes a different type of intervention. Assuming that the performance on the skill and the amount of knowledge are the same across the two groups on the pretest, any differences found on the posttest would be a consequence of the specific intervention. Frequently, the inclusion of additional groups in the design may be necessary in order to rule out specific threats to validity. For example, the implementation of an educational program within a single educational institution may lead to unwanted communication between the participants, or group rivalry, possibly posing threats to the validity of the causal inference. The researcher may be inclined to add an additional nonequivalent group from a similar institution; the use of many nonequivalent groups helps to minimize the potential of a particular selection bias affecting the results:

R   O   X   O
R   O       O
N   O       O
Cohort groups may also be used, in a number of ways. For example, one could use a single measure of a cohort group to help rule out a testing threat:

N           O
R   O   X   O
R   O       O
In this design, the randomized groups might be residents in their first year of residency, while the cohort group might consist of the entire group of first-year residents from the previous academic year. This cohort group did not take the pretest and, if they are similar to the randomly selected control group, they would provide evidence for or against the notion that taking the pretest had an effect on posttest scores. Another possibility is to use pretest/posttest cohort groups:

N   O           O
N   O   X   O
                N   O

Here, the treatment group consists of first-year residents, the first comparison group consists of second-year residents assessed in the same year, and the second comparison group consists of the following year's first-year residents (i.e., those who are fourth-year medical students at the time of the study).
9.3.3 The Nature of Good Design

In the preceding section, we have proposed several generally accepted research designs that are particularly applicable to educational research; still, we encourage researchers to be innovative in order to address their specific questions of interest. The following strategies may be used to develop good, custom-tailored designs. First, and most important, a research design should reflect the theories being investigated and should incorporate specific theoretical hypotheses. Second, a good research design should reflect the settings of the investigation. Third, a good research design should be very flexible; this can be achieved by duplication of essential design features, though it should maintain a balance between redundancy and over-design.
9.4 Measures (Experimental Research)
In most examples of quantitative educational research, there are two categories of variables: dependent and independent. Dependent variables are the measures
that will vary or be impacted in some fashion, according to changes or fluctuations in other parameters, or independent variables. For example, if we wanted to study the impact of medical school grades (GPA) and standardized admission tests (MCAT) on performance in surgical residency, we might set up a correlational study in which we define success in surgical residency as the score on a final examination (FINAL) at the end of training. We would then define a regression equation which would evaluate the extent to which the independent variables of GPA and MCAT could predict the dependent variable of FINAL. While the selection of independent variables is critical for effectively addressing a research question, the appropriateness of the measures used to quantify performance is just as critical. One of the challenges of educational research is to identify and develop dependent variables that adequately represent whether learning has taken place. Approaches to this involve demonstrating that an existing tool is valid or, if a suitable tool does not yet exist, embarking upon the creation and validation of such a tool.
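A minimal sketch of such a regression is shown below, using hypothetical GPA, MCAT, and FINAL values:

```python
# Sketch: multiple regression predicting FINAL from GPA and MCAT.
# All data are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "GPA":   [3.2, 3.8, 3.5, 3.9, 3.4, 3.7, 3.1, 3.6],
    "MCAT":  [508, 517, 512, 520, 510, 515, 505, 514],
    "FINAL": [71, 84, 78, 88, 74, 82, 69, 80],
})

X = sm.add_constant(df[["GPA", "MCAT"]])  # independent (predictor) variables
model = sm.OLS(df["FINAL"], X).fit()      # dependent variable: FINAL
print(model.summary())  # coefficients estimate each predictor's contribution
```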
9.4.1 Developing an Instrument

The first phase in the development of a new tool involves getting experts to pool their knowledge. One approach used to achieve this is called the Delphi technique, developed by the RAND Corporation in the late 1960s as a forecasting methodology. The Delphi technique has developed into a tool that allows a group of experts to reach consensus on factors that are subjective [18–21]. This technique involves several steps. The first step is the selection of a facilitator to coordinate the process. This is followed by the selection of a panel of experts who will develop a list of criteria; this step is called a round. There are no "correct" criteria, and input from people outside the panel is acceptable. Each member of the panel independently ranks the criteria; a mean ranking is then calculated for each item on the list, and the items are ordered according to their rankings. The panel of experts can then discuss the rankings, and the items are anonymously reranked until stabilization of the rankings occurs. If we are to use the example of the development of an evaluation tool for the performance of a skill like
Z-plasty, a group of experts would identify the most important steps in this process. This information would be collected from the experts, and then the steps would be ordered and distributed to the group for ranking. This process does not necessarily require a face-to-face meeting, but could be conducted through e-mail. Following this, the experts would rank the importance of each step in the procedure. These results would be averaged and redistributed by the facilitator for a second ranking. If stabilization of the ranking of the steps was achieved, then the list would be completed. It is argued that expert-based measures are always going to be subjective and prone to contributing the error variance associated with human perception and decision making. In response to this, computer-based measures have been evolving. An example is the use of motion analysis systems to obtain measures of hand motion efficiency (movement time and number of movements). These measurement systems have been shown to be capable of discriminating between expert and novice surgical performance [22–24]. Regardless of how a measure is generated, it must then be scrutinized in terms of its basic psychometric properties. It is generally thought that all competence measures need to be feasible, valid, and reliable.
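The ranking arithmetic at the heart of each Delphi round described above is simple; the sketch below (with hypothetical step names and expert rankings) shows how mean ranks might be computed and the list reordered between rounds:

```python
# Sketch of one Delphi round: items are ordered by mean expert rank.
# Step names and rankings are hypothetical.
import numpy as np

steps = ["design flap", "incise", "undermine", "transpose", "suture"]
# Rows: experts; columns: rank assigned to each step (1 = most important).
ranks = np.array([
    [1, 2, 4, 3, 5],
    [2, 1, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [1, 2, 3, 5, 4],
])

mean_ranks = ranks.mean(axis=0)
for step, r in sorted(zip(steps, mean_ranks), key=lambda pair: pair[1]):
    print(f"{step}: mean rank {r:.2f}")
# The reordered list is redistributed for anonymous re-ranking
# until the ordering stabilizes across rounds.
```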
9.4.2 Feasibility

All measures of competence or new testing modalities must be achievable in terms of logistics, costs, and manpower. This is as important in the paradigm of experimentation as it is in the elaboration of an evaluative mechanism for a program. For example, some assessment methods, such as a multiple-choice examination, are cost-efficient, easy to deliver, easily scorable, and deliverable through a variety of mechanisms, such as paper and pencil, digital delivery, or web administration. Others, such as an OSCE, can be costly, labor-intensive, and logistically challenging. There is almost always a fine balance between feasibility considerations, reliability, and validity. Generally speaking, the more valid instruments are also among the most costly and logistically challenging.
9.4.3 Validity

The next step in the process of developing a dependent variable useful for educational research involves establishing the validity and reliability of this newly established instrument. Validity reflects the degree to which an instrument actually measures what it is intended to measure, and reliability refers to the repeatability of a measure. If an instrument is found to be unreliable, then it cannot be considered valid, though it is also possible for an instrument to be reliable and still not be valid since it is repeatedly measuring the wrong construct. There are four types of validity: logical validity, content validity, criterion validity, and construct validity, each of which requires explication. Logical validity is established when there is an obvious link between the performance being measured and the nature of the measurement instrument. For example, a measure of patient safety logically should be linked with improvements in surgical performance. While it is important to establish logical validity, a more objective method for establishing validity is required for educational research. A second type of validity is content validity, which is particularly relevant for educational research. An instrument has content validity if it samples the content of a course equally throughout the duration of the learning experience. For example, if a technical skills course covered a range of suturing techniques (surface, at depth, laparoscopic), then all of these types of suturing should be evaluated using the measurement tool. This type of validity is, however, often qualitative in nature and must be accompanied by additional evaluations of validity. Criterion validity refers to the extent to which an instrument's measures correlate with a gold standard. The concept of criterion validity can be subdivided into two categories: concurrent validity and predictive validity. Concurrent validity refers to the situation in which the gold standard and the newly established instrument are administered simultaneously. Currently, new computer-based measures of technical performance are being compared to more established instruments such as the OSATS. Predictive validity refers to the extent to which the measures generated from an instrument can predict future performance; an example would be the validation of an instrument that could be used to screen applicants to a surgical program and successfully predict success in surgical training. The mainstay of predictive validity
measures is the correlational study, and often multiple predictors are used in a regression equation to ascertain the relative contribution of the different predictors to the dependent variable. The final type of validity is construct validity. This concept addresses the degree to which an instrument measures the psychological trait it purports to measure. This may be fairly straightforward or relatively complex. For example, we might discover that a test we thought was measuring basic understanding of a scientific concept was really measuring the more rudimentary ability of reading comprehension. In that circumstance, we would say the test lacked construct validity because it was really not measuring what we thought it was measuring. Another way of looking at construct validity is predicated on the assumption that skill in a domain increases with experience. Therefore, if we find that scores on a test of surgical skills systematically increase with the level of the trainee, we make the inference that the test is construct valid. The comparisons across practice time or across groups could be made using a t-test or an analysis of variance (if the data have been shown to be normally distributed; otherwise an analogous nonparametric test could be used). Correlations can also be used in determining construct validity, particularly when examining relationships between constructs.
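A minimal sketch of such a comparison, assuming hypothetical scores from trainees at three levels of experience, might look as follows:

```python
# Sketch: construct-validity check - do scores rise with training level?
# All scores are hypothetical.
from scipy import stats

pgy1 = [41, 45, 39, 44, 42]   # junior residents
pgy3 = [55, 52, 58, 54, 57]   # mid-level residents
pgy5 = [68, 71, 66, 70, 69]   # senior residents

# One-way ANOVA across training levels (assuming approximate normality).
print(stats.f_oneway(pgy1, pgy3, pgy5))

# A nonparametric alternative if normality cannot be assumed.
print(stats.kruskal(pgy1, pgy3, pgy5))
```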
9.4.4 Reliability

Reliability refers to the precision of a measurement. For example, a highly reliable test will be one that orders candidates on a particular dimension consistently and repeatedly. It is usually reported as an index, one common index being Cronbach's alpha. Indices like Cronbach's alpha employ the general principle that individuals knowledgeable or talented in a particular domain will show evidence of that talent throughout a test and across multiple raters. When we obtain a measurement of a person's performance from a newly developed instrument, we have an observed score that is composed of two essential elements: the individual's real performance and the part of the score that can be attributed to measurement error. This error can be related to the participant, the testing situation, the scoring system, or the instrumentation. For example, subject error variance can be influenced
by factors such as mood, fatigue, previous practice, familiarity with the test, and motivation. The testing situation can also introduce error through factors such as the clarity of the instructions, the quality of the test situation (i.e., is it quiet? is there a class going on in the same room?), or the manner in which the results will be used. The scoring system is only as good as the experts who do the scoring: all experts may not be of the same skill level, and some may be more experienced than others when judging performance. Also, in a computer-based approach to evaluation, the calibration of the equipment can lead to obvious instrumentation error; or, if a checklist or global rating score being used is not sensitive enough to discriminate between skill levels (a validity problem), this will contribute to the error variance. Reliability may be estimated through a variety of methods that fall into two types: multiple-administration methods (the test–retest method and the alternative-form method) and single-administration methods (the split-halves method and the internal-consistency method) (see Fig. 9.1). Thus, the score variance is a combination of true score variance and error variance. The coefficient of reliability is the ratio of true score variance to observed score variance. However, since the true score variance is never known, it is estimated by subtracting the error variance from the observed score variance.
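For illustration, the following sketch computes Cronbach's alpha directly from its variance formula, using a hypothetical participants-by-items score matrix:

```python
# Sketch: Cronbach's alpha = k/(k-1) * (1 - sum(item variances)/total variance).
# The score matrix is hypothetical (rows: participants, columns: items).
import numpy as np

scores = np.array([
    [4, 5, 4, 3],
    [3, 4, 3, 3],
    [5, 5, 4, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of participants' totals

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```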
Fig. 9.1 Scatterplots of test scores against retest scores for a test of low reliability and a test of high reliability, illustrating the test–retest method of estimating reliability.

When establishing validity, interclass correlations are used, an example of which is the Pearson r, used when one is correlating two different variables (e.g., comparing a computer-based measure of surgical performance to a global rating score). However, when establishing reliability, an intraclass correlation must be used because the same variable is being correlated. When an instrument is administered twice and the first administration is correlated with the second administration, an intraclass correlation is calculated (typically using analysis of variance to obtain the reliability coefficient). One aspect of reliability addressed in educational research is stability, which is determined by establishing similar scores on separate days. A correlation coefficient is calculated for two tests separated by time. Through analysis of variance, the amount of variance on the 2 days accounted for by the separate days of testing (as well as the error variance) can be determined. Alternatively, the reliability of two raters can be established by correlating two sets of evaluations of the same performance; i.e., the technical performance of a trainee can be judged by two independent (and blinded) experts. If the instrument being used is reliable, the two judges will have a high correlation between their scores. There is no definitive rule for categorizing a correlation as high or low; however, the following scheme can be used as a guideline:

−1.0 to −0.7: strong negative association
−0.7 to −0.3: weak negative association
−0.3 to +0.3: little or no association
+0.3 to +0.7: weak positive association
+0.7 to +1.0: strong positive association
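The contrast between an interclass and an intraclass correlation can be sketched as follows, using hypothetical ratings of six performances by two raters; the ICC here is the simple one-way ICC(1), estimated from ANOVA mean squares:

```python
# Sketch: Pearson (interclass) correlation vs. one-way ICC(1) for two raters.
# Ratings are hypothetical (rows: trainees, columns: raters).
import numpy as np
from scipy import stats

ratings = np.array([
    [7, 8],
    [5, 5],
    [9, 8],
    [6, 7],
    [4, 5],
    [8, 9],
])

print(stats.pearsonr(ratings[:, 0], ratings[:, 1]))  # interclass correlation

n, k = ratings.shape
grand = ratings.mean()
row_means = ratings.mean(axis=1, keepdims=True)
ms_between = k * ((row_means.ravel() - grand) ** 2).sum() / (n - 1)
ms_within = ((ratings - row_means) ** 2).sum() / (n * (k - 1))

icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc1:.2f}")
```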
9.5 Acquisition and Analysis of Data (Experimental Research)

9.5.1 Data Collection

Educational research should be seen as an endeavor as important as any other type of research carried out in a Faculty of Medicine, and advocates for educational research should lobby for the necessary laboratory space and resources of the kind allocated to all medical scientists. It is important to create a controlled and calm data collection environment in order to avoid external events that might influence the participants' performance (unless, of course, that is the variable of interest). For stability of the data collection session, it is critical that the trainee's or expert's performance be evaluated in a quiet room and not just in a corner of a large classroom while other activities are being carried out. This ensures that participants are not distracted by extraneous conversation or events in the environment. It is also critical for the ethics process that the confidentiality of the performance and participation of all participants (the trainee in particular) be protected; this is not possible if the performance takes place in a makeshift environment. Also, for consistency throughout the data collection of a project, it is important that video cameras and the like be left in the same position for all participants so that there is consistency in the video recordings that experts will review later for the expert-based evaluations (i.e., checklists, global ratings, and final product analyses). It is also beneficial to have a single research assistant collecting all data, and if there is more than one group (i.e., a control group and an educational intervention group), participants should be allocated in either random or alternating fashion to ensure that any environmental changes that take place over the course of the testing will influence both groups equally. Once the data collection is completed, the process of analysis should begin. The first step in this process should involve producing a scatterplot of the data for each participant for each level of independent variable manipulated. A quick look at the plot will provide insight into whether there are any outliers present and will also provide insight into the impact of error variance on the variance accounted for by the educational manipulations. It is critical to be familiar with and know one's data prior to running a series of statistics. If the data patterns do not make sense, it is important to double-check that there have been no errors in the data management process (see Fig. 9.2).

Fig. 9.2 Two different sets of data for which similar correlations and slopes describe the relationship between variables A and B. Careful inspection of the plots reveals, however, that there is a bimodal distribution of data points in one panel; when those data are reanalyzed separately for each of the two groupings, the correlations may be nonsignificant.
9.5.2 Tests of Normality

Prior to doing any inferential statistics, the normality of the data set should be evaluated (tests of skewness and kurtosis). Skewness refers to the direction of the hump of the curve: if the hump is shifted to the left, the skewness is positive, and if the hump is shifted to the right, the skewness is negative. Kurtosis refers to the shape of the curve and describes how peaked or flat the curve is in comparison to a normally distributed curve. The researcher's choice of statistical test will vary depending on the characteristics of the data set. A parametric test assumes that the data set has been sampled from a normally distributed population. There are no assumptions of normality for nonparametric tests. However, it is preferable to use parametric tests when possible, as they have more statistical power, thereby increasing the chance of rejecting a false null hypothesis. Many data analysis methods (the t-test, ANOVA, regression) depend on the assumption that data were sampled from a normal Gaussian distribution. The best way to
evaluate how far data are from Gaussian is to look at a graph and see if the distribution deviates grossly from a bell-shaped normal distribution. There are statistical tests that can be used to test for normality, but these tests do come with problems. For example, small samples almost always pass a normality test, which has little power to tell whether or not a small sample of data comes from a Gaussian distribution. With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution would not affect the results of a t-test or ANOVA (both of which are rather robust to minor deviations from normality). The decision to use parametric or nonparametric tests should usually be made on the basis of an entire series of analyses. It is rarely appropriate to make the decision based on a normality test of one data set. It is usually a mistake to test every data set for normality and use the result to decide between parametric and nonparametric statistical tests. But normality tests can help the researcher understand the data, especially when similar results occur in many experiments.
9.5.3 Three Normality Tests

Most statistics packages contain tests of normality. For example, a commonly used test is the Kolmogorov–Smirnov test, which compares the cumulative distribution of the data with the expected cumulative normal distribution and bases its P value on the largest discrepancy. Other available tests are the Shapiro–Wilk normality test and the D'Agostino–Pearson omnibus test. All three procedures test the same null hypothesis – that the data are sampled from a normal distribution. The P value answers the question "If the null hypothesis were true, what is the chance of randomly sampling data that deviate as much (or more) from Gaussian as the data we actually collected?" The three tests differ in how they quantify the deviation of the actual distribution from a normal distribution.
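All three tests are available in common statistics packages; a minimal sketch using scipy, applied to simulated (hypothetical) scores, follows:

```python
# Sketch: three normality tests applied to one hypothetical sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=8, size=40)  # simulated exam scores

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD.
print(stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1))))
print(stats.shapiro(scores))      # Shapiro-Wilk
print(stats.normaltest(scores))   # D'Agostino-Pearson omnibus test
```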
9.5.4 Categories of Statistical Techniques

There are two main categories of statistical techniques: those used to test relationships between variables within a single group of participants (e.g., correlation
or regression), and those used to evaluate differences between or among groups of participants (e.g., t-test or ANOVA). The purpose of correlation is to determine the relationship between two or more variables. The correlation coefficient is the quantitative value of this relationship. This coefficient can range in magnitude from 0.0 to 1.0 and can be either positive or negative, with ±1.0 being a perfect correlation. When there is one criterion (or dependent) variable and one predictor (or independent) variable, a Pearson product moment coefficient is calculated. However, when there is one criterion and two or more predictor variables, then multiple regression is used. While we often use a simple correlation, in reality, the prediction of educational success typically involves multiple variables. There are various methods for introducing the variables into a multiple regression analysis, including forward selection, backward selection, maximum R-squared, and stepwise selection. Each of these methods differs in terms of the order in which the variables are added and how the overlapping variance between variables is treated. In experimental research, the levels of the independent variables are established by the experimenter. For example, an educational researcher might want to evaluate the effects of two types of simulator training on surgical performance. The purpose of the statistical test is to evaluate the null hypothesis (Ho) at a specific level of probability (P < 0.05). That is, do the two levels of treatment differ so much that the observed differences would be attributable to chance fewer than 5 times in 100? The statistical test is always of the null hypothesis. Statistics can only reject or fail to reject the null hypothesis; they cannot accept the research hypothesis, i.e., statistics can only determine if the groups are different, not why they are different. Only appropriate theorizing can do that. One method of making these comparisons when evaluating two treatments is the t-test. An ANOVA is just an extension of a t-test, which allows for the evaluation of the null hypothesis among two or more groups as long as the groups are levels of the same independent variable. For example, the effects of three types of simulation training on surgical performance can be compared. For both t-tests and ANOVA, the comparisons can be either across groups or within a group. An example of a within-group or repeated-measures comparison would be multiple samples as a function of time. For a t-test, this could be a pretest/posttest comparison within the same group; or for an ANOVA, there could be multiple samples as a function of practice time. Or, the comparisons
can be across groups, i.e., performance of groups that practice under different conditions can be compared. If there is more than one manipulation of an independent variable, a factorial ANOVA is used – for instance, when three training groups are compared both before and after training. To evaluate this design, a two-factor ANOVA would be used that allows for the testing of the group by test interaction. It would be predicted that, for all groups, there would be no differences between the training groups in the pretest, since no intervention would have yet been introduced. However, for the posttest, the effects of the various simulators would be apparent (see Fig. 9.3). When there are more than two levels to an ANOVA, any statistically significant effect requires that a posthoc test be applied to determine the level at which the statistical difference exists. That is, if there is a main effect or interaction involving three or more means, a test needs to be conducted to find which means differ. A wide range of posthoc tests can be used to follow up a significant ANOVA effect, ranging from very conservative to very liberal. In the field of medical education, a more conservative posthoc test, such as the Tukey HSD, is often used. Posthoc tests can only be used when the "omnibus," or overall, ANOVA found a significant effect. If the ANOVA is nonsignificant, one cannot go further with the posthoc analysis.
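A hedged sketch of such a factorial (group-by-test) analysis, with hypothetical completion times for three training groups measured pre and post, might look like this:

```python
# Sketch: two-factor ANOVA with a group-by-test interaction.
# All data are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "time":  [112, 108, 95, 61, 110, 105, 99, 97, 109, 111, 102, 98],
    "group": ["g1"] * 4 + ["g2"] * 4 + ["g3"] * 4,
    "test":  ["pre", "pre", "post", "post"] * 3,
})

model = ols("time ~ C(group) * C(test)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects and the interaction
```

If the omnibus ANOVA is significant, a conservative posthoc such as the Tukey HSD (available, for instance, as pairwise_tukeyhsd in statsmodels) can then locate which means differ.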
Fig. 9.3 Hypothetical results of an experiment in which three groups of trainees participated in three different educational interventions; the dependent measure is the time taken to complete the task, assessed pre- and post-training. (a) Shows a main effect for test, where the analyses suggest that, overall, the trainees performed better (in less time) after training. (b) Shows a main effect for group, meaning that at least one of the three groups performed differently than did the other two. Posthoc analyses revealed that group 1 performed in a shorter time than did groups 2 and 3. (c) Shows an interaction between group and test. Specifically, posthoc analyses demonstrate that group 1 was the only one that benefited from the intervention; groups 2 and 3 did not. Therefore, interpreting the main effects without taking the significant interaction into account may have led to wrong conclusions.

9.5.5 Nonparametric Analyses

If the data are not normally distributed, an analogous series of nonparametric tests, equivalent to their parametric partners, can be used to establish correlational relationships and comparisons between groups.
9.5.6 Relationships Between Variables

To determine a relationship between two variables, a correlation coefficient is calculated. Nonparametric equivalents to the standard correlation coefficient are the Spearman R, the Kendall tau, and the coefficient gamma. If the two variables of interest are categorical in nature (e.g., "passed" vs. "failed" by "male" vs. "female"), appropriate nonparametric statistics for testing the relationship between the two variables are the Chi-square test, the Phi coefficient, and the Fisher exact test. In addition, a simultaneous test for relationships between multiple cases is available, the Kendall coefficient of concordance, which is often used for expressing interrater agreement among independent judges rating (ranking) the same stimuli.
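A brief sketch of these measures of association, with hypothetical rankings and a hypothetical 2×2 table, is shown below:

```python
# Sketch: nonparametric measures of association. All data are hypothetical.
from scipy import stats

rater_a = [3, 1, 4, 2, 5, 6]
rater_b = [2, 1, 5, 3, 4, 6]

print(stats.spearmanr(rater_a, rater_b))   # Spearman R
print(stats.kendalltau(rater_a, rater_b))  # Kendall tau

# Two categorical variables (e.g., passed/failed by male/female):
table = [[18, 7], [12, 13]]                # hypothetical 2x2 counts
print(stats.fisher_exact(table))           # Fisher exact test
print(stats.chi2_contingency(table))       # Chi-square test
```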
9.5.7 Differences Between Independent Groups

The nonparametric equivalents of the independent t-test are the Wald–Wolfowitz runs test, the Mann–Whitney U test, and the Kolmogorov–Smirnov two-sample test. If there are multiple groups, instead of using an ANOVA, one can use its nonparametric equivalents, the Kruskal–Wallis analysis of ranks and the median test.
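For example, a minimal sketch with hypothetical scores for independent groups:

```python
# Sketch: nonparametric comparisons of independent groups (hypothetical data).
from scipy import stats

novices = [44, 51, 39, 47, 42]
experts = [63, 58, 66, 61, 70]

# Mann-Whitney U: alternative to the independent-samples t-test.
print(stats.mannwhitneyu(novices, experts))

# Kruskal-Wallis: alternative to a one-way ANOVA for three or more groups.
intermediates = [52, 55, 49, 57, 53]
print(stats.kruskal(novices, intermediates, experts))
```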
9.5.8 Differences Between Dependent Groups

Tests analogous to the t-test for dependent samples are the sign test and Wilcoxon's matched-pairs test. If the variables of interest are dichotomous in nature
(i.e., "pass" vs. "no pass"), McNemar's Chi-square test is appropriate. If there are more than two variables, instead of using a repeated-measures ANOVA, one can use the nonparametric equivalents: Friedman's two-way analysis of variance, or the Cochran Q test if the variable was measured in terms of categories (e.g., "passed" vs. "failed").
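A matching sketch for dependent (repeated) measures, again with hypothetical checklist scores:

```python
# Sketch: nonparametric tests for repeated measures on the same trainees.
# All scores are hypothetical.
from scipy import stats

pre     = [12, 14, 11, 15, 13, 10, 16, 12]
post    = [15, 17, 13, 18, 14, 12, 19, 15]
delayed = [14, 16, 14, 17, 15, 11, 18, 14]

# Wilcoxon matched-pairs test: paired alternative to the dependent t-test.
print(stats.wilcoxon(pre, post))

# Friedman test: alternative to a repeated-measures ANOVA across
# three or more assessment occasions.
print(stats.friedmanchisquare(pre, post, delayed))
```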
9.6 Funding, Dissemination, and Promotion

Educational research will be judged alongside other forms of scholarly inquiry. As such, one of the essential components of "good educational research" is the capture of peer-reviewed funding. This element, perhaps more than any other, will get a young surgical researcher on track in his or her quest to focus on the science of surgical education. Procuring a grant has many obvious benefits. First and foremost, grant monies bring added value to an institution. Second, and equally important, they pay for the work, obviating the need for education to be a "poor second cousin" relying on discretionary handouts to fund its activities. Third, a well-constructed grant will contain the basics of a sound approach to answering the question at hand. It will place the new work to be done in the context of existing knowledge and will define a series of methodological steps necessary to support or refute an educational hypothesis. Finally, and most importantly, the majority of grants are awarded through a peer-review process. This element, more than any other, is a testament to the reputation of the research team, their previous track record, the integrity of the research proposal, and the impact of the work. Once the work is performed, it is essential that it get into print. "Getting published" in journals with a credible impact factor is an essential measurable product that will be valued by surgical chairs and promotion committees alike. A frequently asked question concerns the best place to publish work done in surgical education. Generally, there are five categories of journals that will accept articles focusing on surgical education. These include journals that focus on specific issues in medical education (for example, Applied Measurement in Education); journals that focus on general issues in medical education, such as Teaching and Learning in Medicine; journals that include a
specific emphasis on issues in surgical education, such as the Journal of the American College of Surgeons; journals that are "disease specific," such as the Journal of Surgical Oncology; and finally, journals that focus on broad issues in medical science, such as the New England Journal of Medicine. There is no obvious, optimal target for research work in surgical education. From one perspective, the broader the readership and the higher the impact factor, the better. However, there is merit in remaining very focused and becoming a "real expert" in a narrow field. Whatever the choice, the old adage that work not published is not really credible work is a fundamental part of our academic fabric. Allied to the published work, of course, is the presentation of novel work at academic societies, an activity which is often tied to or precedes academic publication. It has often been said that education is the orphan child of the academic tripartite mission. And to a certain extent this has, in the past, been true. Largely, this attitude is the product of individuals choosing surgical education as an academic focus by default, rather than by design. Hence, the "surgical educators" of a generation ago were often individuals whose laboratory work had run into difficulty, who transitioned into education during the final stages of their careers, or who branded themselves as clinical surgeons with an "interest in teaching." This is contrasted today by a cadre of young surgeons with a manifest commitment to surgical education from the inception of their careers, who seek out specific training in surgical education – often including graduate-level education – who have protected time for their research, and who work in an environment with an infrastructure supportive of educational science. In this modern context, the orphan child has been adopted. Another common question is whether work done in education "counts" in promotion and tenure decisions. The answer to this is an unequivocal "yes," if that work is scholarly. Being a good teacher, holding administrative posts in education, mentoring surgical students, and participating in non-peer-reviewed education conferences will have some, but limited, value in the promotion process. By contrast, efforts in education that include the capture of a peer-reviewed grant, that conform to basic accepted experimental principles, and that result in communication in published form will absolutely count for promotion in the overwhelming majority of universities.
10 Measurement of Surgical Performance for Delivery of a Competency-Based Training Curriculum

Raj Aggarwal and Lord Ara Darzi
Contents
10.1 Introduction
10.2 Surgical Competence, Proficiency and Certification
10.3 Five Steps from Novice to Expert
10.4 Taxonomy for Surgical Performance
10.5 Assessment of Technical Skills in Surgical Disciplines
10.6 Dexterity Analysis in Surgery
10.7 Video-Based Assessment in Surgery
10.8 Virtual Reality Simulators as Assessment Devices
10.9 Comparison of Assessment Tools
10.10 Beyond Technical Skill
10.11 A Systems Approach to Surgical Safety
10.12 The Simulated Operating Theatre
10.13 Curriculum Development
10.14 Innovative Research for Surgical Skills Assessment
10.14.1 Eye-Tracking Technologies
10.14.2 Functional Neuro-Imaging Technologies
10.15 Conclusions
References
R. Aggarwal, Department of Biosurgery & Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St. Mary’s Hospital, Praed Street, London W2 1NY, UK; e-mail: [email protected]
Abstract In order to provide a high quality health care service to the public, it is essential to employ proficient practitioners, using tools to the highest of their abilities. Surgery being a craft speciality, the focus is on an assessment of technical skill within the operating theatre. However, objective and reliable methods to measure technical skill within the operating theatre do not exist. Numerous research groups, including our own, have reported the objectivity, validity and reliability of technical skills assessments in surgical disciplines. The development and application of objective and reliable technical skills assessments of surgeons performing innovative procedures within the operating theatre can allow a judgement to be made with regard to the credentialing of surgeons to integrate novel procedures into their clinical practice.
10.1 Introduction

Within the past decade, the training of a surgical specialist has become a subject of broad public concern. The almost daily articles in the mass media about doctors failing their patients, and the widespread growth of the internet, have led to a public that is better educated about its choice of medical specialist. To this day, anyone undergoing an operative procedure asks their medical friends, “who is the best?” – a question that is traditionally answered in a subjective manner. Though there have been drives to publish mortality rates of individual doctors, departmental units and hospitals for key procedures, these figures can be misleading if case-mix is not taken into account. The desire to strive for surgical excellence requires an appropriate measurement tool before inferences can be made regarding the competence of individuals
within a profession [1]. Traditional measures of capacity to practise as an independent health care specialist have focused upon the use of written examinations, log book review and interview by senior members of the profession. However, these have been found to suffer from subjectivity, unreliability and bias. Particularly within the surgical specialties, it is essential to measure technical skill. With the concomitant development of a competency-based surgical training system, whereby acquisition of skill, rather than time spent training, leads to progression along the curriculum, it is imperative to define feasible, valid and reliable measures of surgical competence [2]. These can not only lead to the definition of benchmark levels of skill to be achieved, but also ensure delivery of the curriculum in a standardised manner, making it possible to compare trainees, training centres and regions in terms of not only the skills accrued, but also the costs entailed in achieving them.
10.2 Surgical Competence, Proficiency and Certification

In 1999, the Institute of Medicine report, “To Err Is Human”, raised awareness of the significant number of medical errors committed, together with the deficiencies in the evaluation of performance and competence with regard to the medical profession [3]. It was suggested that an infrastructure to support the assessment of competence needed to be developed. Effective since July 2002, the Accreditation Council for Graduate Medical Education (ACGME) has listed six categories of competence, defined as the ACGME Outcomes Project (Table 10.1) [4]. Similarly, the American Board of Medical Specialties (ABMS) developed a set of criteria that defines competence in medicine. The description of general competency involves six components: patient care, medical knowledge, practice-based learning, interpersonal and communication skills, professionalism and systems-based practice. Based, to some extent, on these criteria, the ABMS and ACGME issued a joint statement in 2001 on surgical competences and, furthermore, on the need for maintenance of certification.

Table 10.1 Maintenance of certification
Evidence of professional standing
Evidence of lifelong learning
Evidence of cognitive expertise
Evidence of practice performance

The need to ensure standardised definition of these terms is paramount, and led to an international consensus conference in July 2001 to establish definitions of the terms to be used when assessing technical skills. A great deal of the discussion was based on the definitions of the terms competence, proficiency and expertise, which were described by Dreyfus and Dreyfus in 1986 [5].
10.3 Five Steps from Novice to Expert

It is generally accepted that most teaching is concerned with bringing an individual up to a level of “competence” in their discipline. However, it is clear that some people go beyond this level to achieve expertise. How is this defined, and, more importantly, can it be taught at all? Dreyfus and Dreyfus (1986), drawing on their different perspectives as computer scientist and philosopher, developed a five-stage theory of skill acquisition from novice through to expert. This was based upon acquiring skill through instruction and experience, with changes in task perception and mode of decision-making as skills improve. The five stages were defined as novice, advanced beginner, competent, proficient and expert (Table 10.2).

Table 10.2 Five stages of skill acquisition
Skill level          Components                     Perspective   Decision
Novice               Context-free                   None          Analytical
Advanced beginner    Context-free and situational   None          Analytical
Competent            Context-free and situational   Chosen        Analytical
Proficient           Context-free and situational   Experienced   Analytical
Expert               Context-free and situational   Experienced   Intuitive

During the first stage of skill acquisition, the novice learns to recognise facts and features relevant to the skill and acquires rules for determining actions based upon those facts and features. Relevant elements of the situation are
clearly identifiable without reference to the overall situation, i.e. context-free. For example, the novice laparoscopic camera holder is told to keep the surgeon’s working instrument in the middle of the picture at all times. This rule ignores the context of the operative procedure, and the beginner is not taught that in certain situations, it may be appropriate to violate that rule. The novice camera holder wishes to please the laparoscopic surgeon, and judges his performance by the number of times he is told to change the camera position. However, there is no coherent sense of the overall task. Having acquired a few more rules, performance of the task requires extensive concentration, with inability to talk or listen to advice from the senior surgeon. The rules enable safe acquisition of experience, though they must then be discarded. Performance improves when the surgeon has acquired considerable practical experience in coping with real-life situations. The advanced beginner can begin to recognise meaningful elements when they are present because of a perceived similarity with prior examples. These new elements are referred to as “situational” rather than “context-free”, though they are difficult to define per se. For example, the laparoscopic trainee can be taught to halt bleeding with diathermy to the blood vessel. This depends on the size and location of the vessel, though diathermy may cause more harm than good in some cases. It is, thus, the experiences that are important, rather than the presence or absence of concrete rules. With greater experience, the number of recognisable context-free and situational elements present in real-world circumstances eventually becomes overwhelming. A sense of what is important is missing. It is, thus, necessary to acquire a hierarchical sense of decision-making by choosing a plan to organise the situation, and then by examining specific factors within that plan, be able to attend to them as appropriate. Thus, competence is based upon having a goal in mind and attending to the most important facts to achieve that goal. The importance of certain facts may be dependent on the presence or absence of other facts. The competent surgeon may no longer dissect the colon in a pre-defined manner, but rather move from one side of the organ to the other as appropriate, to ensure safe and steady progress of the operative procedure. In order to perform at the competent level, the surgeon must choose an organising plan which is related to the environmental conditions, i.e. deliberate planning. Up to this level, the learner has made conscious choices of both goals and decisions after reflecting
upon various alternatives. With experience comes proficiency, enabling an individual to base future actions on past similar situations, with anticipation of the eventual outcomes. Intuition or know-how is based upon seeing similarities with previous experiences, though the proficient performer still thinks analytically about what to do. For example, a surgeon will notice on a ward round that a post-operative patient looks unwell and queries whether the bowel anastomosis has leaked. With the help of a series of tests together with intuition, or knowhow, the surgeon can decide upon whether to perform a re-operation and repair of the anastomotic leak. Expertise is based upon mature and practiced understanding, enabling the individual to know what to do. The expert does not need to follow rules or deconstruct the situation into individual facts, but instead “sees” the whole picture at first glance. For instance, an expert surgeon will very quickly decide to convert a laparoscopic to an open procedure due to anticipated difficulties with the case. A more junior surgeon will tend to struggle with the laparoscopic approach, with a greater possibility of causing injury, lengthening operative time and, overall, leading to a poorer operative outcome. It may be said that the expert surgeon has a “vision” of what is possible, and perhaps more importantly, what is not possible. Whilst most expert performance is ongoing and non-reflective, there are situations when time permits and outcomes are crucial, during which an expert will deliberate before acting. This may occur in a discussion with other experts, during a novel or unforeseen event, or when the environmental conditions are altered. Overall, there is a progression through these five stages of skills acquisition from the analytical behaviour of the detached subject, consciously decomposing the environment into recognisable elements and following abstract rules, to involved skill behaviour based on an accumulation of concrete experiences and the unconscious recognition of new situations as similar to whole remembered ones.
10.4 Taxonomy for Surgical Performance

In order to define and measure the development of surgical expertise, it is necessary to develop a structured framework upon which this can be based. In July 2001,
Satava et al. convened an international workshop to enable standardisation of definitions, measurements and criteria with relevance to objective assessment of surgical skill [6]. A hierarchical approach to surgical practice was proposed:

Ability: the natural state or condition of being capable; aptitude
Skill: a developed proficiency or dexterity in some art, craft or the like
Task: a piece of work to be done; a difficult or tedious undertaking
Procedure: a series of steps taken to accomplish an end

Using a surgical example, psychomotor ability, or aptitude, is defined as one’s natural performance with regard to operating on a two-dimensional screen whilst interacting in a three-dimensional space, i.e. laparoscopic surgery. With training, these abilities can be developed into skills such as instrument handling, suturing and knot-tying. A task is considered to be part of a procedure – for example, being able to perform a sutured anastomosis, which is not procedure-specific. Finally, a procedure is an operation to be carried out, for example, a laparoscopic adrenalectomy.
10.5 Assessment of Technical Skills in Surgical Disciplines

In 1991, the Society of American Gastrointestinal Endoscopic Surgeons (SAGES) required surgeons to demonstrate competency before performing a laparoscopic procedure [7]. Competency was based on the number of procedures performed and the time taken, or on evaluation of the trainee by senior surgeons. These criteria are known to be crude and indirect measures of technical skill, or to suffer from the influence of subjectivity and bias. Professional organisations have recently recognised the need to assess surgical performance objectively. For any method of skill assessment to be used with confidence, it must be feasible, valid and reliable [2]. Feasibility is difficult to define, as it is dependent upon the tool to be used, its cost, size, space requirements, transportability, availability, need for maintenance and acceptability to subjects and credentialing committees. Validity is defined as “the property of being true, correct and conforming with reality”, with reference to
the concept of whether a test measures what it purports to measure. Face validity refers to whether the model resembles the task it is based upon, and content validity considers the extent to which the model measures surgical skill and not simply anatomical knowledge. Construct validity is a test of whether the model can differentiate between different levels of experience. Concurrent validity compares the test to the current “gold standard”, and predictive validity determines whether the test corresponds to actual performance in the operating theatre. Reliability is a measure of the precision of a test and supposes that results for a test repeated on two separate occasions, with no learning between the two tests, will be identical. It is measured as a ratio from 0 to 1.0, a test with reliability of 0–0.5 being of little use, 0.5–0.8 being moderately reliable, and over 0.8 being the most useful. This is known as the test–retest reliability, though the term inter-rater reliability is also important. This is a measure of the extent of agreement between two or more observers when rating the performance of an individual, for example, during video observation of a surgical procedure. Current measures for objective assessment of technical skill consist of dexterity analysis and video-based assessment. These measures can also aid structured progression during training, together with identification of trainees who require remedial action.
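As a concrete illustration of these reliability figures, the short sketch below is our own, not from the chapter: the trainee scores are invented, and Pearson’s r is used as a simple stand-in for a formal reliability coefficient. It computes a test–retest correlation and reads it against the bands just quoted.

import numpy as np

# Invented scores for eight trainees on two sittings of the same assessment.
session_1 = np.array([62.0, 71, 55, 80, 68, 74, 59, 66])
session_2 = np.array([60.0, 73, 57, 78, 70, 72, 61, 65])

# Pearson's r between the two sittings: a simplified test-retest coefficient.
r = np.corrcoef(session_1, session_2)[0, 1]

# Interpret against the thresholds quoted in the text.
if r > 0.8:
    verdict = "most useful"
elif r > 0.5:
    verdict = "moderately reliable"
else:
    verdict = "of little use"
print(f"test-retest reliability r = {r:.2f} ({verdict})")

The same correlation applied to two observers’ ratings of the same performances would give a rough view of inter-rater agreement, although dedicated coefficients such as the intraclass correlation are usually preferred for that purpose.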
10.6 Dexterity Analysis in Surgery

Laparoscopic surgery lends itself particularly well to motion analysis, as hand movements are confined to the limited movements of the instruments [8]. Smith et al. connected laparoscopic forceps to sensors to map their position in space, and relayed movements of the instruments to a personal computer. This enabled calculation of the instrument’s total path length, which was compared to the minimum path length required to complete the task. The Imperial College Surgical Assessment Device (ICSAD) has sensors placed on the back of a surgeon’s hands (Fig. 10.1). A commercially available device (Isotrak II™; Polhemus, VT) emits electromagnetic waves to track the position of the sensors in the x, y and z axes 20 times per second. This device is able to run from a standard laptop computer, and data are analysed in terms of time taken, distance travelled and total number
of movements for each hand. Previous studies have confirmed the construct validity of the ICSAD as a surgical assessment device for open and laparoscopic procedures, both for simple tasks and for real procedures such as laparoscopic cholecystectomy [9]. Experienced laparoscopic surgeons made significantly fewer movements than occasional laparoscopists, who in turn were better than novices in the field. The ICSAD device has also been shown to objectively assess the acquisition of psychomotor skill by trainees attending laparoscopic training courses.

Fig. 10.1 The Imperial College Surgical Assessment Device (ICSAD)

The Advanced Dundee Endoscopic Psychomotor Tester (ADEPT) is another computer-controlled device, consisting of a static dome enclosing a defined workspace, with two standard laparoscopic graspers mounted on a gimbal mechanism [10]. Within the dome is a target plate containing innate tasks, overlaid by a spring-mounted perspex sheet with apertures of varying shapes and sizes. A standard laparoscope relays the image to a video monitor. Each task involves manipulation of the top plate with one instrument, enabling the other instrument to negotiate the task on the back plate through the access hole. The system registers time taken, successful task completion, angular path length and instrument error score (a measure of instrument contact with the sides of the front plate holes). Experienced surgeons exhibit significantly lower instrument error rates than trainees on the ADEPT system. Comparison of performance on ADEPT also correlated well with a blinded assessment of clinical competence, a measure of concurrent validity. Test–retest reliability of the system produced positive correlations for all variables when performance of two consecutive test sessions was compared.
These three methods of assessing dexterity enable objective assessment of surgical technical skill, but only the ICSAD device can be used to assess real operations. However, in this case it is important to know whether the movements made are purposeful. For example, the common bile duct may be injured during a laparoscopic cholecystectomy, and dexterity analysis alone cannot record this potentially disastrous error. To confirm surgical proficiency, it is necessary to analyse the context in which these movements are made.
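To make the dexterity metrics concrete, the sketch below is ours, with invented data: the 20 Hz sampling rate is quoted above, but the speed threshold used to segment movements is an assumption of ours rather than any device’s published algorithm. It derives time taken, path length and movement count from a stream of three-dimensional hand positions.

import numpy as np

SAMPLE_RATE_HZ = 20.0           # sampling rate quoted for the ICSAD
SPEED_THRESHOLD_MM_S = 40.0     # assumed cut-off separating rest from movement

def dexterity_metrics(positions_mm):
    # positions_mm: (n_samples, 3) array of x, y, z sensor positions in mm.
    dt = 1.0 / SAMPLE_RATE_HZ
    steps = np.diff(positions_mm, axis=0)         # displacement per sample
    step_lengths = np.linalg.norm(steps, axis=1)  # mm moved per sample
    speeds = step_lengths / dt                    # instantaneous speed, mm/s

    # Count a new "movement" each time the speed rises through the threshold.
    moving = speeds > SPEED_THRESHOLD_MM_S
    n_movements = int(moving[0]) + int(np.sum(moving[1:] & ~moving[:-1]))

    return {"time_taken_s": len(positions_mm) * dt,
            "path_length_mm": float(step_lengths.sum()),
            "n_movements": n_movements}

# Two seconds of simulated, noisy hand travel for one hand:
rng = np.random.default_rng(0)
track = np.cumsum(rng.normal(0.0, 1.5, size=(40, 3)), axis=0)
print(dexterity_metrics(track))

Comparing such summaries between hands, or between an expert’s and a trainee’s trace on the same task, is the essence of what the dexterity systems described above report.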
10.7 Video-Based Assessment in Surgery

During the introduction of laparoscopic cholecystectomy, SAGES and the European Association for Endoscopic Surgery (EAES) advocated proctoring of beginners by senior surgeons before awarding privileges in laparoscopic surgery. A single assessment is open to subjectivity and bias, although additional criteria can improve reliability and validity. An example of this is the Objective Structured Clinical Examination (OSCE), a method of assessing the clinical skills of history taking, physical examination and patient–doctor communication. Martin et al. developed a similar approach to the assessment of operative skill, the objective structured assessment of technical skill (OSATS) [11] (Table 10.3). This involves six tasks on a bench format, with direct observation and assessment on a task-specific checklist, a seven-item global rating score and a pass/fail judgement.
Table 10.3 The objective structured assessment of technical skills (OSATS) global rating scale. Each item is scored from 1 to 5; the descriptors below anchor scores of 1, 3 and 5.

Respect for tissue
Score 1: Frequently used unnecessary force on tissue or caused damage by inappropriate use of instruments
Score 3: Careful handling of tissue, but occasionally caused inadvertent damage
Score 5: Consistently handled tissues appropriately with minimal damage

Time & motion
Score 1: Many unnecessary moves
Score 3: Efficient time/motion but some unnecessary moves
Score 5: Economy of movement and maximum efficiency

Instrument handling
Score 1: Repeatedly makes tentative or awkward moves with instruments
Score 3: Competent use of instruments although occasionally appeared stiff or awkward
Score 5: Fluid moves with instruments and no awkwardness

Knowledge of instruments
Score 1: Frequently asked for the wrong instrument or used an inappropriate instrument
Score 3: Knew the names of most instruments and used the appropriate instrument for the task
Score 5: Obviously familiar with the instruments required and their names

Use of assistants
Score 1: Consistently placed assistants poorly or failed to use assistants
Score 3: Good use of assistants most of the time
Score 5: Strategically used assistants to the best advantage at all times

Flow of operation & forward planning
Score 1: Frequently stopped operating or needed to discuss next move
Score 3: Demonstrated ability for forward planning with steady progression of operative procedure
Score 5: Obviously planned course of operation with effortless flow from one move to the next

Knowledge of specific procedure
Score 1: Deficient knowledge; needed specific instruction at most operative steps
Score 3: Knew all important aspects of the operation
Score 5: Demonstrated familiarity with all aspects of the operation
Twenty surgeons in training of varying experience performed equivalent open surgical tasks on the bench format and on live anaesthetised animals. There was excellent correlation between assessment on the bench and live models, although test–retest and inter-rater reliabilities were higher for global scores, making them a more reliable and valid measurement tool. However, a global rating scale is generic and may ignore important steps of a particular operation. Eubanks et al. developed a procedure-specific scale for laparoscopic cholecystectomy, with scores weighted for completion of tasks and occurrence of errors [12]. For example, liver injury with bleeding scored 5, whereas common bile duct injury scored 100. Three observers rated 30 laparoscopic cholecystectomies performed by trainees and consultant surgeons. Correlation between observers for final scores was good, although correlation between final score and years of experience was only moderate. A similar approach identified errors made by eight surgical registrars undertaking a total of 20 laparoscopic cholecystectomies. The procedure was broken down into ten steps, such as “dissect and expose cystic structures” and “detach gallbladder from liver bed”. Errors were scored in two categories: inter-step (procedural) errors involved omission or rearrangement of correctly
undertaken steps, and intra-step (execution) errors involved failure to execute an individual step correctly. There was a total of 189 separate errors, of which 73 (38.6%) were inter-step and 116 (61.4%) intra-step. However, only 9% of the inter-step errors required corrective action, compared with 28% of intra-step errors. All of the above rating scales are complex and time consuming; for example, the assessment of 20 surgical trainees on the OSATS required 48 examiners for 3 hours each. Furthermore, the scales are open to human error and are not entirely without subjectivity. To achieve instant objective feedback on a surgeon’s technical skills, virtual reality simulation may be more useful.
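Before moving on, note that the arithmetic behind the procedure-specific, error-weighted scores described above reduces to a weighted sum over the errors an observer records. In the sketch below, only the two weights quoted in the text (5 for liver injury with bleeding, 100 for common bile duct injury) come from the chapter; the remaining error types, their weights and the example log are hypothetical.

# Penalty weight per error type; the first two values are quoted in the
# text, the others are hypothetical placeholders.
ERROR_WEIGHTS = {
    "liver injury with bleeding": 5,
    "common bile duct injury": 100,
    "gallbladder perforation": 10,   # assumed
    "stray cautery activation": 3,   # assumed
}

def weighted_error_score(observed_errors):
    # Sum the penalty weights of every error the assessor recorded.
    return sum(ERROR_WEIGHTS[error] for error in observed_errors)

# One minor liver injury plus two stray cautery activations scores 11.
print(weighted_error_score(["liver injury with bleeding",
                            "stray cautery activation",
                            "stray cautery activation"]))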
10.8 Virtual Reality Simulators as Assessment Devices

The term virtual reality refers to “a computer-generated representation of an environment that allows sensory interaction, thereby giving the impression of actually being present”.
Fig. 10.2 The minimally invasive surgical trainer – virtual reality (MIST-VR)
The MIST-VR laparoscopic simulator (Mentice, Gothenburg, Sweden) comprises two standard laparoscopic instruments held together on a frame with position-sensing gimbals (Fig. 10.2). These are linked to a Pentium personal computer (Intel, Santa Clara, CA), and movements of the instruments are relayed in real time to a computer monitor. Targets appear randomly on the screen and are “grasped” or “manipulated”, with performance measured by time, error rate and economy of movement for each hand. The LapSim (Surgical Science, Gothenburg, Sweden) laparoscopic trainer has tasks that are more realistic than those of the MIST-VR, involving structures that are deformable and may bleed. The Xitact LS500 (Xitact, Morges, Switzerland) laparoscopy simulator comprises tasks such as dissection, clip application and tissue separation, the integration of which can produce a procedural trainer. It differs from the MIST-VR and LapSim in that it incorporates a physical object, the “virtual abdomen”, with force feedback. Other newer simulators include the ProMIS Surgical Simulator (Haptica, Dublin, Ireland) and LapMentor™ (Simbionix, Cleveland, OH). The MIST-VR simulator has tasks that are abstract in nature, enabling the acquisition of psychomotor skill rather than cognitive knowledge. This enables the simulator to be used in a multi-disciplinary manner to teach the basic skills required for all forms of minimally invasive surgery. However, newer simulators
have augmented their basic skills programmes to incorporate parts of real procedures, allowing trainees to learn techniques they would use in the operating theatre. For example, the LapSim has a module for dissection of Calot’s triangle, and the most recently launched LapMentor simulator enables the trainee to perform a complete laparoscopic cholecystectomy with the benefit of force feedback. Although the task-based simulators are more advanced in terms of software, they are bulkier and more expensive. Using these simulators, trainees can practise standardised laparoscopic tasks repeatedly, with instant objective feedback of performance. The simulators are portable, use standard computer equipment and are available commercially. With graded exercises at different skill levels, they can be used as the basis for a structured training programme. The feedback obtained also enables comparisons to be made between training sessions and trainees. Studies to confirm the role of virtual reality simulators as assessment devices have concentrated on the demonstration of construct validity, with experienced surgeons completing the tasks on the MIST-VR significantly faster, with lower error rates and greater economy of movement scores. A direct comparison of performance is possible as all surgeons complete exactly the same task, without the effects of patient variability or disease severity. The tasks can be carried out at any time and further processing is not required
to produce a test score. This can lead to the development of criterion scores that have to be achieved before operating on real patients. At the American College of Surgeons’ meeting in 2001, Gallagher et al. described the performance of 210 experienced laparoscopic surgeons on two trials of the MIST-VR [13]. The aim was to benchmark the performance of these surgeons to confirm future use as an assessment tool. The results revealed marked variability in the scores obtained, together with a significant learning effect between trials. To use such data for high-stakes assessments, perhaps a pool of expert scores from all centres currently using virtual reality simulation might lead to the development of an international benchmark for trainee surgeons. Furthermore, as some trainees take longer to achieve pre-defined levels of proficiency than others, this may enable particularly gifted trainees to be fast-tracked into an advanced laparoscopic programme, and the true development of a competency-based, rather than a time-based, curriculum.
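A learning effect of the kind reported between the two MIST-VR trials is typically demonstrated with a paired comparison of the same subjects across trials. The sketch below is ours, with invented task times; it is not the analysis Gallagher et al. actually performed.

import numpy as np
from scipy import stats

# Invented task-completion times (seconds) for eight surgeons on two trials.
trial_1 = np.array([48.0, 55, 61, 50, 58, 64, 53, 49])
trial_2 = np.array([44.0, 50, 57, 48, 52, 59, 50, 47])

# Paired t-test: does performance improve systematically between trials?
t_stat, p_value = stats.ttest_rel(trial_1, trial_2)
print(f"mean improvement = {np.mean(trial_1 - trial_2):.1f} s, "
      f"t = {t_stat:.2f}, p = {p_value:.4f}")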
10.9 Comparison of Assessment Tools

Currently there is no consensus regarding the optimal assessment tool for laparoscopic procedures, and perhaps video-based and dexterity systems should be used in conjunction. The authors’ department has recently developed new software to enable the ICSAD trace to be viewed together with a video of the procedure, leading to a dexterity-based video analysis system. This still requires an investment of time to assess the procedure on a rating scale, but it may be possible to identify areas of poor dexterity and to concentrate on video-based assessment of these areas alone.

10.10 Beyond Technical Skill

Traditionally, measures of performance in the operating theatre have concentrated on assessing the skill of the surgeon alone, and more specifically, technical proficiency. This has been done by assessment of time taken to complete the operation, and more recently, with the use of rating scales and motion analysis systems. However, technical ability is only one of the skills required to perform a successful operation, with teamwork, communication, judgement, and leadership underpinning the development of surgical competence [14]. High reliability organisations such as the aviation, military and nuclear industries have noted the importance of a wide variety of factors in the development of a favourable outcome. These include ergonomic factors, such as the quality of interface design, team coordination and leadership, organisational culture, and quality of decision making. In a surgical context, the application of a systems approach can lead to the identification of possible sources of error which are not immediately apparent. These may include the use of inappropriately designed instruments, an untrained team member, repeated interruptions by ward staff or a tired surgeon. The development of a human factors approach has led to safer performance in industry, and it is important to address these issues in the operating theatre.

10.11 A Systems Approach to Surgical Safety

The systems approach in understanding the surgical process and outcomes has important implications for error reduction. This approach accepts that humans are fallible and errors are to be expected, even in the best organisations. Countermeasures are based upon the building of defences to trap errors, and the mitigation of their effects should one occur. This consists of altering the attitudes between different individuals and modifying the behavioural norms that have been established in these work settings. An example of this is the specification of theatre lists for training junior surgeons, ensuring that fewer cases are booked, thereby reducing the pressure on both the surgeon and the rest of the team to complete all procedures in the allocated time.
10.12 The Simulated Operating Theatre

At our centre, we have developed a simulated operating theatre to pilot comprehensive training and assessment for surgical specialists (Fig. 10.3). This consists of a replicated operating theatre environment and an adjacent control room, separated by one-way viewing glass. In
the operating theatre is a standard operating table, diathermy and suction machines, trolleys containing suture equipment and surgical instruments, and operating room lights. A moderate-fidelity anaesthetic simulator (SimMan, Laerdal) consists of a mannequin which lies on the operating table and is controlled by a desktop computer in the control room. This enables the creation of a number of scenarios such as laryngospasm, hypoxia and cardiac arrhythmias. A further trolley is available, containing standard anaesthetic equipment, tubes and drugs.

Fig. 10.3 The simulated operating theatre

The complete surgical team is present, consisting of an anaesthetist, anaesthetic nurse, primary surgeon, surgeon’s assistant, scrub nurse and circulating nurse. Interactions between these individuals are recorded using four ceiling-mounted cameras and unobtrusively placed microphones. The multiple streams of audio and video data, together with the trace on the anaesthetic monitor, are fed into a clinical data recording (CDR) device. This enables those present in the control room to view the data in real time, and recordings to be made for debriefing sessions.

In 1975, Spencer remarked that a skilfully performed operation is 75% decision making and only 25% dexterity. Decision-making and other non-technical skills are not formally taught in the surgical curriculum, but are acquired over time. In an analogous manner, it should be possible to use the simulated operating theatre environment to train and assess the performance of surgeons in skills such as team interaction and communication. This situation will also allow surgeons to benefit from feedback, by understanding the nature and effect of their mistakes, and to learn from them.

In a preliminary study, 25 surgeons of varying grades completed part of a standard varicose vein
operation on a synthetic model (Limbs & Things, Bristol), which was placed over the right groin of the anaesthetic simulator [15]. The complete surgical team was present, the mannequin draped as for a real procedure, and standard surgical instruments available to the operating surgeon. Video-based, blinded assessment of technical skills discriminated between surgeons according to experience, though their team skills, measured by two expert observers on a global rating scale, failed to show any similar differences. Many subjects did not achieve competency levels for pre-procedure preparation (90%), vigilance (56%), team interaction (27%) and communication (24%). Furthermore, only two trainees positioned the patient pre-operatively, and none waited for a swab/instrument check prior to closure. Feedback responses from the participants were good, with 90% of them agreeing that the simulation was a realistic representation of an operating theatre, and 88% advocating this as a good environment for training in team skills. The greatest benefit of simulator training in aviation and anaesthetics has been for training during crisis scenarios. The varicose vein procedure was subsequently modified to include a bleeding scenario – a 5 mm incision was made in the “femoral vein” of the model and connected to a tube which was, in turn, connected to a drip bag containing simulated blood. This was controlled with a three-way tap. A further group of 10 junior and 10 senior surgical trainees was recruited to the study [16]. The simulation was run as before except that, at a standardised point, the tap was opened. The trainee’s technical ability to control the bleeding, together with their team skills, was assessed in a blinded manner by three surgeons and one human factors expert. Once again, seniors scored higher than juniors for technical skills, though there were no differences in human factors skills such as the time taken to inform the team of the crisis. A majority of the participants found the model, simulated operating theatre and bleeding scenario to be realistic, with over 80% of them considering the crisis to be suitable for assessment and training of both technical and team skills. These studies have demonstrated the face validity of a novel crisis simulation in the simulated operating theatre, and describe how it can be used to assess the technical and non-technical performance of surgical trainees. Recent work has also introduced the notion of a ceiling effect in technical skills performance, with there
being little difference between the performance of senior trainees and consultants on bench-top models. This may be due to the limited sensitivity of the tools used to assess technical skill, or to the fact that most senior trainees have acquired the technical skills necessary to operate competently, so that progression to expert performance is then dependent upon non-technical skills such as decision making, knowledge and judgement.
10.13 Curriculum Development

The aim of a surgical residency programme is to produce competent professionals, displaying the cognitive, technical and personal skills required to meet the needs of society. However, many surgeons are concerned that this will not be possible with the limitations placed upon work hours and the potential reduction in training opportunities by up to 50%. Furthermore, surgeons are also faced with increased public and political pressures to achieve defined levels of competence prior to independent practice. Solutions to the work hours and competency issues require the formalisation of training programmes, whereby training occurs within a pre-defined curriculum to teach the skills required. Assessment of skill is performed at regular intervals using reliable and valid measurement tools. Successful completion of the assessment enables the trainee to progress onto the next stage of the programme, though failure will necessitate repetition of the completed block of training. This describes the development of a competency-based curriculum, enabling surgeons to acquire skill in a logical, stepwise approach. Training at each step of the curriculum must begin in the skills laboratory, utilising tools such as synthetic models, animal tissue and virtual reality simulation. Sessions should be geared to the level of the trainees, with adequate faculty available as trainers and mentors. The occurrence of these sessions should not be related solely to the schedules of senior surgeons, but must be aligned with educational theories of learning. Furthermore, these sessions must be an integral and compulsory part of the trainees’ timetable. Junior trainees should probably attend once a week, though more senior trainees may require fewer, more focused sessions. This should be coupled with adequate training time in the operating theatre, preferably with the same
mentor as in the skills laboratory. Division of operating schedules primarily into service or training purposes can achieve a balance between the needs of the surgeon and the health authority. Built into the development of a staged curriculum is the objective assessment of competence, and present methods have concentrated purely upon technical skill. Video-based and dexterity measurement systems have been extensively validated in the surgical literature, although none is currently used as a routine assessment of surgical trainees. This is primarily because of the time and cost limitations of the assessment tools, though virtual reality simulation has been shown to be a promising tool for instant, objective assessment of laparoscopic performance. This simulation has further benefits, as assessment occurs on standardised models in a quiet and non-pressured environment. The development of a simulated operating theatre environment, analogous to those used by the military for training in conflict situations, makes it possible, for the first time, to assess and train surgeons in a realistic environment. The presence of the entire surgical team can enable assessment of interpersonal skill, and the introduction of crisis scenarios leads to an evaluation of the surgeon’s knowledge and judgement. This should be integrated into the end of each training period, culminating in a crisis assessment to confirm the development of competence prior to progressing onto the next stage of the programme. This stepwise, competence-based curriculum can lead to the development of technically skilled surgeons, and it acknowledges the importance of lifelong learning. It refocuses the emphasis on what is learned, as opposed to how many hours are spent in the hospital environment. It can also provide trainees with the choice of leaving the programme at an earlier stage in order to concentrate upon a generalist surgical practice, and also provide a purely service-orientated commitment to the health service. Surgeons are also able to take a career break and, on re-entry, must achieve competence prior to progressing further along their training pathway. The organisation of these surgical training programmes should occur as part of a national project, with agreed definitions of training schedules and competency assessments. It is imperative that the programmes are evidence-based and are regularly audited to ensure maintenance of standards of excellence. This will make the programmes accountable and strengthen the case for health care providers to resource them.
10.14 Innovative Research for Surgical Skills Assessment

In order to move beyond measurement of the surgeon’s technical skills alone, two promising technologies for elucidating the higher-level processes of surgical performance are described here.
10.14.1 Eye-Tracking Technologies

To get closer to the thought processes of the surgeon, eye-tracking technologies have also been exploited in order to measure surgical skill [17]. Initial studies used remote eye-trackers to study radiologists reading a CT scan of the thorax – the differences in scan patterns between junior and senior radiologists highlighted the possible application of such a tool during minimally invasive procedures. A comparison of tool- and eye-tracking during a laparoscopic task revealed that inexperienced subjects needed to fixate on the tip of the laparoscopic instrument, whereas experienced subjects fixated upon the target. The inference is that experienced subjects possess the hand–eye coordination to know the position of their tool without having to fixate their eyes on it, and can, thus, perform the task in a more efficient manner. A pilot study of eye-tracking during real laparoscopic cases provides further insight into surgical performance. During a laparoscopic cholecystectomy, fixation points were most stable for the principal surgical steps of dissection of Calot’s triangle and clip/cut of the cystic structures. A signature pattern is produced which can be analysed and used both as a teaching and an assessment tool for all surgeons. The intention is to develop signature patterns during surgical manoeuvres which can enable the diagnosis of whether an individual is competent to perform a particular task or procedure.
10.14.2 Functional Neuro-Imaging Technologies

Though eye and motion tracking technologies are promising tools for surgical skills assessment, it is believed that further information may be gleaned from within the processing capabilities of the surgeon – within their brains.
Functional neuro-imaging techniques have recently been used to enhance the understanding of motor skills acquisition for tasks such as piano playing, apple-coring and non-surgical knot tying. Studies have demonstrated that the areas of the brain that are activated when performing a motor task vary according to the level of expertise. Furthermore, it appears that brain activation is dynamic and varies according to the phase of motor skills acquisition, the so-called “neuroplasticity”. For example, several studies have demonstrated that the prefrontal cortex plays a crucial role in early learning, since activation in this region wanes as performance becomes more automated. This has prompted investment in a NIRS (near infra-red spectroscopy) functional neuro-imaging machine for the purposes of surgical skills training and assessment [18]. The system works by shining near infra-red light onto the scalp; after the light has been absorbed and scattered by the brain tissues inside the head, a signal is recorded at a set of different scalp sites using low-light-sensitive photo-detectors. The differential absorption spectra of oxygenated and de-oxygenated haemoglobin provide an indirect measure of cortical perfusion and can be related in real time to a surgical task. Preliminary studies to date have reproduced the results of the piano players, whereby experienced subjects exhibited decreased frontal cortical activation when compared to novices.
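The step from raw optical signals to haemoglobin changes is conventionally made with the modified Beer–Lambert law, solving a small linear system across two wavelengths. The sketch below illustrates the idea only: the wavelengths, extinction coefficients, optode separation and differential path-length factor are values we have assumed, not figures from the chapter or from any particular NIRS machine.

import numpy as np

# Assumed extinction coefficients for [HbO2, HbR] at two wavelengths
# (illustrative numbers only; real values come from published spectra).
EPSILON = np.array([[1.5, 3.8],   # ~760 nm
                    [2.5, 1.8]])  # ~850 nm
OPTODE_SEPARATION_CM = 3.0        # assumed source-detector distance
DPF = 6.0                         # assumed differential path-length factor

def haemoglobin_changes(delta_od):
    # delta_od: optical-density changes measured at the two wavelengths.
    # Solves the modified Beer-Lambert law for [delta_HbO2, delta_HbR]
    # (arbitrary concentration units).
    return np.linalg.solve(EPSILON * OPTODE_SEPARATION_CM * DPF, delta_od)

d_hbo2, d_hbr = haemoglobin_changes(np.array([0.012, 0.020]))
print(f"dHbO2 = {d_hbo2:+.4f}, dHbR = {d_hbr:+.4f}")

A fall in the prefrontal oxygenation change with practice, mirroring the piano-playing studies, is the kind of pattern such processing would be used to detect.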
10.15 Conclusions

The current climate of medical education and health care delivery is being transformed at a rapid rate. Gone are the days of residency programmes focused on unlimited hours, patient service and learning through osmosis. Dissatisfaction and drop-out rates of residents are also on the increase, together with profound changes in the demographics of student admissions to medical schools. There is a shift in the philosophy of medical education not only to accommodate lifestyle considerations, but also to deliver a competency-based system of achievement and progression. In August 2005, the Foundation Programme was introduced in the UK for all newly qualified doctors. The aim was to develop specific and focused learning objectives, with built-in demonstration of clinical competence before progression onto specialist or general practice training. Each trainee completes a number of assessments,
leading to the development of a portfolio of clinical performance [19]. But what, then, of the implications for workforce needs and planning in the health service? Current trends in general surgery workforce data point to a future deficit of surgeons, which can be explained by an ageing population with increased surgical needs, a growing number of outpatient surgical procedures and increasing sub-specialisation within the field of general surgery. Other interventional specialties share similar experiences. A competency-based training curriculum has the potential to cause significant disruption to managing the workforce aspects of health care delivery. Workforce numbers depend on credentialing for key procedures, and a failure to achieve milestones leads to repetition of the training period. A shortfall of clinicians may ensue, and strategies to manage this potential deficit have not been identified.

The medical profession is at a crossroads in how to apply competency-based training and subsequent practice. Within the airline industry, regular testing of individuals is carried out to ensure competence. Failure to achieve performance goals results in a period of retraining and subsequent revalidation of skills. This concept does not exist within health care; in the UK, individuals who are found to under-perform are suspended from practice and investigated by the General Medical Council (GMC). This process involves a two-phase approach of data collection from the clinician’s workplace, through audit of outcomes and interviews with colleagues, followed by a 1-day structured evaluation of performance. Failure leads to termination of one’s registration with the GMC. It is not inconceivable that individuals could be retrained, and the analogy of a “boot camp” to redress the problem is an attractive idea.

Though it is important to consider how to manage this shift in medical practice to deliver competent doctors, the capture and delivery of financial resources are an integral concern in ensuring the sustainability of competency-based training and practice. Training budgets within the UK have been reduced, and service delivery has taken centre stage. It is necessary to recognise the accountability argument: investment in a competency-based programme may lead to improvements in patient safety through maintenance and delivery of the highest standards of care. Doctors failing to meet these standards will no longer continue to practise regardless, perhaps with a reduction in morbidity and mortality and a concomitant financial gain.
Nursing, public health and family medicine have sought to define competency-based models and their application. A German study found that implementation of a competency-based graduate training programme within a neurology department during a 1-year pilot resulted in motivation of learners and trainers, as long as adequate resources were provided, together with a training system for the trainers. Though funding is important, it is the last point that is crucial to delivery, i.e. training schemes, workshops or courses to “train the trainers”. Not only does this lead to competency-based delivery of residency programmes, but it also fosters the development of incentives for lifelong learning and career growth. It is necessary for policy makers, medical educators, clinicians and credentialing bodies to develop a consensus on credentialing for the future of the medical profession. It must also be recognised that there is a knowledge gap, and research into tools that can not only objectively and accurately define performance but perhaps also predict future performance must continue to be funded. The resources required are substantial, both financially and in human terms, but now is the time to seize the opportunity to ensure that competency-based delivery of health care becomes both an achievable and a successful endeavour [8].
References
1. Moorthy K, Munz Y, Sarker SK et al (2003) Objective assessment of technical skills in surgery. BMJ 327:1032–1037
2. Gallagher AG, Ritter EM, Satava RM (2003) Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc 17:1525–1529
3. Kohn LT, Corrigan JM, Donaldson MS (2000) To err is human: building a safer health system. National Academy Press, Washington
4. Pellegrini CA (2002) Invited commentary: the ACGME “Outcomes Project”. American Council for Graduate Medical Education. Surgery 131:214–215
5. Dreyfus HL, Dreyfus SE (1986) Mind over machine. Free Press, New York
6. Satava RM, Cuschieri A, Hamdorf J (2003) Metrics for objective assessment. Surg Endosc 17:220–226
7. Society of American Gastrointestinal Endoscopic Surgeons (SAGES) (1991) Granting of privileges for laparoscopic general surgery. Am J Surg 161:324–325
8. Aggarwal R, Hance J, Darzi A (2004) Surgical education and training in the new millennium. Surg Endosc 18:1409–1410
9. Aggarwal R, Grantcharov T, Moorthy K et al (2007) An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room. Ann Surg 245:992–999
10. Francis NK, Hanna GB, Cuschieri A (2002) The performance of master surgeons on the advanced Dundee endoscopic psychomotor tester: contrast validity study. Arch Surg 137:841–844
11. Martin JA, Regehr G, Reznick R et al (1997) Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 84:273–278
12. Eubanks TR, Clements RH, Pohl D et al (1999) An objective scoring system for laparoscopic cholecystectomy. J Am Coll Surg 189:566–574
13. Gallagher AG, Smith CD, Bowers SP et al (2003) Psychomotor skills assessment in practicing surgeons experienced in performing advanced laparoscopic procedures. J Am Coll Surg 197:479–488
14. Yule S, Flin R, Paterson-Brown S et al (2006) Non-technical skills for surgeons in the operating room: a review of the literature. Surgery 139:140–149
15. Moorthy K, Munz Y, Adams S et al (2005) A human factors analysis of technical and team skills among surgical trainees during procedural simulations in a simulated operating theatre. Ann Surg 242:631–639
16. Moorthy K, Munz Y, Forrest D et al (2006) Surgical crisis management skills training and assessment: a simulation-based approach to enhancing operating room performance. Ann Surg 244:139–147
17. Dempere-Marco L, Hu X-P, Yang G-Z (2003) Visual search in chest radiology: definition of reference anatomy for analysing visual search patterns. In: Proceedings of the Fourth Annual IEEE-EMBS Information Technology Applications in Biomedicine, ITAB 2003, Birmingham, UK
18. Leff DR, Aggarwal R, Deligani F et al (2006) Optical mapping of the frontal cortex during a surgical knot-tying task, a feasibility study. Lect Notes Comput Sci 4091:140–147
19. Poole A (2003) The implications of Modernising Medical Careers for specialist registrars. BMJ 326:s194
11 Health-Related Quality of Life and its Measurement in Surgery – Concepts and Methods

Jane M. Blazeby
Contents
Abbreviations
11.1 Introduction
11.2 What is Quality of Life?
11.3 The Purpose of Measuring HRQL
11.4 How to Measure HRQL
11.4.1 Types of Instruments
11.4.2 Developing and Validating HRQL Instruments
11.5 Reporting Standards of HRQL in Randomized Controlled Trials and Other Research Settings
11.5.1 Choosing a HRQL Instrument for the Research
11.5.2 Determining the HRQL Sample Size
11.5.3 The Timing of HRQL Assessments
11.5.4 Missing Data
11.5.5 Dealing with Missing Data
11.5.6 Analyses of HRQL Data
11.6 The Future Role of HRQL in Evaluating Surgery
References

Abbreviations
HRQL Health-related quality of life
QOL Quality of life
PRO Patient reported outcome
Abstract Over the past decade, assessment of healthrelated quality of life (HRQL) has become a recognized end-point in randomized surgical trials and in other research settings. It is an important endpoint because HRQL captures the patients’ perspective of outcome and can be used to compliment clinical outcomes to influence decision making and inform consent for surgery. This chapter will consider the definition of HRQL, methods for developing HRQL tools, and using HRQL in outcomes research, and it will review the current and future research and clinical applications of HRQL assessment.
References ........................................................................... 139
11.1 Introduction
Over the past decade, assessment of health-related quality of life (HRQL) has become a recognized outcome in randomized surgical trials and in other research settings. It is an important endpoint because HRQL assessment captures the patients’ perspective of outcome. Information about the patients’ perception of the benefits or harms of surgery can be used alongside traditional surgical outcomes to provide a comprehensive treatment evaluation. This information can potentially be used during surgical decision making and informed consent. It is essential that HRQL is measured and reported accurately, and an enormous amount of effort has been invested in developing and validating HRQL instruments. More recently,
research into the interpretation of HRQL outcomes has been published, and the influence of HRQL in clinical decision making is beginning to be understood. This chapter will consider the definition of HRQL, methods for developing HRQL tools, and using HRQL in outcomes research, and it will review the current and future research and clinical applications of HRQL assessment.
11.2 What is Quality of Life?

There is currently no internationally agreed definition for the construct of quality of life (QOL). In layman's terms, the phrase carries an abstract notion of happiness or satisfaction that may be related to current or past events and the individual's personal views of life. Within a scientific context, however, an assessment of QOL refers to the patients' perception of their health, where health is defined as a multidimensional construct with physical, social, and emotional components. The phrase "quality of life," therefore, has the potential for confusion between a colloquial and a scientific definition, and more recently, the term "health-related quality of life" (HRQL) has become more widely used. HRQL, however, is also ill defined, but some fundamental evidence is accumulating to suggest that HRQL is a multidimensional construct that can include two or more domains of health. It is also generally accepted that a measure of HRQL is an individual's perception of how the illness or treatment impacts more than one of these key domains. The focus on the importance of patients themselves reporting outcomes has now led to a new terminology. "Patient reported outcomes" (PROs) are outcomes that assess any aspect of a patient's health and that come directly from the patient, without the interpretation of the patient's responses by a physician or anyone else [1]. Sometimes PRO measures may be confused with measures of QOL or HRQL. Although both refer to measurements made by the patients themselves, the broad definition of QOL means that it is multidimensional, whereas a PRO can focus much more narrowly, for example, on a single symptom such as pain. The key factor is to ask patients themselves to rate the problem or symptom because observers are poor judges of patients' views. Many studies have shown that observations made by doctors, nurses, carers, or research staff of patients' QOL or HRQL differ from those reported by patients themselves. Observer assessments may over- or underestimate the scores from patients, and there are some trends in these assessments for particular conditions. Because of the potential for confusion with the terminology for QOL, HRQL, and PROs, it is recommended that surgical research that includes an assessment of QOL provides a clear definition of the domains of interest. In this chapter, HRQL refers to a multidimensional construct that is reported by the patient, and the phrase HRQL will be used throughout (Box 1 summarizes key definitions).
Box 1 Key definitions
Quality of life (QOL) – any aspect of health, life satisfaction, or happiness
Health-related quality of life (HRQL) – a multidimensional assessment of health
Patient reported outcome (PRO) – a self-report of one or more aspects of health
Health – physical, social, and emotional well-being
Item – a question that addresses a single domain of HRQL/QOL/PRO
Scale – two or more items that together assess a single domain of HRQL/QOL/PRO
Global item – a single item that assesses QOL/HRQL
11.3 The Purpose of Measuring HRQL

Accurate study of the effectiveness and cost-effectiveness of surgical interventions is essential to evaluate new procedures. Outcomes are of the greatest concern for patients, who require information about operative mortality, morbidity, survival, and relief of symptoms. Systematic outcome assessment of surgical procedures may also be used to inform purchasers and providers of health care. It is, therefore, critical to use common data sets to prospectively audit key outcomes. Most routine health services provide only aggregate information regarding the frequency of outcome events, with no information linked to risk, the use of services, or patient-reported outcomes. Routine aggregate data are of some benefit, but may be misleading when coexisting diseases contribute to the outcome, and they are also limited in their perspective, focusing solely upon biomedical or economic endpoints. In addition to these standard outcomes, the importance of measuring patients' views has recently been recognized, because these capture details that may otherwise have been overlooked and take a holistic view of the impact of treatment on psychosocial health as well as physical well-being. Although there has been a growing interest in assessing HRQL in clinical trials, there has
also been some evidence to suggest that the information does not influence clinical decision making. Two systematic reviews have analyzed how randomized trials in breast and prostate cancer include HRQL outcomes when reaching the conclusion of the trial [6, 8]. Overall, in only a small proportion of trials did the HRQL outcomes influence treatment recommendations. Another systematic review of trials in surgical oncology, however, demonstrated that HRQL outcomes influenced treatment recommendations, and the authors of the surgical trials stated how important HRQL outcomes were for fully informed consent [5]. It is possible that HRQL is a particularly relevant outcome in surgical decision making because surgery has an immediate impact on HRQL that is irreversible. Surgery also has long-term impacts on generic and disease-specific aspects of HRQL. Providing accurate information for patients undergoing surgery about the expected impact on HRQL, as well as morbidity, mortality, and functional outcomes, is critical for fully informed consent. This is an area where surgeons require training to ensure that the information is communicated, and currently, best methods for doing this have not been established. Box 2 summarizes key reasons for the evaluation of surgical procedures with patient-reported outcomes alongside traditional clinical endpoints.

Box 2 Key reasons for evaluating surgical procedures with measures of health-related quality of life to complement standard outcomes
1. A detailed assessment of the disease course or treatment side effects that is unavailable from the review of other endpoints
2. An assessment of the patients' perception of outcome, which frequently differs from health professionals' judgments of the patients' opinion
3. Patient-reported outcomes may influence surgical decision making where the benefits of surgery are marginal and the possible risks and negative impact on HRQL severe
4. Patient-reported outcomes may influence surgical decision making where there are nonsurgical treatments of equivalent clinical value, but with a different impact profile on HRQL
5. Baseline self-reported health data may have prognostic value
6. Patient self-reported health data may inform economic analyses
11.4 How to Measure HRQL

11.4.1 Types of Instruments

Over the past few decades, many instruments for measuring HRQL have been developed. These are mostly paper-based because of the prohibitive costs of undertaking individual patient interviews with detailed qualitative analyses. Questionnaires are composed of items and scales. Items are single questions that address a single aspect of HRQL, whereas scales are two or more items that together address a single domain of HRQL (e.g., a pain scale). Although single items may address a specific symptom or problem, some single items address global concepts, for example, the single global question, "Overall, what would you say your QOL has been like during the past week?" Global questions are interesting because their simplicity is attractive, but many aspects of life, not just health issues, influence individuals' views of QOL. This means that, within the context of evaluating healthcare, global items can be difficult to interpret because of the different and often competing aspects of life that impact overall QOL. For clinical practice, therefore, where specific domains of health are treated and deliberately investigated, it is recommended to use a multidimensional assessment of HRQL that addresses key components of physical and psychosocial health. Questionnaires that are used to assess HRQL require careful development and full clinical and psychometric validation testing to be certain that they are fit for the purpose for which they are designed. Although many questionnaires exist, the quality of their supporting evidence is variable, and this needs to be considered when choosing an instrument. Currently, the traditional methods for developing questionnaires are progressing, and modern techniques including computer adaptive testing and item response theory are likely to provide more precise measures of HRQL [7, 9]. This section will describe standard types of HRQL measures and consider how modern test theory will impact HRQL assessment.
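To make the item response theory idea concrete, each item can be modelled as a probabilistic function of the respondent's latent HRQL level, so that items can be selected to be maximally informative for a given patient. The sketch below shows the common two-parameter logistic (2PL) model; it is illustrative only, and the item parameters are invented rather than drawn from any validated instrument:

```python
import numpy as np

def item_response_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability that a
    respondent with latent trait level `theta` endorses an item with
    discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative only: two hypothetical fatigue items of differing difficulty.
theta = np.linspace(-3, 3, 7)          # latent HRQL level (e.g., fatigue burden)
easy_item = item_response_2pl(theta, a=1.2, b=-1.0)
hard_item = item_response_2pl(theta, a=1.8, b=1.5)
print(np.round(easy_item, 2))
print(np.round(hard_item, 2))
```

In computer adaptive testing, the estimate of the latent level is updated after each response, and the next item is chosen where it is most informative for that estimate, which is how fewer items can yield more precise scores.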
11.4.1.1 Generic Instruments

Generic measures of HRQL are intended for general use within healthy populations or any general disease group. They all include an assessment of physical
health and a selection of other key domains of HRQL. The Medical Outcomes Study 36-Item Short Form (SF-36) is probably the most widely used generic measure of HRQL, and the EuroQol is also influential because of its suitability for cost-utility analyses [4]. Within surgical studies, these types of measures are often not suitable to detect specific symptoms or functional consequences of an intervention because they do not contain scales and items addressing these problems. If they are used alone to evaluate surgery, it is therefore possible that the results may be misleading because they are not sufficiently sensitive to small beneficial or detrimental consequences of surgery. These measures are, however, very useful for making comparisons between disease groups.
11.4.1.2 Disease-Specific Instruments

Disease-specific measures focus upon HRQL problems that are relevant to specific disease groups or disease sites. For example, disease-specific measures for patients with cancer have been developed. Some of these are designed with a modular approach, with a core disease-specific tool that is relevant to all patients within that disease group (e.g., cancer), and site- or treatment-specific add-on modules that supplement the core measure (e.g., breast cancer specific). Site-specific questionnaires address particular symptoms or morbidity of treatment. In surgical oncology, where HRQL assessment is an increasingly important outcome, there are several disease-specific tools that are widely available. The EORTC QLQ-C30 is a cancer-specific 30-item questionnaire. It incorporates five functional scales (physical, emotional, social, role, and cognitive), a global health scale, and nine single items assessing symptoms that commonly occur in patients with cancer [2]. All items in the questionnaire have four response categories, except for the global health scale, which uses a seven-point item response ranging from very poor to excellent. High scores represent higher response levels, with high functional scale scores representing better function and higher symptom scores representing more or worse symptoms. It is available in over 50 languages and is widely validated and used in international clinical trials in oncology. The functional assessment of chronic illness therapy (FACIT) measurement system is a similar modular disease-specific system, with a core tool for patients with chronic illness and supplementary modules [3]. Although widely used in cancer, noncancer-specific FACIT questionnaires are also available. The FACT-G, originally designed for patients with cancer, has 27 items addressing four HRQL domains: physical, social, emotional, and functional well-being. It has four response categories similar to those of the EORTC QLQ-C30 and an additional "somewhat" response. Some items are phrased positively and some negatively. The scoring may be performed for the four scales, or an overall summary score may be derived by summing item responses. The items refer to how patients have been feeling in the past week. This questionnaire has very high quality supporting clinical and psychometric data. Specific modules to improve the sensitivity and specificity of the EORTC QLQ-C30 or FACIT systems have been developed and tested for most cancer sites.
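To make the scale and item arithmetic concrete, the sketch below follows the linear 0–100 transformation described in the published EORTC scoring approach: the raw scale score is the mean of the item responses, which is then mapped onto 0–100, with the direction reversed for functional scales. The item responses are hypothetical and the function name is ours; any real use should follow the instrument's own scoring manual:

```python
import numpy as np

def scale_score(items, item_range, functional=True):
    """Linear 0-100 transformation used by EORTC-style questionnaires.
    `item_range` is max minus min of the response categories
    (3 for the 4-point items, 6 for the 7-point global items)."""
    rs = np.mean(items)                              # raw score
    if functional:
        return (1 - (rs - 1) / item_range) * 100     # higher = better function
    return ((rs - 1) / item_range) * 100             # higher = worse symptoms

# Hypothetical responses (1 = "not at all" ... 4 = "very much")
physical_items = [2, 1, 2, 3, 1]   # physical functioning scale
pain_items = [3, 4]                # pain symptom scale
print(round(scale_score(physical_items, 3, functional=True), 1))   # 73.3
print(round(scale_score(pain_items, 3, functional=False), 1))      # 83.3
```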
11.4.2 Developing and Validating HRQL Instruments

Developing measures to assess HRQL requires attention to detail, multidisciplinary expertise, and resources. Investing time during the early phases of questionnaire development will help to avoid problems during psychometric testing and ensure that the tool is comprehensive and able to assess HRQL in the population for which it is intended. Careful documentation of the development of a new measure will also provide the evidence needed to demonstrate that the process is robust and based upon patients' views and opinions, rather than those of health professionals. There are several well-described phases of questionnaire development, and it is essential to follow them sequentially.
11.4.2.1 Literature Search

Before creating a new HRQL measure, it is important to have a working definition of HRQL for the intended research question. This will identify the dimensions of HRQL to be included in the new tool. It is necessary to consider whether the new tool will assess generic aspects of HRQL or particular problems related to the disease
and treatment. A detailed literature search in relevant medical and psychosocial databases will identify existing instruments or scales that address relevant HRQL issues. Following the literature search, it will be possible to generate a list of potential HRQL issues to be considered for inclusion in the instrument.
11.4.2.2 Selection of HRQL Domains for the Instrument

After compilation of the initial list of HRQL issues, expert review is required to check for completeness and face validity. It is essential that a multiprofessional group does this, including specialists, generalists, doctors, nurses, and other health professionals. The HRQL issues need to be described as succinctly as possible, with minimum overlap in content. Rare or unusual HRQL issues that are sometimes experienced by patients should be retained at this stage. Patients themselves should also review the content of the list of HRQL issues. Although traditionally reviews by patients were informal and not documented in detail, there is an increasing need to formally undertake in-depth, tape-recorded qualitative interviews with patients to consider questionnaire content, because of the increasing pressure to document that HRQL measures are patient generated. A broad range of patients should be interviewed: patients with different categories of disease severity and patients who have experienced the range of potential treatments for the disease of interest, as well as purposefully sampled patients of mixed sociodemographic backgrounds. Following this comprehensive assessment of the literature, health professionals, and patients, the list of HRQL issues to be included in the questionnaire will be complete. These issues then need to be transformed into specific items (questions) using standard questionnaire guidance.
11.4.2.3 Writing Items and Scales

Questions should be written in comprehensible language (brief and simple), and each question should assess just one HRQL issue. It is a common error to attempt to assess two dimensions of HRQL in one item, e.g., "Have you had nausea and vomiting?" This question may confuse respondents who suffer severe nausea, but do not actually vomit. It is also recommended to avoid double
negatives because of the potential to mislead respondents. At this phase of questionnaire development, the response format for the questionnaire is agreed upon. It is generally recommended to avoid binary Yes/No responses, and instead to use an ordinal scale on which the patient can respond between absent and severe scores ("not at all" to "very much"). Most questionnaires assign integers to each response category so that the question can be scored. The layout and format of the questionnaire, using large font, underlining, and bold text, will help draw patients' attention to particular questions or instructions. It is essential that colleagues and collaborators with experience of designing and developing questionnaires review the provisional tool, as well as the multiprofessional group involved in selecting the HRQL issues.
11.4.2.4 Pretesting the Provisional HRQL Instrument

Pretesting the provisional HRQL questionnaire is undertaken to ensure that the target population understands the newly created questions. This essential phase of questionnaire development also checks that the wording and formatting of the tool are straightforward and not confusing or offensive. It involves approximately 10–20 patients who represent a range of the target population. If problems with ambiguous questions or difficult phrasing are identified, the revised items also require repeat pretesting. In this phase of questionnaire development, patients complete the whole tool and are subsequently interviewed to consider each item separately. This phase also allows some general questions about the whole questionnaire to be asked, such as whether any questions relevant to the patient are missing, and how much time or assistance was required to complete the questionnaire.
11.4.2.5 Clinical and Psychometric Validation of HRQL Instruments

It is essential that an instrument to measure HRQL has good measurement properties. Good measurement properties include validity and reliability: data to demonstrate that the instrument produces results that are reproducible, sensitive, and responsive to changes in clinically relevant aspects of
HRQL. It is traditional to validate a new measurement tool by comparing its output with that produced by a gold standard instrument. There is, however, no gold standard measure of HRQL. Validity is, therefore, inferred for each tool by compilation of several pieces of evidence. There are three main areas that need to be considered: content, criterion, and construct validity. Content validity concerns the extent to which the HRQL items address the intended HRQL areas of interest. Evidence to show that the instrument fulfils this aspect of validity is collected during the early phases of questionnaire development described above. Criterion validity considers whether the new questionnaire shows anticipated associations with external criteria assessed with already validated tools (e.g., the HRQL scale addressing pain is related to other pain measures or to the requirement for pain relief). Construct validity examines the theoretical associations of items in the questionnaire with each other and with the hypothesized scales. This is examined by investigating both expected convergent and divergent associations. For example, a scale assessing fatigue would be expected to have convergent associations with physical function, but little correlation with some other aspects of health (e.g., taste). The reliability of a measurement tool concerns the random variability associated with each measurement. Where the HRQL of a patient is stable between two time points, it is expected that HRQL scores will be similar on both occasions (not subject to random error). Becoming familiar with the reliability of a tool is essential for its use in everyday clinical practice. A measurement of hemoglobin may be 12 or 13 g/dL on two separate occasions 1 week apart. Provided the patient does not show any evidence of blood loss, these two measures would not be of concern, because it is accepted that the reliability of a full blood count is within these boundaries. This type of reliability for HRQL questionnaires is formally tested with test–retest methodology. Patients complete the HRQL measure on two separate occasions when their health is stable. The correlation between the two measures is examined and is expected to be high. Interrater reliability examines the agreement between two observers' assessments of HRQL. Since patients themselves are usually regarded as the best assessors of their health, interrater reliability may only need to be examined in situations where patients are unable to self-report, and it is essential to
use a proxy to assess HRQL (e.g., severe neurological conditions). A sensitive measure of HRQL will be able to detect HRQL differences between groups of patients expected to be clinically different; testing this is often referred to as testing known-group comparisons. For example, patients with advanced metastatic cancer are expected to report worse HRQL than patients with localized disease. Responsiveness is similar, but relates to the ability of an instrument to detect improvement or deterioration within an individual. All these aspects of clinical and psychometric validation are important, and the process of demonstrating these features of HRQL tools is continuous. There are currently no internationally agreed standards for the minimum amount of evidence needed to prove that a tool is valid and reliable; rather, it is a cumulative process. Indeed, after the initial validation and publication of a new tool, it is very important for independent groups to further test the measurement properties of the tool and to produce data that further support or refute its validity and reliability. Internal consistency is the other commonly used method to test the reliability of a tool. This refers to the extent to which the items within a scale are related. Cronbach's alpha coefficient is the most widely used method for this purpose.
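The two reliability statistics discussed above are straightforward to compute. The sketch below, with invented data for six hypothetical patients completing a four-item scale twice, implements the standard Cronbach's alpha formula and a simple test–retest correlation; note that test–retest reliability is often reported with an intraclass correlation coefficient rather than the Pearson correlation used here for brevity:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_patients x n_items) response matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def test_retest_r(time1, time2):
    """Pearson correlation between scale scores at two stable time points."""
    return np.corrcoef(time1, time2)[0, 1]

# Hypothetical data: 6 patients answering a 4-item fatigue scale (coded 1-4).
t1 = np.array([[1, 2, 2, 1], [3, 3, 4, 3], [2, 2, 3, 2],
               [4, 4, 4, 3], [1, 1, 2, 1], [3, 2, 3, 3]])
t2_totals = np.array([7, 13, 9, 14, 5, 11])   # total scores one week later
print(round(cronbach_alpha(t1), 2))
print(round(test_retest_r(t1.sum(axis=1), t2_totals), 2))
```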
11.5 Reporting Standards of HRQL in Randomized Controlled Trials and Other Research Settings

While the use of HRQL instruments in clinical research has continued to increase steadily over recent years, many studies are still characterized by inadequate reporting. This probably reflects inadequate study design; both robust design and high quality reporting are required. The following section provides guidance on the issues that need consideration when including an assessment of HRQL in a clinical trial or another type of research study. Robust HRQL study design and detailed reporting will allow peer reviewers and subsequent readers of the research to assess the validity of the results and to reproduce the methods if desired. This process is summarized in Box 3.
Box 3 Key issues to consider when assessing HRQL in a randomized trial, longitudinal, or cross-sectional study

Study objective
• Is HRQL a primary or secondary endpoint?
• Which aspects of HRQL are of particular interest?
• At which time points will HRQL change?

HRQL questionnaire
• Does it have relevant clinical and treatment-related HRQL domains?
• Are the response categories appropriate to the study question?
• Does it have documented validity and reliability?
• Is it sensitive to expected HRQL changes?
• Has it been used before in this patient population? (did it work?)
• How long will it take to complete?
• Availability of translations (if applicable)
• Is it completed by the patient themselves?

Study population
• Socio-demographic details (level of education, gender, age, native language, and cultural variation)
• Clinical details (performance status, level of anxiety, disability)

Time points
• Select timing of assessments to capture relevant HRQL changes
• Review the time frame of the questionnaire
• Minimize frequency of assessments to reduce respondent burden
• Minimize frequency of assessments to simplify data analyses

Data analyses
• Define HRQL hypotheses
• How is the questionnaire scored?
• How are changes in HRQL interpreted (minimally important difference, clinical relevance)?
• Analyses plan (e.g., to account for multiple assessments)
• Dealing with missing data for random or nonrandom reasons

Data collection
• Mode of administration – self-completion, clinician-completion, face-to-face interview, or telephone
• Clear practice to follow in the event of missing assessments
• Clear practice to follow in the event of missing items/pages
• Document reasons for missing data

Practical issues
• Cost (license, printing, postage, training of personnel)
• Check the latest version of the questionnaire
• If a battery of measures is used, consider the order of the questionnaires
• Check compliance and plan measures to improve it, if necessary
11.5.1 Choosing a HRQL Instrument for the Research

The appropriate and adequate assessment of HRQL within a trial or another research setting depends on clearly defining the objectives and endpoints of the project. It is essential to initially decide whether HRQL
is the primary or secondary endpoint of the study. This will help select a suitable instrument to assess HRQL, in order to match the specific characteristics of the questionnaire to the objectives of the trial. The choice of instrument will also depend upon the size and sociodemographic features of the study population. Level of education, gender, culture, native language, and age range are important factors that will determine whether
questionnaires are completed and completed accurately. The specific nature of the anticipated effects and the expected time for them to be exerted will influence the type of HRQL measure selected and the timing and frequency of its administration. A measure should be selected that is sensitive enough to capture both the positive and negative effects of the intervention, as the side effects of treatments (such as postsurgical pain or fatigue) may outweigh the potential benefits to HRQL (such as symptom alleviation) perceived by patients. An understanding of whether these effects are transient or long-lasting will further influence the timing of the HRQL assessments and how assessments from different time points should be interpreted. Consideration of the complexity and nature of a questionnaire's scoring system as part of instrument selection will help decide whether the HRQL data can be used to address the trial objectives. Overall summary scores may be obtained with some HRQL instruments, and these have the attraction of producing a single score that can be used to compare different populations and patient groups. Overall scores, however, may fail to identify where interventions lead to improvements in one aspect of HRQL, but deterioration in another. It is, therefore, recommended that multidimensional questionnaires with relevant symptom and functional scales are used to provide HRQL data to inform treatment decisions. Finally, there are practical issues to consider during instrument selection: the size, timing, length, and frequency of measurement; mode of administration; availability of translations; cost of the questionnaire; and other resource considerations (postal survey vs. clinical setting, self- vs. interviewer-completion). As considered above, standardization of the administration process is critical to ensure unbiased data collection. Many HRQL questionnaires have not been validated outside their country of origin, and ensuring that high quality translations are available in target languages is, therefore, essential for an international study.
11.5.2 Determining the HRQL Sample Size

Determining the number of participants required is an integral part of any clinical trial, and this provides the evidence to justify the size of the study and to confirm that it is powerful enough to answer the questions it is
designed to address. It is wasteful to recruit more participants than necessary, but underrecruitment, a more common occurrence in clinical trials, is bad practice because the energy invested in the study is wasted if the study is insufficiently powered to demonstrate a clinically important effect and to answer the questions it was designed to address. All the issues that apply to clinical trials also apply to incorporating HRQL in a trial. If HRQL is the primary trial endpoint, it is essential to adhere to the above standards. In trials where HRQL is a secondary endpoint, however, it is uncommon for sample size calculations to be performed for the HRQL outcomes, and the size of the trial is instead dictated by the primary endpoint. One of the difficulties in undertaking a HRQL sample size calculation is that there is often little HRQL evidence on which to precisely estimate the effect size; thus, it is common practice to consider a range of expected benefits and possible sample sizes and to assess their adequacy to power the study. Different sample size methods may lead to different estimates, but the final approach should be chosen based on relevance to the trial objectives, including the type and number of HRQL endpoints, the proposed analyses, and available information on the underlying assumptions of expected benefit. Sample size calculations must also take into account possible nonresponse and loss to follow-up, as well as the need to control for confounding and to examine subgroup effects. While HRQL trials may include numerous HRQL outcomes, it is recommended that a maximum of four or five be included in formal analyses, taking into account the effect of multiple significance testing.
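For a two-arm comparison of mean HRQL scores, the usual normal-approximation formula gives a feel for the numbers involved. The sketch below is illustrative only: the effect size, power, and dropout figures are assumptions, and a trial statistician should perform the definitive calculation:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80, dropout=0.20):
    """Approximate per-group sample size for comparing mean HRQL scores
    between two arms (normal approximation, two-sided test).
    `effect_size` is the standardized difference (target difference / SD);
    the result is inflated for an anticipated dropout fraction."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return ceil(n / (1 - dropout))

# e.g., to detect a 10-point difference on a 0-100 scale with SD 25
# (standardized effect size 0.4), allowing for 20% loss to follow-up:
print(n_per_group(10 / 25))   # ~124 per group
```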
11.5.3 The Timing of HRQL Assessments

Exactly when HRQL should be measured is a crucial part of the trial design, in terms of gathering data at relevant time points to enable the objectives of the trial to be adequately addressed. A mandatory baseline assessment prior to randomization and the start of treatment is recommended for all randomized trials assessing HRQL, in order that changes due to treatment and disease status can be measured and to check whether there is equivalence of baseline characteristics in the treatment and comparison groups. There is also evidence to indicate that baseline HRQL may be a valuable prognostic marker for clinical outcomes, such
as survival or response to treatment. In study designs evaluating surgical procedures, a pretreatment assessment of HRQL is also recommended to provide baseline data. Capturing HRQL before treatment can be difficult, but it is essential, because it is extremely difficult for patients who have undergone surgery to look back and reflect on their symptoms and functional ability before the operation. In a cross-sectional study, it is not possible or desirable to capture baseline (before start of treatment) HRQL data. Choosing the time points for follow-up of HRQL will depend upon the research question, the resources, the natural history of the disease, and the likely side effects of treatment. These assessments can be (i) time-based, involving administration for a specific number of days/weeks after randomization, regardless of the treatment schedules, or (ii) event-based, dependent on specific treatment cycles, or serious or acute effects. A combination of both approaches may be suitable in some cases and allows treatment delays to be taken into account. The relevance of timings should be carefully considered – different timings can lead to different results. Other issues to consider are the time scale of the questionnaire being used (e.g., current status versus recall of symptoms and HRQL in the past week), the frequency of assessments, and the accessibility of patients at various time points (e.g., assessments during clinic visits will be limited to the timings of the appointments). Posttreatment assessments should be timed according to the research hypotheses and whether or not HRQL was specified as a primary endpoint, although these should be undertaken at equal times in each arm with respect to randomization, rather than the end of treatment, to avoid bias. The practicalities of obtaining HRQL assessments up until death and, if necessary, at relapse should also be considered, particularly if patients are to be withdrawn from the study at relapse. Although it is theoretically tempting to obtain as many HRQL assessments as possible, they should be kept to a minimum to avoid overburdening patients (and thus increase compliance) and to simplify data collection and analyses.
11.5.4 Missing Data

Difficulties with questionnaire compliance are commonplace in many studies, which is partly related to investigations in patients with a poor survival prognosis
or illness. Ensuring that compliance with questionnaire completion is as high as possible (ideally over 80%) is critical to answer any research question and to avoid response bias. A number of factors can be addressed to improve compliance and reduce missing data. Missing data may take two main forms: (i) item nonresponse, where data are missing for one or more individual items, or (ii) unit nonresponse, where the whole questionnaire is missing for a participant, due to missing forms, the participant dropping out of the study, or the participant entering it late. Causes of missing data include poor study design (unclear timing of assessment and poor choice of questionnaire), administrative factors (failure of research staff to administer questionnaires), and patient-based factors (deteriorating health and refusal to complete questions). Consideration of the feasibility of the trial design, attention to protocol development, training of staff, and the provision of adequate resources may address organizational problems and ensure uniformity of the protocol and HRQL assessment across participating centers; regular communication between the study coordinator, local investigators, the data manager, and those responsible for administration of the questionnaires is essential. A pilot study may be beneficial in improving organization and preparing for unforeseen difficulties. Patient-based sources of missing data may be tackled by choosing an appropriate instrument and by attending to participants' motivation: providing a clear explanation of why, when, and how HRQL assessments will be made, whom they may contact for help, and what will happen to the data they provide (an information sheet detailing confidentiality and data dissemination is typically required by most ethics committees). Proxy completion by a carer, clinician, or significant other may be considered in the event of missing data due to participants' inability to complete the questionnaire. This should be considered prior to the start of the trial if it is anticipated that this might be a significant problem. It should be noted, however, that there is evidence of differing levels of concordance between patients and proxies according to the dimension being assessed, which may introduce an element of bias into the HRQL assessments. While it is possible to prevent much missing data, a certain amount should be expected within any study. The protocol should contain clear instructions on what procedures to follow in the event of a missing questionnaire or missing data for individual items, such as whether the participant should be contacted. In all
cases, it is good practice to maintain a record of the reasons for missing data, in order to ascertain the extent to which this was related to the patient's HRQL and to inform the analyses and interpretation of the data. The relative amount of missing data, the assumed cause, the sample size, and the type of intended analyses will determine the degree to which missing data are a problem, and will critically inform the interpretation of the results of the study.
11.5.5 Dealing with Missing Data

Poor compliance resulting in missing data can have a significant detrimental impact on the analyses and interpretation of HRQL data. Firstly, fewer observations may compromise the power of the study to detect an effect. Secondly, missing data may introduce selection bias into a trial and thus compromise its validity, particularly if low compliance is associated with less well patients who have poorer HRQL, as many studies have shown to be the case. It is, therefore, important that the impact of missing data is carefully addressed and that the potential cause of the missing data is understood, as the most appropriate method for dealing with it will depend largely on the assumed mechanism by which it is missing: if the reasons for missing data are completely unrelated to the respondent's HRQL, it is classified as missing completely at random. If the likelihood of missing data depends only on previous scores, but not on current or future scores, it is considered missing at random. Data are only considered not missing at random if the "missingness" is actually related to the value of the missing observation. The relevant approach for dealing with each type of missing data varies depending on whether potential bias due to data not missing at random needs to be addressed. Methods are available to test the assumption of whether or not data are missing at random; having a record of the reasons for missing data in a trial is particularly important here. Missing values for individual items may be imputed (filled in) to complete the data so that a full analysis may be undertaken. Most commonly, the mean of the answered questions is imputed, provided that at least half the items are completed, although this may not be suitable for scales whose items are ordered hierarchically according to difficulty. Other less common
approaches include imputation based on the responses of other respondents, or regression of missing items based on other scale items. Some instruments are provided with instructions on how to score the questionnaire when items are missing. Methods for dealing with missing whole questionnaires/assessments in longitudinal (repeated measures) studies, when missing data are assumed to be ignorable, include complete case analysis, which involves removing patients with incomplete data, and available case analysis, which uses all available cases on a specific variable and is consequently more desirable. Alternatively, data can be filled in using imputation-based methods such as last observation carried forward, single mean imputation, predicted value imputation, or hot-deck imputation. Statistical techniques such as likelihood-based methods have been developed for nonignorable missing data, but they are complex and controversial and should, therefore, be applied with caution. More sophisticated models have been developed specifically for nonignorable missing data, which is likely in HRQL trials, but they are limited by their complexity and lack of availability and interpretability. They are often accompanied by sensitivity analyses, which are used to compare the appropriateness of employing a given strategy.
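The "half rule" for item-level imputation described above is simple to implement. A minimal sketch follows; the function name and the decision to return a summed score rather than a mean are our assumptions, and an instrument's own scoring manual takes precedence:

```python
import numpy as np

def impute_scale(items):
    """Half-rule item imputation: if at least half of a scale's items were
    answered, replace each missing item (NaN) with the mean of the
    answered items; otherwise treat the whole scale score as missing."""
    items = np.asarray(items, dtype=float)
    answered = ~np.isnan(items)
    if answered.sum() < len(items) / 2:
        return None                       # too much missing: score not computed
    filled = np.where(answered, items, items[answered].mean())
    return filled.sum()

print(impute_scale([2, np.nan, 3, 1]))            # 8.0: missing item set to mean 2.0
print(impute_scale([np.nan, np.nan, np.nan, 1]))  # None: fewer than half answered
```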
11.5.6 Analyses of HRQL Data

The analyses of HRQL data require appropriate expertise that may be difficult to find, because of the specific complexities in dealing with multiple HRQL assessments and missing data, as described above. Many of these issues can be overcome by carefully identifying a priori one or two HRQL outcomes of principal interest and by working with a statistician from the outset of the study. If the research question and study design are clearly stated in the protocol and the HRQL hypothesis established, analysis of HRQL data should be possible, and over-exploration and over-interpretation of the data avoided. Using descriptive statistics to illustrate the impact of surgery on HRQL has an important role in the clinical interpretation of HRQL scores. At present, there is still a lack of general understanding about the impact of most surgical treatments on HRQL, and using graphical methods to illustrate these changes will aid patients and surgeons. It is not within the scope
of this chapter to describe the details of all the issues in reporting HRQL results, but a multidisciplinary approach to writing the protocol and to analyzing and interpreting results, involving surgeons, statisticians, and social scientists, should achieve outputs that are accurate and understood by the relevant audience.
11.6 The Future Role of HRQL in Evaluating Surgery

Over the past two decades, methods for the provision of accurate patient-centered outcome data have become established for most surgical settings. Standards for using these tools in randomized trials, longitudinal series, and cross-sectional studies are also being established. The potential of evaluating surgery with a combination of traditional clinical and biomedical outcomes and patient-reported outcome data is, however, only just being realized, and many clinicians are unfamiliar with standard instruments to assess HRQL, questionnaire scoring systems, and the clinical interpretation of the results. It is also uncertain how to communicate HRQL data to patients themselves and whether this type of information will influence surgical decision making. Further work is needed in each of these areas, and achieving an organic collaboration between surgeons, statisticians, social scientists, and patients themselves will ensure that patient-centered outcomes are appropriately
incorporated into surgical research and their role in everyday clinical practice will become clear.
References
1. Patient-reported outcome measures: use in medical product development to support labeling claims – draft guidance. Available from http://www.fda.gov/cber/gdlns/prolbl.htm
2. Website for EORTC quality of life questionnaires and manuals. Available from http://www.eortc.be/QOL
3. Website for FACIT quality of life questionnaires and manuals. Available from http://www.facit.org
4. Website for SF-36 quality of life questionnaires and manuals. Available from http://www.sf-36.org
5. Blazeby JM, Avery K, Sprangers M et al (2006) Health-related quality of life measurement in randomized clinical trials in surgical oncology. J Clin Oncol 24:3178–3186
6. Efficace F, Bottomley A, Osoba D et al (2003) Beyond the development of health-related quality-of-life (HRQOL) measures: a checklist for evaluating HRQOL outcomes in cancer clinical trials – does HRQOL evaluation in prostate cancer research inform clinical decision making? J Clin Oncol 21:3502–3511
7. Fayers PM, Machin D (2000) Quality of life: assessment, analysis, and interpretation. Wiley, New York
8. Goodwin PJ, Black JT, Bordeleau LJ et al (2003) Health-related quality-of-life measurement in randomized clinical trials in breast cancer – taking stock. J Natl Cancer Inst 95:263–281
9. Streiner DL, Norman GR (1995) Health measurement scales: a practical guide to their development and use, 2nd edn. Oxford University Press, Oxford
12 Surgical Performance Under Stress: Conceptual and Methodological Issues

Sonal Arora and Nick Sevdalis

S. Arora, Department of Biosurgery and Surgical Technology, Imperial College London, 10th floor QEQM, St. Mary's Hospital, South Wharf Road, London W2 1NY, UK
e-mail: [email protected]

Abstract This chapter provides an overview of surgical performance under the stressful conditions often present in the operating room (OR). We begin with an overview of models and theories of stress and their relationship to human performance. We then present the current state of the art in the measurement of stress in the context of surgery and measures of surgical performance. Finally, we summarise evidence on the impact of stress on performance in the OR. We conclude with a discussion on the implications of the existing evidence base on surgical stress for the training of junior surgeons and propose directions for future empirical research.

Contents
12.1 Introduction
12.2 Part 1: Models and Theories of Stress
12.2.1 Systemic Stress: Selye's Theory
12.2.2 Psychological Stress: The Lazarus Theory
12.2.3 The Yerkes–Dodson Law
12.3 Part 2: Measures of Stress in Surgery
12.3.1 Objective Measures of Stress
12.3.2 Subjective Measures of Stress
12.3.3 Combined Subjective and Objective Measures of Stress
12.4 Part 3: Measures of Performance in Surgery
12.4.1 Measures of Technical Performance
12.4.2 Measures of Non-Technical Performance
12.5 Part 4: Impact of Stress on Surgical Performance
12.6 Part 5: Discussion
12.7 Implications for Surgical Training
12.8 Future Research Agenda
References
12.1 Introduction

Stress has become a common denominator in today's fast-paced, complex society. Anecdotal evidence suggests that people experience stress in relation to all aspects of their lives – including personal and family relationships, social encounters and, perhaps most importantly, professional life. Surgery is not an exception. Acute or chronic, stress is present in all facets of surgery – an inherently risky occupation in which performing in the OR is itself a considerable pressure. Organisational issues, new technologies, and relationships with colleagues and health care staff can also compound the stress that a surgeon experiences daily – as can family and personal problems [1]. Despite the obvious prevalence of stress in surgery, little empirical research has been carried out that addresses either the sources of stress or its impact on surgeons' performance. More recently, developments in the delivery of surgical services and surgical training have led to stress being given significant attention by surgical researchers and trainers alike.
These developments include the following:
• Changes in the way surgical care is delivered: working hours are being reduced (for instance, through the European Working Time Directive [2]), and thus shift-working is increasing.
• Changes in surgical training: high-fidelity simulators are becoming increasingly available. In addition, there is an increasing realisation that the apprenticeship model of learning ("see one, do one, teach one") is not the most appropriate for the development of surgeons in terms of technical [3] and non-technical skills (e.g. leadership and decision-making) [4].
• Increased prominence of patient safety concerns: the safety of medical and surgical patients is becoming an increasingly prominent feature of their care through well-publicised reports from the Institute of Medicine [5], the UK's Department of Health [2] and peer-reviewed evidence [6, 7].

The present chapter provides an overview of surgical performance under stressful conditions, often present in the OR. We begin with an overview of models and theories of stress and their relationship to human performance (Part 1). We then present the current state of the art in the measurement of stress in the context of surgery (Part 2) and measures of surgical performance (Part 3). Finally, we summarise evidence on the impact of stress on performance in the OR (Part 4). We conclude with a discussion on the implications of the existing evidence base on surgical stress for the training of junior surgeons. We also propose directions for future empirical research (Part 5).
12.2 Part 1: Models and Theories of Stress

Stress is typically defined as the bodily processes resulting from circumstances that place physical or psychological demands on an individual [8]. The external forces that impinge on the body are "stressors". A stressor is defined as any real or imagined event, condition, situation or stimulus that instigates the onset of the human stress response process within an individual [9]. Stress is only present when demands outweigh perceived resources [10]. There are two main categories of theories that focus on the specific relationship between external demands
(stressors) and bodily processes (stress). One approach is based in physiology and psychobiology [11]. This approach is known as the “systemic stress approach”. The second approach has been developed within the field of cognitive psychology and is known as the “psychological stress approach” [12, 13].
12.2.1 Systemic Stress: Selye's Theory

This approach stems largely from the work of the endocrinologist Hans Selye, who observed in animal studies that a variety of stimulus events (e.g. heat, cold and toxic agents) are capable of producing common effects that are not specific to any one stimulus event. These non-specifically caused common effects constitute the stereotypical, i.e. specific, response pattern of systemic stress. According to Selye [11], stress is defined as "a state manifested by a syndrome which consists of all the non-specifically induced changes in a biologic system". This state has been termed the "General Adaptation Syndrome". Key criticisms of this approach include its failure to include psychological aspects of stress and the question of whether human responses to stressors mimic those of animals.
12.2.2 Psychological Stress: The Lazarus Theory

In this approach, stress is regarded as a relational concept, i.e. the relationship ("transaction") between individuals and their environment [12]. This transactional, or interactional, approach focuses on thoughts and awareness that determine individuals' stress responses. Psychological stress refers to a relationship with the environment that the person appraises as significant for his or her well-being and in which the demands tax or exceed available coping resources [14]. In this approach, the following two are the key mediators within the person–environment transaction: (i) cognitive appraisal: individuals' evaluation of the significance of what is happening for their well-being and (ii) coping: individuals' efforts in thought and action to manage specific demands [15]. The concept of appraisal is based on the idea that emotional processes (including stress) are dependent upon individual expectancies of the significance and outcome of an
event. This helps to explain individual differences in the quality, intensity and duration of an elicited emotion in environments that appear very similar. "Primary appraisal" is the individual's evaluation of an event as a potential hazard to well-being. "Secondary appraisal" is the individual's evaluation of his/her ability to handle the event. As a function of these appraisals, the level of experienced stress depends on the subjective interpretation of whether an event poses a threat to the individual (primary appraisal) and whether there are resources to cope with the stressor (secondary appraisal). Coping behaviours follow these appraisals. According to Lazarus and Folkman [13], coping is defined as "the cognitive and behavioural efforts made to master, tolerate, or reduce external and internal demands and conflicts among them". Coping includes attempts to reduce the perceived discrepancy between situational demands and personal resources [16].
12.2.3 The Yerkes–Dodson Law

One of the few themes that seems to span virtually all existing evidence on stress and human performance is that performance peaks when the subject is in some optimal stress or arousal state, above or below which the efficiency of performance decreases – an idea known as the Yerkes–Dodson law [17] (Fig. 12.1). The optimal stress or arousal level decreases with increasing task difficulty.
Fig. 12.1 The Yerkes–Dodson law of the relationship between stress/arousal and performance (an inverted-U curve, with performance on the vertical axis and arousal on the horizontal axis)
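The law is qualitative rather than a fitted equation, but an inverted U is often illustrated with a Gaussian-shaped function of arousal. The sketch below is purely illustrative – the optimum and width parameters are invented – and simply shows the peak shifting to lower arousal for a harder task:

```python
import numpy as np

def performance(arousal, optimum, width):
    # Illustrative inverted-U: performance peaks at an optimal arousal
    # level and falls off on either side (Yerkes-Dodson-like shape).
    return np.exp(-((arousal - optimum) ** 2) / (2 * width ** 2))

arousal = np.linspace(0, 10, 11)
easy_task = performance(arousal, optimum=6.0, width=2.5)
hard_task = performance(arousal, optimum=4.0, width=2.0)  # optimum shifts left
print(np.round(easy_task, 2))
print(np.round(hard_task, 2))
```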
12.3 Part 2: Measures of Stress in Surgery

Stress in humans involves a physiological response and a cognitive-behavioural component. In assessing stress, therefore, both these components should be measured systematically. In this section, we present objective measures of stress that capture the former and subjective measures of stress that capture the latter in the context of surgery.
12.3.1 Objective Measures of Stress

The normal physiological response to stress ("fight or flight") results in an endogenous catecholamine release leading to increased cardiac activity. This can be determined by measuring the heart rate (HR). Studies [18–23] have used HR as a proxy measure for stress, finding the mean HR to be elevated for surgeons during an operation [18, 21], but also that experience moderated the effect of stress, with seniors exhibiting less change in HR compared to juniors [20].

Heart rate variability (HRV): HRV is the oscillation in the interval between consecutive heart beats and between consecutive instantaneous HRs [24–26]. It is an indicator of the sympathovagal balance during surgery [27] – i.e. an index of autonomic function. As stress has been linked to increased sympathetic and parasympathetic activity [28], changes in mental stress which alter autonomic activity can therefore affect HRV. HRV has been used as an indirect measure of stress/mental strain in some studies [19, 25, 26]. Power spectral analysis of HRV allows assessment of the sympathovagal activities regulating the HR by quantitatively evaluating beat-to-beat cardiac control. Spectral components include a low frequency component (LF), which rises with increased sympathetic activity [29], and a high frequency component (HF) [25], which rises with increased vagal activity [24]. The ratio of LF/HF therefore gives an overall picture of the autonomic nervous system (ANS) [24, 27]: the higher the ratio, the greater the stress. Operating has been found to affect HRV [26]. However, as with HR, the effects of physical activity and mental stress on HRV cannot be separated, and the measure is subject to individual differences.

Skin conductance level is known to rise with increased stress [30] and has been used as an objective measure to evaluate the activity of the sympathetic nervous system [31–35].
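To illustrate how an LF/HF ratio might be derived in practice, the sketch below interpolates a series of RR intervals onto a uniform grid, estimates the power spectrum with Welch's method, and sums power over the conventional LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) bands. The 4 Hz resampling rate and the synthetic tachogram are our assumptions (the band edges are standard in the HRV literature), and real recordings would need artefact and ectopic-beat handling first:

```python
import numpy as np
from scipy.signal import welch

def lf_hf_ratio(rr_s, fs=4.0):
    """Estimate the LF/HF ratio from RR intervals (in seconds): the
    irregularly timed tachogram is interpolated onto a uniform grid,
    its power spectrum estimated with Welch's method, and power summed
    over the LF (0.04-0.15 Hz) and HF (0.15-0.40 Hz) bands."""
    beat_times = np.cumsum(rr_s)
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    tachogram = np.interp(grid, beat_times, rr_s)
    f, pxx = welch(tachogram - tachogram.mean(), fs=fs, nperseg=256)
    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df
    return lf / hf

# Synthetic tachogram: 0.8 s beats with a slow vasomotor-band (0.1 Hz)
# oscillation plus noise, so LF power dominates and the ratio exceeds 1.
rng = np.random.default_rng(0)
beats = np.arange(600)
rr = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * (beats * 0.8)) + 0.01 * rng.standard_normal(600)
print(round(lf_hf_ratio(rr), 2))
```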
Table 12.1 Aggregated movement of stress indicators across cases (Arora et al., 2009)

STAI (self-report)   HR elevated,        HR elevated,       HR normal,         HR normal,         Total
                     cortisol elevated   cortisol dropped   cortisol elevated  cortisol dropped
Elevated                   15                   6                  1                  1            23
Dropped                     1                   4                  1                 17            23
Did not change              2                   2                  0                  4             8
Total                      18                  12                  2                 22            54
The electrooculogram utilises an ergonomics workstation to collect data from which the number of eye blinks can be counted; the number of eye blinks increases as stress levels rise [31, 33].

Salivary cortisol is an adrenocortical hormone which rises as a result of the neuroendocrine response to a stressful situation and is widely used in non-surgical studies. In surgery, Jezova et al. [36] confirmed that cortisol levels were higher during a work day for both junior and senior surgeons compared to a non-work day, suggesting higher stress levels.
12.3.2 Subjective Measures of Stress
These involve the subject’s self-report of their stress levels, typically assessed through a questionnaire. In surgery, only a few studies have so far used a self-reported assessment of stress. Of those, some have asked subjects to report their stress using questionnaires [20, 27, 31, 37], whereas others have relied on interviews [1, 38]. The low penetration of such self-report tools in surgery is a significant problem in this literature, since such tools capture subjects’ subjective experience of stress – which, as we discussed above, is a key component of stress in humans. A self-report measure of particular relevance to surgery is Spielberger’s State-Trait Anxiety Inventory (STAI), which also exists in a short, six-item form [39–41].
12.3.3 Combined Subjective and Objective Measures of Stress
Very few surgical studies to date [31, 33, 34] have used both objective and subjective measures of stress. In a recent study conducted by our research group (Arora et al., 2009), a multimodal stress assessment tool was developed and validated using 55 real cases. In this study, stress was assessed subjectively using the STAI scale (pre- and post-operatively) and objectively via salivary cortisol (pre- and post-operatively); in addition, participating surgeons were asked to wear a Polar HR monitor throughout the study. In the observed case set, 23/55 cases were deemed stressful, as defined by an increase in the STAI between pre- and post-operative administration. Movements of HR and cortisol against stress self-reported by surgeons can be seen in Table 12.1. Perfect agreement between subjective and objective indicators was obtained in 15/23 stressful cases and 17/23 non-stressful cases: elevated STAI scores were associated with elevated HR and cortisol levels (stressful cases), and decreased STAI scores were associated with normal HR and decreased cortisol levels (non-stressful cases). The study thus demonstrated that in 70% of cases, the movement in the STAI was mirrored by concordant movement in both objective parameters. The rise in HR and cortisol found is therefore likely due to subjects’ mental stress (rather than, for instance, the physical demands of carrying out the procedure). Further analyses revealed that HR was more sensitive (91%) and cortisol more specific (91%) in picking up mental stress. Using objective and subjective assessments of stress in combination therefore appears to be feasible and informative.
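The headline figures quoted above can be reproduced directly from the counts in Table 12.1, on the reading that sensitivity and specificity are computed against STAI-defined stress; the short sketch below is our own reconstruction of that arithmetic, not the study’s analysis code.

```python
# Reproducing the summary statistics in the text from Table 12.1.
# Columns: (HR elevated, cortisol elevated), (HR elevated, cortisol dropped),
#          (HR normal, cortisol elevated), (HR normal, cortisol dropped).
table = {
    "STAI elevated":  [15, 6, 1, 1],    # 23 stressful cases
    "STAI dropped":   [1, 4, 1, 17],    # 23 non-stressful cases
    "STAI unchanged": [2, 2, 0, 4],     # 8 cases
}

stressful = sum(table["STAI elevated"])
non_stressful = sum(table["STAI dropped"])

# HR sensitivity: HR elevated (first two columns) among STAI-stressful cases.
sens_hr = (table["STAI elevated"][0] + table["STAI elevated"][1]) / stressful
# Cortisol specificity: cortisol dropped (columns 2 and 4) among non-stressful cases.
spec_cortisol = (table["STAI dropped"][1] + table["STAI dropped"][3]) / non_stressful
# Full three-way agreement (all indicators moving together).
agree = table["STAI elevated"][0] + table["STAI dropped"][3]

print(f"HR sensitivity:       {sens_hr:.0%}")        # 91%
print(f"Cortisol specificity: {spec_cortisol:.0%}")  # 91%
print(f"Agreement: {table['STAI elevated'][0]}/23 stressful, "
      f"{table['STAI dropped'][3]}/23 non-stressful, "
      f"{agree}/{stressful + non_stressful} = {agree/(stressful+non_stressful):.0%} overall")
```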
12.4 Part 3: Measures of Performance in Surgery Surgical performance encompasses technical and nontechnical aspects. The former cover the traditionally measured and assessed psychomotor skills of surgeons. The latter cover a set of behavioural (teamworking,
communication and leadership) and cognitive skills (decision-making; situation awareness) that have been proposed as potential co-determinants (alongside technical skills) of surgical performance. In this section, we discuss direct measures and surrogate measures of technical performance, followed by assessment tools for non-technical skills.
12.4.1 Measures of Technical Performance
Measures of technical performance consist of dexterity parameters (e.g. economy of motion and time taken [42]) and indicators of quality of performance (e.g. OSATS-based global rating scales [43]), as well as task-specific measures (e.g. accuracy of stent placement). Various studies that have examined surgical stressors have used a range of performance measures/markers. Time taken to complete a task [32, 34, 44, 45] and average operative time [25, 26] have both been used. In laparoscopic surgery, measures of performance include number of knots tied [31, 33], economy of motion [35, 44] and motion analysis using ICSAD [46]. Simulator-derived measures, especially error scores, have also been used: errors include inaccurate placement of an object [34, 46], dropping an object [34, 44, 46], blood loss [35, 45], vessels ripped [35] and tissue damage [44].

12.4.2 Measures of Non-Technical Performance
Table 12.2 summarises three of the most well-known measures of non-technical performance currently available for surgery. NOn-TECHnical Skill (NOTECHS) and Non-Technical Skills for Surgeons (NOTSS) assess performance at the level of the individual surgeon, whereas OTAS assesses performance within the surgical team (e.g. primary surgeon and assistant/camera holder). Of the available scales, only NOTECHS has been used in studies that have investigated surgical stress [45, 46, 55]. Communication and utterance frequency have also been used as a surrogate marker of non-technical performance [46].

Table 12.2 Non-technical performance assessment tools

Tool: Observational Teamwork Assessment for Surgery (OTAS)©
Developer: Imperial [48–51]
Development history: Developed for OR teams
Skills assessed: Communication; cooperation/back-up behaviour; coordination; leadership; monitoring behaviour
Clinical speciality: Surgical, anaesthetic, nursing
Individual vs. team focus: Surgical, anaesthetic and nursing sub-teams

Tool: NOn-TECHnical Skill (NOTECHS)
Developer: UoA [52]; Imperial [53, 54]
Development history: Developed for aviation; revised for OR
Skills assessed: Communication/interaction; situation awareness; cooperation/team skills; leadership/managerial skills; decision-making
Clinical speciality: Surgical, anaesthetic, nursing
Individual vs. team focus: Individual team members

Tool: Non-Technical Skills for Surgeons (NOTSS)©
Developer: UoA [4, 55]
Development history: Developed for surgeons
Skills assessed: Communication/teamwork; leadership; situation awareness; decision-making
Clinical speciality: Surgical
Individual vs. team focus: Individual team members

Imperial, Imperial College London; UoA, University of Aberdeen
12.5 Part 4: Impact of Stress on Surgical Performance
There is conflicting evidence regarding the stress levels in robotic vs. laparoscopic surgery. Studies that used skin conductance levels to measure stress found that stress was lower for robotic than for laparoscopic surgery [32, 34]. However, a study that used self-reported stress [56] found no difference between the two techniques in terms of mental stress (interestingly, both studies found that performance was worse for robotic than for laparoscopic surgery). In contrast, the evidence is consistent regarding stress in open vs. laparoscopic surgery: the former is less stressful than the latter [25, 31, 33]. In these studies, stress was assessed via self-report, skin conductance and eye blinks. Regarding performance, fewer knots were tied and operative time was longer [25] with laparoscopic surgery, suggesting that increased stress affected performance negatively.

Expertise is also related to stress. Relevant studies suggest that experienced subjects have lower stress levels (as measured by HR [20, 23], HRV [25] and skin conductance, self-report and eye blinks [31]). These studies also show that experienced surgeons performed technically better than their less experienced counterparts. In another study, Moorthy et al. examined the effect of bleeding on the technical and non-technical performance of experienced and inexperienced surgeons. Although the researchers did not obtain direct measures of stress, existing evidence suggests that bleeding is a key surgical stressor [38]. This study found that experienced surgeons were significantly better in controlling the bleeding. Taken together, these studies suggest that with expertise, stress levels become lower and technical performance improves.

Studies that have investigated the effects of distractions/interruptions on technical performance [35, 44, 46] have shown that increased distractions correlate with poorer performance (increased task time, number of errors and poorer economy of motion) on difficult laparoscopic tasks. Moorthy et al. [46] also found that certain distractions, like noise in the OR and time pressure, were associated with significantly impaired dexterity and increased errors when compared to quiet conditions.

Finally, some studies have examined the impact of multiple stressors on surgical performance [35, 45, 55]. In the study by Moorthy et al. [46] that we discussed
above, multiple stressors (e.g. bleeding, time pressure and distractions) led to worse performance. Undre et al. [55] used bleeding as a stressor for surgeons and other stressors for the anaesthetic (e.g. difficult intubation) and nursing sub-teams (e.g. equipment missing from the tray and an inexperienced circulating nurse). Schuetz et al. [33] exposed their surgeons to multiple distractions and subsequently split them into three groups: those that experienced stress (measured via skin conductance) but did not recover, those that experienced stress but did recover, and those that did not experience stress. The researchers found that the group who experienced stress but recovered demonstrated better dexterity than either the group with stress and no recovery or the group with no stress.

Few studies have addressed issues relating to stress and non-technical skills in the OR. Moorthy et al. [45] used bleeding as a stressor and a revised version of NOTECHS for surgeons, and observed no differences on the non-technical skill scales between their expert and novice subjects. Undre et al. [55] used different stressors for different members of the OR team and used NOTECHS to assess non-technical performance in surgeons, anaesthetists and nurses. These researchers found that leadership and decision-making were scored lower than other skills. Finally, two studies have used interviews to assess surgeons’ own views about the effects of stress on their own performance [1, 38]. Both studies have shown that excessive stress leads to impaired communication, team working, judgement and decision-making.
12.6 Part 5: Discussion
This chapter aimed to provide an overview of surgical performance under stress. We explored the concept and measures of stress applied to the domain of surgery, and discussed a range of tools to assess stress in the OR. Moreover, we discussed tools that have been used to assess surgical performance, technical and non-technical, in studies that have put surgeons under stress so as to investigate the impact of stress on their performance. Furthermore, we summarised some of the existing evidence on the impact of stressful conditions on surgeons in the OR and in surgical simulators.

The overall picture that emerges from these studies is that surgeons are exposed to a range of performance-debilitating conditions. Technical issues (e.g. bleeding, technically demanding procedures in laparoscopic
surgery) are a key stressor – but not the only one. Distractions and disruptions also trigger significant levels of stress, as does lack of expertise. Importantly, whereas technical issues cannot always be predicted and expertise can only be acquired relatively slowly over a number of years in training, distractions are, at least in part, avoidable. In many ORs, levels of distraction and interruption to surgical work are not negligible, and some existing evidence suggests that they both cause stress and have a negative impact on surgical performance [47, 50]. Redesign of the work environment in such a way that distraction is minimised is an option that could be considered in new hospitals. Alternatively, existing work processes could be assessed for their functionality and redesigned, so that levels of disruption to the OR team through, for instance, external requests are kept to an unavoidable minimum.

More generally, the current review of the stressful conditions under which surgeons are often asked to operate [57] raises a number of issues in relation to surgical training. In addition, in the light of the evidence that we have reviewed, recommendations can be made in terms of further research that is necessary to elucidate a number of issues. In what follows, we address these implications for training and research.
12.7 Implications for Surgical Training Current surgical training programmes provide very little opportunity for surgeons to recognise and respond to stress before it becomes deleterious to their practice. Surgeons are typically taught to perform a procedure in routine circumstances despite the evidence that stressors do occur in the OR – a situation less than ideal for inexperienced surgeons. Senior surgeons have learnt from anecdotal experience how to deal with potentially stressful situations, but the juniors do not have this benefit. From a training perspective, this situation is neither safe nor acceptable. Preconditioning to stress (i.e. experiencing stress before facing it in real OR circumstances) “confers a well-documented influence on the cardiovascular response and alters a subject’s approach to psychological challenges” [22, 37]. In fact, it has been suggested that senior surgeons respond better to stressful conditions in the OR because they have been preconditioned through their experience [22].
Given the reduced training time for junior surgeons, other routes to stress preconditioning should be sought. Simulation-based training offers a potential route. Simulations provide a safe training environment, in which errors can occur safely, and systematic feedback on technical and non-technical performance can be provided to trainees. Simulation-based crisis management modules have been widely used in the aviation industry as part of crew resource management training [58] and, more recently, they have been introduced in the training of junior anaesthetists. Lessons learnt in the context of the aviation industry and anaesthesia could be applied to surgery. Current evidence suggests that such modules are well received by trainers and trainees alike [45, 55].

How should a simulation-based stress training module be designed? Arora et al. [1] report a systematic investigation of what such a module should be based on. In detailed semi-structured interviews with expert surgeons, Arora et al. explored key coping strategies used by surgeons in stressful situations and their requirements from an intervention to enhance their performance under stress. The most common coping strategy employed was internal to the surgeon (i.e. cognitive control of one’s own stress). A key element of the surgeons’ response was to stop what they were doing, so as to be able to stand back and re-assess the situation. According to one interviewee, a stressful situation “can lead to a cascade of further errors from that point. If you remain stressed, you start panicking and blaming others. Instead of trying to control and contain the situation, it starts cascading out of control which leads to further errors and ultimately a serious incident …” (s13).

Most of the interviewed surgeons (10/15) also mentioned that they used some form of pre-operative planning to minimise the potential for intra-operative stress: “if you’re not prepared before you’re going to run, you’re guaranteed to run into troubles during the operation” (s6). This includes mental rehearsal (“I sort of play the operation each step backwards and forwards in my mind” (s6)), contingency planning (“before you start, get one or two things ready in case serious things go wrong…then you have a get out clause, to help you out of trouble” (s7)), and practising crisis scenarios beforehand (“you have a game plan to deal with those scenarios” (C12)). Working as a team was also highlighted as a crucial response: “create a rapport with theatre staff at all levels”
(s13); “if you’re out of your comfort zone; communicate this to others” (s7). This is important because, if the surgeon runs into difficulties, others “wake up and pay extra attention too” (c13). Timely acknowledgement of stress was very well put by one surgeon: “during a life threatening situation, I actually stop what I am doing and turn to talk to the scrub nurse. I look into her eyes and say this is a life threatening situation now, we’re in trouble here, we need as much help as possible … this is serious, I will be asking for a lot of things and may say them in a haphazard manner” (s1).

Arora et al.’s study also examined in detail the components of a stress training module that the surgeons would find useful. These are summarised in Table 12.3. Two main categories of such a module were identified: first, components that would reduce the stress and, second, components that would improve surgeons’
ability to manage stress. Factual information about stress, cognitive training, team training and individualised feedback should be provided as part of stress training. Realistic simulation-based training, with individualised feedback on technical and non-technical skills, was favoured by most participating surgeons. Finally, most participants’ view was that the training should be targeted at Higher Surgical Trainees/Residents.
Table 12.3 Components of a stress training intervention (n = 15 surgeons)

Cognitive training (mentioned by 11; rated most important): raising awareness of own coping method; technique to improve coping; learning from expert
Technical training (mentioned by 11): provide experience; use simulation; specific skills, i.e. when to call for help, control bleeding
Measuring outcomes (mentioned by 12): objective marker of stress; effect of stress on performance; performance before and after intervention
Feedback (mentioned by 13): feedback and debrief after scenario; videos; opportunity for reflection
Team training (mentioned by 8; rated least important): dealing with a difficult colleague; briefing; improving communication; managing others and the environment when under stress

12.8 Future Research Agenda
The empirical evidence on stress and surgical performance is rather sparse, and the studies are heterogeneous in their stress-inducing conditions and measurement
instruments used. Further research should systematically and quantitatively investigate the consequences of excessive levels of stress in the OR (as seen in a crisis situation) on surgical performance, thereby contributing to the delineation of effective stress management strategies and training needs at different levels of expertise. This research should also address whether increasing experience in a procedure reduces stress and, subsequently, enhances performance (i.e. quantitative assessment of relevant learning curves for performance and stress).

Moreover, instruments and tools need to be developed and validated if reliable evidence is to be collected on the human factors side of stress. To assess stress, both subjective (i.e. self-report) and objective (i.e. physiological) measures are necessary, as these capture subjective experience and bodily response to stressful OR conditions. Comprehensiveness and robustness in assessment should be balanced with simplicity and ease of use in the real OR context to facilitate real-world application.

Furthermore, the impact of stress on non-technical skills remains largely unexplored. These skills are likely to deteriorate as a result of excessive stress. Conversely, adequate training in non-technical skills (such as effective team work and mental readiness) may prove an effective coping strategy. Both these questions should be addressed empirically in further research. In addition, although excessive stress can compromise performance, a small amount of it can help concentration and alertness, as is evident in the Yerkes–Dodson law [17]. Research should seek to determine where this optimal level of stress lies for surgery and investigate how well surgeons actually cope with stress (in addition to just assessing stress levels across procedures, levels of expertise, etc.). Determining optimal levels of stress for novices and experts can form the basis for training programmes designed to reduce the deleterious effects of stress on surgical practice, ultimately enhancing the quality and safety of patient care.

Acknowledgements This chapter is based on an ongoing research programme on safety implications of surgical stressors that is being carried out by our research group. Dr. Roger L. Kneebone has had an instrumental role in the shaping and development of this work over a number of years. The authors would like to thank the BUPA Foundation and the Economic and Social Research Council (ESRC) Centre for Economic Learning and Social Evolution for providing funding for the work reported in this chapter.
References

1. Arora S, Sevdalis N, Nestel D et al (2009) Managing intraoperative stress: what do surgeons want from a crisis training programme? Am J Surg 197:537–543
2. Department of Health (2008) A high quality workforce: NHS next stage review. Department of Health, London
3. Aggarwal R (2004) Surgical education and training in the new millennium. Surg Endosc 18:1409
4. Yule S, Flin R, Paterson-Brown S et al (2006) Non-technical skills for surgeons in the operating room: a review of the literature. Surgery 139:140–149
5. Kohn LT, Corrigan JM, Donaldson MS (eds) (2000) To err is human: building a safer health system. National Academy Press, Washington, DC
6. Vincent C, Neale G, Woloshynowych M (2001) Adverse events in British hospitals: preliminary retrospective record review. BMJ 322:517–519
7. Vincent C, Moorthy K, Sarker SK et al (2004) Systems approaches to surgical quality and safety: from concept to measurement. Ann Surg 239:475–482
8. Selye H (1973) The evolution of the stress concept. Am Sci 61(6):692–699
9. Everly GS Jr, Lating JM (2002) A clinical guide to the treatment of the human stress response, 2nd edn. Kluwer Academic/Plenum Publishers, New York
10. Lazarus RS (1966) Psychological stress and the coping process. McGraw-Hill, New York
11. Selye H (1978) The stress of life, rev edn. McGraw-Hill, Oxford
12. Lazarus RS (1991) Emotion and adaptation. Oxford University Press, New York
13. Lazarus RS, Folkman S (1984) Stress, appraisal and coping. Springer, New York
14. Lazarus RS, Folkman S (1986) Cognitive theories of stress and the issue of circularity. In: Dynamics of stress: physiological, psychological, and social perspectives. Plenum Press, New York, pp 63–80
15. Lazarus RS (1993) From psychological stress to the emotions: a history of changing outlooks. Annu Rev Psychol 44:1–21
16. Lazarus RS (1993) Coping theory and research: past, present, and future. Psychosom Med 55:234–247
17. Yerkes RM, Dodson JD (1908) The relation of strength of stimulus to rapidity of habit formation. J Comp Neurol Psychol 18:459–482
18. Becker W, Ellis H, Goldsmith R et al (1983) Heart rates of surgeons in theatre. Ergonomics 26:803–807
19. Czyzewska E, Kiczka K, Czarnecki A et al (1983) The surgeon’s mental load during decision making at various stages of operations. Eur J Appl Physiol Occup Physiol 51:441–446
20. Kikuchi K, Okuyama K, Yamamoto A et al (1995) Intraoperative stress for surgeons and assistants. J Ophthalmic Nurs Technol 14:68–70
21. Payne R, Rick J (1986) Heart rate as an indicator of stress in surgeons and anaesthetists. J Psychosom Res 30:411–420
22. Tendulkar AP, Victorino GP, Chong TJ et al (2005) Quantification of surgical resident stress “on call”. J Am Coll Surg 201(4):560–564
23. Yamamoto A, Hara T, Kikuchi K et al (1999) Intraoperative stress experienced by surgeons and assistants. Ophthalmic Surg Lasers 30:27–30
24. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) Heart rate variability: standards of measurement, physiological interpretation and clinical use. Circulation 93:1043–1065
25. Bohm B, Rotting N, Schwenk W et al (2001) A prospective randomized trial on heart rate variability of the surgical team during laparoscopic and conventional sigmoid resection. Arch Surg 136:305–310
26. Demirtas Y, Tulmac M, Yavuzer R et al (2004) Plastic surgeon’s life: marvelous for mind, exhausting for body. Plast Reconstr Surg 114:923–931; discussion 932–933
27. Bootsma M, Swenne CA, Van Bolhuis HH et al (1994) Heart rate and heart rate variability as indexes of sympathovagal balance. Am J Physiol 266:H1565–H1571
28. Pagani M, Furlan R, Pizzinelli P et al (1989) Spectral analysis of R-R and arterial pressure variabilities to assess sympatho-vagal interaction during mental stress in humans. J Hypertens Suppl 7:S14–S15
29. Yamamoto Y, Hughson RL, Peterson JC (1991) Autonomic control of heart rate during exercise studied by heart rate variability spectral analysis. J Appl Physiol 71:1136–1142
30. Boucsein W (1992) Electrodermal activity. Plenum Press, New York
31. Berguer R, Smith WD, Chung YH (2001) Performing laparoscopic surgery is significantly more stressful for the surgeon than open surgery. Surg Endosc 15:1204–1207
32. Berguer R, Smith W (2006) An ergonomic comparison of robotic and laparoscopic technique: the influence of surgeon experience and task complexity. J Surg Res 134:87–92
33. Schuetz M, Gockel I, Beardi J et al (2007) Three different types of surgeon-specific stress reactions identified by laparoscopic simulation in a virtual scenario. Surg Endosc
34. Smith WD, Chung YH, Berguer R (2000) A virtual instrument ergonomics workstation for measuring the mental workload of performing video-endoscopic surgery. Stud Health Technol Inform 70:309–315
35. Smith WD, Berguer R, Rosser JC Jr (2003) Wireless virtual instrument measurement of surgeons’ physical and mental workloads for robotic versus manual minimally invasive surgery. Stud Health Technol Inform 94:318–324
36. Jezova D, Slezak V, Alexandrova M et al (1992) Professional stress in surgeons and artists as assessed by salivary cortisol. Gordon & Breach Science Publishers, Philadelphia
37. Kelsey RM, Blascovich J, Tomaka J et al (1999) Cardiovascular reactivity and adaptation to recurrent psychological stress: effects of prior task exposure. Psychophysiology 36:818–831
38. Wetzel CM, Kneebone RL, Woloshynowych M et al (2006) The effects of stress on surgical performance. Am J Surg 191:5–10
39. Spielberger CD, Gorsuch RL, Lushene R (1970) STAI manual. Consulting Psychologists Press, Palo Alto
40. Tanida M, Katsuyama M, Sakatani K (2007) Relation between mental stress-induced prefrontal cortex activity and skin conditions: a near-infrared spectroscopy study. Brain Res 1184:210–216
41. Spielberger CD, Gorsuch RL, Lushene RE (1970) STAI manual. Consulting Psychologists Press, Palo Alto
42. Aggarwal R, Grantcharov T, Moorthy K et al (2006) A competency-based virtual reality training curriculum for the acquisition of laparoscopic psychomotor skill. Am J Surg 191:128–133
43. Grantcharov TP, Bardram L, Funch-Jensen P et al (2002) Assessment of technical surgical skills. Eur J Surg 168:139–144
44. Hassan I, Weyers P, Maschuw K et al (2006) Negative stress-coping strategies among novices in surgery correlate with poor virtual laparoscopic performance. Br J Surg 93:1554–1559
45. Moorthy K, Munz Y, Forrest D et al (2006) Surgical crisis management skills training and assessment: a simulation-based approach to enhancing operating room performance. Ann Surg 244:139–147
46. Moorthy K, Munz Y, Dosis A et al (2003) The effect of stress-inducing conditions on the performance of a laparoscopic task. Surg Endosc 17:1481–1484
47. Sevdalis N, Lyons M, Healey AN et al (2008) Observational teamwork assessment for surgery (OTAS): construct validation with expert vs. novice raters. Ann Surg 249:1047–1051
48. Undre S, Healey AN, Darzi A et al (2006) Observational assessment of surgical teamwork: a feasibility study. World J Surg 30:1774–1783
49. Undre S, Sevdalis N, Healey AN et al (2007) Observational teamwork assessment for surgery (OTAS): refinement and application in urological surgery. World J Surg 31:1373–1381
50. Undre S, Sevdalis N, Vincent CA (2009) Observing and assessing surgical teams: the observational teamwork assessment for surgery (OTAS). In: Flin R, Mitchell L (eds) Safer surgery: analysing behaviour in the operating theatre. Ashgate, Aldershot
51. Flin R, Maran N (2004) Identifying and training non-technical skills for teams in acute medicine. Qual Saf Health Care 13(Suppl 1):i80–i84
52. Moorthy K, Munz Y, Adams S et al (2005) A human factors analysis of technical and team skills among surgical trainees during procedural simulations in a simulated operating theatre. Ann Surg 242:631–639
53. Sevdalis N, Davis R, Koutantji M et al (2008) Reliability of a revised NOTECHS scale for use in surgical teams. Am J Surg 196(2):184–190
54. Yule S, Flin R, Maran N et al (2008) Surgeons’ non-technical skills in the operating room: reliability testing of the NOTSS behavior rating system. World J Surg 32:548–556
55. Undre S, Koutantji M, Sevdalis N et al (2007) Multidisciplinary crisis simulations: the way forward for training surgical teams. World J Surg 31:1843–1853
56. Lee EC, Rafiq A, Merrell R et al (2005) Ergonomics and human factors in endoscopic surgery: a comparison of manual vs telerobotic simulation systems. Surg Endosc 19:1064–1070
57. Sevdalis N, Arora S, Undre S et al (2009) Surgical environment: an observational approach. In: Flin R, Mitchell L (eds) Safer surgery: analysing behaviour in the operating theatre. Ashgate, Aldershot
58. Helmreich RL, Merritt AC, Wilhelm JA (1999) The evolution of crew resource management training in commercial aviation. Int J Aviat Psychol 9:19–32
13 How can we Assess Quality of Care in Surgery?

Erik Mayer, Andre Chow, Lord Ara Darzi, and Thanos Athanasiou
Contents
13.1 Introduction
13.2 Quality of Care
  13.2.1 Defining Quality of Care
  13.2.2 How Should we Assess Quality of Care?
13.3 Measuring Quality of Care
  13.3.1 Structural Variables
  13.3.2 Process Measures
  13.3.3 Outcome Measures
  13.3.4 Health care Economics
13.4 Benchmarking Quality of Care
  13.4.1 Current Initiatives
  13.4.2 Pay for Performance Strategies
  13.4.3 Future Direction
13.5 Public Health Implications
13.6 How to Design Health Care Quality Reforms
13.7 How to Achieve Health Care Quality Improvement
13.8 Conclusions
References

Abbreviations
NHS  National Health Service
U.K. United Kingdom
U.S. United States
Abstract This chapter explores and outlines existing research in the area of quality of care and identifies methods by which future research should be conducted. Before we try to assess quality of care, we must first be able to define it, although this is complicated by the complexity of the interacting factors that determine quality health care. The characterisation of quality by a conceptual model of structure, process and outcome is discussed, together with the importance of health economics. The proposed attributes of health care which can define its quality are also presented. Existing initiatives that benchmark quality of care tend to be generic and give us only an indication of minimum standards. The components of future assessment tools for quality of care are proposed, along with how “frontline” quality improvement can be facilitated by conceptual frameworks for designing health system reforms and by engaging contemporary managerial capabilities.
E. Mayer (✉)
Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London, W2 1NY, UK
e-mail: [email protected]

13.1 Introduction
The provision of high quality care is the universal aim of any health care system and those that work within it. When the health service is working at its best, it can provide excellent care to our patients, and it is well recognised that high quality care can lead to high
quality results. However, this is not always achieved. There exist wide variations in the quality of surgical care provision. This variation occurs between countries, regions, hospitals, departments and surgeons. The delicate interaction of multiple factors at numerous stages of a patient’s care pathway means that any single suboptimal episode can result in a cascade effect on the overall quality of care, to the detriment of the person who matters most: our patient.

It is therefore imperative that there is a focus on continually improving the quality of care that is delivered. In order to do this, the quality of care must first be defined so that its integral components are understood. Only then can methods to assess and measure quality of care be developed. Simultaneously, the very act of measuring care can be used to benchmark standards and drive continuing improvement. By showing that high quality care is provided, the health care system can demonstrate that it is doing its best to ensure the patient’s well-being and continuing health. Demonstrating high quality care can boost confidence in the health system for both patients and clinicians. It helps to highlight areas that are performing well, and also areas that need more attention. It helps to direct funds and resources towards the areas of care that need them most. Despite all of the universally accepted benefits of determining the quality of care that our patients receive within surgery, there is currently no universally accepted and/or validated measurement system available.
13.2 Quality of Care 13.2.1 Defining Quality of Care The Institute of Medicine in the United States (U.S.) define quality of care as: “ … the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge” [1].
The American Medical Association define high quality care as: “[that] which consistently contributes to the improvement or maintenance of quality and/or duration of life” [2].
Or the definition can incorporate a patient-orientated emphasis such as that of BUPA Hospitals United Kingdom (U.K.) [3]:
“ … ability to provide the service you want and need resulting in medical treatment you can rely on and personal care you’ll appreciate”
It is clear from these definitions that the term “quality care” may imply different things to clinicians and patients. From the clinician’s point of view, high quality care means up-to-date, evidence-based patient care that results in improved clinical outcomes. Although this is also important to the patient, they may be more concerned with aspects of care such as availability, flexibility, reliability and personal touches such as the politeness and empathy of medical staff. The term “quality of care” can, therefore, be broadly defined to represent an overall impression of the delivery of health care, but it can equally require some very specific and agreed measures of the treatment process or outcomes achieved. This makes it a complex entity to encompass, requiring an ordered approach.
13.2.2 How Should we Assess Quality of Care? Although current definitions of quality of care are applicable, they are deliberately vague and therefore of limited use in defining the assessment of quality of care. Although it can be obvious when high quality care is being provided, providing objective proof of this can be more challenging. The inherent flexibility of health care provision must also be considered; with innovation in medical technology and treatments, optimum care standards and therefore the markers of the quality of care evolve. It is therefore easy to see why standardising the assessment of quality of care even at a procedural level can be problematic. Quality of care assessment should include every aspect of a patient’s journey through their health care system. This would encompass community care, screening where applicable, referral to a specialist, processes of investigation, diagnosis and treatment. Also needed are details of post-operative management and follow-up, both in the hospital and in the community. In other words, the assessment of quality should be multifactorial. It is clear that there are countless variables which could be measured. How then to either measure all of them or identify the most pertinent ones? In 1966, Donabedian divided quality of care into three tangible parts: structure, process and outcome [4] (Fig. 13.1). Structure is concerned with the actual infrastructure of the health care system. This includes aspects
such as the availability of equipment, the availability and qualifications of staff, and administration. Process looks at the actual details of care, including aspects from diagnostic tests through to interventions such as surgery and continuity of care. Outcome looks at the end result of medical care, traditionally in the form of survival and restoration of function. This model has been widely applied to the assessment of quality of care.

Fig. 13.1 As defined by Donabedian, quality of care is defined by the interaction of three key elements: structure, process and outcome
A fourth element can be proposed for this conceptual model: health care economics, or its dependent measure, productivity. In modern medicine, the availability of financial resources and any resulting financial constraints can impact the accessibility and delivery of health care services, and potentially therefore the provision of high quality care. This is particularly true of publicly funded systems, such as the U.K.’s National Health Service (NHS). The ability of a health care provider to deliver quality of care while operating within financial or resource restrictions is an important factor that must be considered. The notion of cost being associated with quality of care, however, is not novel; Donabedian, following on from his structure-process-outcome paradigm, described several attributes of health care which define its quality, the “seven pillars of quality”. One of these, “efficiency”, relates to the “ability to obtain the greatest health improvement at the lowest cost” [5]. Although Donabedian introduces the concept of cost within his quality attributes, the organisation of a contemporary health care service is so influenced by business planning that business planning directly shapes it. For this reason, a strong argument can be made for health care economics to be included in a conceptual model of quality of care. It could also be argued that health care economics forms part of structure and therefore does not require special attention. As described above, Donabedian proposes “seven pillars” or attributes of health care which can define its quality (Fig. 13.2). He then went on to describe 11 principles of quality assurance, which are essential for
the design, operation and effectiveness of care [6]. The complexity of quality of care as an entity makes its assessment particularly challenging. Donabedian’s structure-process-outcome model acts as a suitable foundation and has been widely applied to the assessment of quality of care. Paying attention to each of these determinants of quality of care will allow us to build a framework consisting of a number of components that are representative of the quality of care that our patients receive.

Fig. 13.2 The seven pillars of quality as defined by Donabedian (adapted from reference [5]):
1. Efficacy: the ability of care, at its best, to improve health
2. Effectiveness: the degree to which attainable health improvements are realised
3. Efficiency: the ability to obtain the greatest health improvement at the lowest cost
4. Optimality: the most advantageous balancing of costs and benefits
5. Acceptability: conformity to patient preferences regarding accessibility, patient-doctor relationship, amenities, effects of care and cost of care
6. Legitimacy: conformity to social preferences concerning all of the above
7. Equity: fairness in the distribution of care and its effects on health
13.3 Measuring Quality of Care

13.3.1 Structural Variables
The structure of surgical care can be thought of as the “bare bricks” or infrastructure of care. It is concerned with details such as equipment, number of beds, nurse-to-patient ratios, qualifications of medical staff and administrative structure. The assumption is that if surgery occurs in a high-quality setting, high-quality care should follow. An advantage of measuring structural variables is that the information required is usually fairly reliable and is used frequently at hospital managerial board level. It is, however, infrequently used in a more clinical and/or public domain to help inform the environment in which surgical care is delivered. We nevertheless need to be certain what correlation exists between these structural variables and quality of care, and this is not well established.

Brook et al. have assessed the relationship between patient, physician and hospital characteristics and the appropriateness of intervention for carotid endarterectomy, coronary angiography and upper gastrointestinal endoscopy [7]. They concluded that the appropriateness of care could not be reliably predicted from standard, easily obtainable data about the patient, the physician or hospital structural variables. However, coronary angiography and carotid endarterectomy were significantly more likely to be carried out for medically appropriate reasons if performed in a teaching hospital. Hospital teaching status and other associated hospital variables such as size or for-profit status did not, however, translate into lower post-operative complication or death rates following carotid endarterectomy [8].
A structural variable that has received more attention than most is institutional or surgeon volume: the volume–outcome relationship. In this scenario, the volume of patients treated is used as a proxy for the quality of care, and the correlation to important clinical outcome measures is then determined. On the basis of a large number of studies that show better outcomes for patients treated at high-volume institutions and/or by high-volume surgeons, we are seeing a trend of preferential patient referral to high-volume institutions. Promoters of this centralisation of services in the U.S., e.g. the Leapfrog group [9], argue that it is important to help advance the quality of health care. Similarly, in the U.K., centralisation of oncological services is identified in the Department of Health’s Improving Outcomes Guidance framework [10].

Institutional and surgeon volume, either independently or in combination, are nevertheless rather broad proxy measures for quality of care. Indeed, some low-volume providers have excellent outcomes and some high-volume providers poor outcomes. Research in this area has therefore more recently begun to identify the core factors that determine whether or not the institution or surgeon produces better outcomes. Figure 13.3 illustrates potential structural variables for which volume acts as a proxy measure and which may therefore better inform us of quality of care.

In order to determine which structural variables potentially have the most influence on the quality of care, we first need to determine if correlation exists between them and some dependent endpoint; research to date has used clinical outcome measures. Elixhauser et al. [11] demonstrated the importance of the ratio of doctors and nurses per bed number, irrespective of institutional volume, on the mortality rates for paediatric heart surgery. A systematic review published by Pronovost et al. [12] also showed that high-intensity intensive care unit (ICU) physician staffing was associated with reduced hospital and ICU length of stay and mortality. Treggiari et al. [13] demonstrated in a multi-centre study that ICUs that were run by, or associated with, a specialised intensivist had significantly lower mortality rates in patients with acute lung injury (odds ratio = 0.68; 95% CI 0.52–0.89). This association was independent of severity of illness and consultation by a respiratory physician. As we better understand the core structural variables that correlate with markers of outcome, it will enable us to then assess the degree to which they also
influence the quality of care that our patients receive. It is not unrealistic to imagine that integration of the structural variables demonstrated to improve quality of care into institutions, irrespective of their caseload volume, could further our aim of achieving equality of outcomes for all.

Fig. 13.3 Examples of potential structural variables which influence quality of care. The figure groups candidate variables under workforce (e.g. staff:patient ratio, seniority/junior ratios), institution status (e.g. teaching/specialist, university affiliated, community, urban/rural, research and development), finance (e.g. locum agency spend, annual net I&E surplus, trust income per spell, monthly run rate variability, productivity), activity (e.g. elective and acute bed occupancy rates, ITU, availability of diagnostics) and resources (e.g. main theatres, day surgery unit, theatre utilisation, on-site speciality skill-mix, technology implementation)
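As a methodological aside, associations such as the ICU staffing odds ratio quoted above are derived from contingency-table or regression analyses of outcome against the structural variable. The sketch below computes an unadjusted odds ratio with a Wald confidence interval from invented counts; the published estimate was additionally adjusted for severity of illness, which this sketch does not attempt.

```python
# Unadjusted odds ratio with a 95% Wald confidence interval from a 2x2 table.
# All counts below are invented for illustration.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b: deaths/survivors where the structural variable is present;
    c, d: deaths/survivors where it is absent."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

or_, lo, hi = odds_ratio_ci(a=60, b=340, c=90, d=310)   # hypothetical counts
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")     # OR < 1 favours the exposed group
```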
13.3.2 Process Measures In surgery, process of care can be thought of as the preoperative, intra-operative and post-operative management of a patient. It looks at what is actually done for, and to, the patient. For example, it can look at the availability of screening programmes, the appropriate use of diagnostic tests, waiting times to operation, discharge processes, post-operative follow-up and availability of and willingness to give adjuvant treatments. However, the measured processes are only useful if there is evidence to prove that they translate into improved patient care. There is little point, for instance, in ensuring that all patients prior to laparoscopic cholecystectomy have an MRI, when no clinical benefit will
be gained. Malin et al. [14] assessed the quality of care for breast and colorectal cancer in the United States. They reviewed existing quality indicators, guidelines and review articles, and peer-reviewed clinical trials to produce a list of explicit quality measures evaluating the management of newly diagnosed breast and colorectal cancer patients. These were then checked for validity by a panel of experts, and included areas such as diagnostic evaluation, surgery, adjuvant therapy, management of treatment toxicity and post-treatment surveillance. Data were extracted from the patients’ notes and via patient questionnaires. Overall adherence to 36 and 25 quality measures specific to the process of care was 86% (95% CI 86–86%) for breast cancer and 78% (95% CI 77–79%) for colorectal cancer patients, respectively. What was unique in this approach was that the group was not trying to correlate process with outcomes, but simply looking at process measures that were agreed, in this instance by evidence base and expert review, to reflect quality of care.

There are a number of potential benefits to the measurement of process as opposed to outcomes in assessing the quality of care. Lilford et al. [15] describe these in detail, but in brief: process measures are less
susceptible to case-mix bias, although not exempt from it; their assessment and subsequent improvement will positively reflect on the entire evaluated institutional patient population, as opposed to a few outliers with poor outcomes; deficiencies in processes of care are less likely to be seen as a label of poor performance, instead indicating when and how improvement can be made; and process measures reflect the current state of care, avoiding the time delay experienced with some outcome measures.

Process measurement is not easy. It is difficult to standardise measurements for process of care in surgery, as the process varies depending upon the surgical pathology. Creating a standard quality measure for all surgery may be impossible. It is more feasible to create measures of process of care for specific pathologies. Examples of where best practice has been defined include diseases with published national guidelines, such as those produced by the National Institute of Clinical Excellence for cancer of the breast and lung. In the absence of agreed national guidelines, ongoing clinically based research will help us to define evidence-based processes that improve quality of care.

The appropriate use of surgical services is an important process measure in that it not only acts as a very good measure of the quality of care that a patient receives, but also has repercussions on the use of health care resources and the economics of health care. For a surgical intervention to be appropriate, it must “have a health benefit that exceeds its health risk by a sufficiently large margin to make the intervention worth doing”. The RAND Corporation has published extensively on the topics of the overuse and underuse of health care services. One of their largest studies examined the appropriateness of coronary angiographies, carotid endarterectomy and upper gastrointestinal endoscopy across multiple regions of the United States [16]. It found that 17, 32 and 17% of these procedures, respectively, were performed for inappropriate clinical reasons (i.e. overuse). Extrapolation from the literature may indicate that one quarter of hospital days, one quarter of surgical procedures and two fifths of medications are inappropriately overused [17].

Measuring the process of care can, however, be incredibly labour- and time-intensive and will require significant clinical knowledge. There will be a multitude of measurements, which can either be obtained prospectively or gleaned retrospectively from patient notes. The introduction of electronic coding of patient
records may make this an easier task in the future. A recent Cochrane review did find that process measurements in the form of audit are effective in improving clinical practice [18]. However, the costs of measuring process will ultimately have to be weighed against the patient benefit that is gained from any actions taken as a result of those measurements. In summary, there is no doubt that for the majority of surgical conditions, measuring the process of care will provide us with substantive and up-to-date indicators of quality of care and will be directly influenced by the functionality of a health care provider.
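For illustration, adherence results of the kind Malin et al. report reduce to a proportion of eligible quality measures that were met, together with a confidence interval. The sketch below uses a Wilson score interval and hypothetical counts; it is a generic reconstruction of the arithmetic, not the cited study’s method.

```python
# Adherence to process measures as a proportion with a 95% Wilson score CI.
# The counts are hypothetical placeholders.
import math

def adherence_with_ci(met, eligible, z=1.96):
    """Proportion of eligible measure-patient pairs met, with Wilson CI."""
    p = met / eligible
    denom = 1 + z ** 2 / eligible
    centre = (p + z ** 2 / (2 * eligible)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / eligible + z ** 2 / (4 * eligible ** 2)
    )
    return p, centre - half_width, centre + half_width

# e.g. 7,740 of 9,000 eligible measure-patient pairs met across an audit.
p, lo, hi = adherence_with_ci(7740, 9000)
print(f"Adherence {p:.0%} (95% CI {lo:.1%}-{hi:.1%})")
```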
13.3.3 Outcome Measures
Traditionally, quality of care has been judged on outcome measures using surrogate endpoints such as mortality and morbidity. They are used because they are easy to measure and are recorded with regularity. For an outcome measure to be a valid test of quality, it must be linked to and correlate with known processes that, when changed, will accordingly alter that outcome measure. For example, knowing the number of patients who present with metastases within six months of diagnosis with inoperable liver cancer is an important prognostic outcome. However, it is not a compelling measure of quality, as our ability to influence it is limited.

There are many advantages of using outcomes as a measure of quality of care. Outcomes are well-established as an important feature of quality. They can be viewed as the overall effect of care on a patient. Few would doubt the validity of outcomes such as mortality in judging surgical care. Statistics such as mortality rates are understandable at face value, including to the lay person. Consequently, there is a natural tendency to rank hospitals according to outcome measures such as mortality rates, with an implied association with quality of care. Examples of organisations that produce rankings according to outcome measures such as mortality rates include the Leapfrog group [9] and the U.S. News “America’s Best Hospitals” in the U.S. [19], as well as Dr Foster Intelligence, which produces the “Good Hospital Guide” [20], and the Health care Commission in the U.K. [21]. The use of outcome measures alone as an indicator of quality of care can, however, be gravely misleading and inappropriate. Outcomes, as implied earlier,
are the “end result” of an entire patient pathway and are thus reliant on numerous other variables. The outcome measure itself may not, therefore, directly correlate with the quality of care. For example, if a patient who has undergone an operation to remove a colorectal cancer dies 1 year after surgery, can we say that he has had poor quality of care? He or she may have received all the best treatments available, as guided by the latest evidence-based medicine, and despite all of this, died. Should this patient’s death be assumed to indicate a poor quality of care from the surgical team and allied health care professionals? Sometimes the best quality of care in surgery may still result in mortality through circumstances beyond our control. Equally though, the patient may have received the best quality of care throughout their hospital admission, but poor follow-up surveillance and delays in adjuvant treatments may have impacted on the final outcome. Other factors such as the natural history of the disease, the patient’s age and co-morbidities often have much larger influences on outcome than surgical care.

The method of risk adjustment attempts to compensate for the difference in case-mix between surgical centres. However, the effect of case-mix can never be completely eradicated. Firstly, risk adjustment cannot allow for variables that are unmeasured, or not known. Neither can it adjust for the effects of varying definitions (such as the definitions of a surgical site infection) between centres. Risk adjustment can cause increased bias if the risk of the measured factor is not uniform across the compared populations. This is why, even after adjusting mortality rates for risk, the relationship between quality of care and outcomes such as mortality is inconsistent. Pitches et al. looked at the relationship between risk-adjusted hospital mortality rates and quality or processes of care [22]. A positive correlation between better quality of care and risk-adjusted mortality was found in under half of the papers examined, while the others showed either no correlation or a paradoxical correlation. Similarly, Hofer and Hayward [23] found that the sensitivity for detecting poor quality hospitals based upon their risk-adjusted mortality rates was low at only 35%, while the positive predictive value was only 52%. This work has been corroborated with similar models by Zalkind and Eastaugh [24] and Thomas and Hofer [25].

Traditional outcome measures fail to appreciate important patient-specific measures such as quality of life. Recently, the National Institute of Clinical
Excellence has taken the quality-adjusted life year into account against a negative financial outlay when deciding to recommend the use of novel oncological medications such as Herceptin [26]. There has also been increasing interest in measuring national patient-reported outcome measures (PROMS) as a way of measuring health care performance [27]. In the U.K., pilot studies have been completed and the final report on PROMS is due shortly.

Outcomes remain an important method of quality assessment, despite their significant limitations around risk adjustment. The use of outcome measures in isolation is clearly inappropriate, and in order to improve the use of outcomes as a measure of quality, a multidimensional approach should be encouraged, including not only traditional measures such as mortality and morbidity, but also patient-reported outcomes such as quality of life, pain scores and so forth. This will help us to include more patient-centred measures in a currently clinically dominated quality of care assessment.
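The logic of risk adjustment discussed above is usually implemented as indirect standardisation: a case-mix model is fitted to reference data, each unit’s expected deaths are the sum of its patients’ predicted risks, and performance is summarised as the observed-to-expected (O/E) ratio. The sketch below uses simulated data and invented counts purely to show the mechanics; real models use validated case-mix variables such as age, comorbidity and urgency.

```python
# Minimal sketch of risk-adjusted mortality via an observed/expected ratio.
# The risk model, covariates and all numbers are simulated placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Reference population used to fit the case-mix model (3 covariates).
X_ref = rng.normal(size=(5000, 3))
true_logit = -3.0 + X_ref @ np.array([0.8, 0.5, 0.3])
y_ref = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
risk_model = LogisticRegression().fit(X_ref, y_ref)

# One hospital's patients: slightly sicker case-mix, hypothetical death count.
X_unit = rng.normal(loc=0.2, size=(300, 3))
observed_deaths = 18
expected_deaths = risk_model.predict_proba(X_unit)[:, 1].sum()

print(f"O/E ratio = {observed_deaths / expected_deaths:.2f}")
# O/E > 1: more deaths than the case-mix predicts; O/E < 1: fewer.
```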
13.3.4 Health care Economics

High quality of care invariably requires significant resources. A limitation on available resources or financial constraint can be an inhibitor to producing the highest quality of care. With new technologies usually carrying initial premium costs, and increasing levels of demand from a more educated and ageing population, health care costs will continue to rise in the future. In the U.S., health care expenditure is determined by private insurance companies, and in the U.K., policy has given control to regional strategic health authorities. This “local” budget control can and has led to geographical health care inequality. In the U.K., this has been termed the “postcode lottery”: the treatments that you are eligible to receive can depend upon the area in which you live, in accordance with the financial position of that area. Indeed, the U.K.’s Department of Health has taken this one step further and begun to look at expenditure in a number of different areas of health care, correlating this with outcome data [28] (Fig. 13.4). In economic terms, productivity is defined as the amount of output created per unit of input. Until recently, NHS productivity was determined using cost
Fig. 13.4 Programme budgeting – circulatory system programme budget per capita: expenditure (million pounds) per 100,000 unified weighted population, 2004–2005, vs. mortality from all circulatory diseases, DSR, all ages, 2002–2004, persons. Area A has a low spend per capita and a corresponding high mortality rate. The reverse is true for Area B (reproduced with permission from Department of Health/Crown copyright)
(input) and volume measures (output). Volumes of treatment, such as GP appointments, ambulance journeys and operations, taken from the National Accounts [29], were used as straightforward indicators of how much work the NHS does. This is an oversimplified approach, which ignores important aspects such as quality of care. Due to increasing costs, productivity measured this way has appeared to fall in recent years by between 0.6 and 1.3% per annum [29]. However, if NHS output is adjusted to account for increased quality of care as well as the increasing value of health, NHS productivity actually shows an increase of 0.9 to 1.6% per annum [29, 30]. Thus, we can see how understanding quality of care can have economic benefits as well as increasing public satisfaction. Although new surgical technologies are usually associated with a higher cost, this cost can at times be counterbalanced by subsequent benefits. A good example of this is the advent of laparoscopic surgery. Given the higher price of laparoscopic equipment compared to standard equipment, along with the surgical learning curve and the at times increased duration of procedures, you would be forgiven for thinking that laparoscopic procedures were invariably associated with a higher cost of treatment. However, as shown by Hayes et al. [31], although the initial cost of procedures such as laparoscopic colectomy can be higher, there are overall
improvements in cost effectiveness due to fewer recovery days and gains in quality-adjusted life years. Similarly, introducing a dedicated clinical pathway for procedures such as laparoscopic cholecystectomy can provide further cost advantages [32]. The decision-making of the individual surgeon is central to health care costs. Using Kissick’s decision-making model, Fisher et al. have demonstrated that using faecal occult blood testing as a primary screening tool for colorectal cancer can give similar sensitivity to that of colonoscopy, while significantly improving access and yielding huge cost savings [33]. These examples show that improving the quality of care that a patient receives, whether by improving the efficiency of health care delivery or by using evidence-based practice, can bring additional economic benefits. Providing high quality care does not necessarily have to cost more.
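Cost-effectiveness comparisons of this kind are commonly summarised as an incremental cost-effectiveness ratio (ICER): the extra cost of the new treatment divided by the extra quality-adjusted life years it yields. The following minimal sketch uses invented figures, not the values reported by Hayes et al. [31].

def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical figures: the laparoscopic procedure costs more upfront but
# yields more QALYs through faster recovery and fewer complications.
ratio = icer(cost_new=9500.0, cost_old=8000.0, qaly_new=6.30, qaly_old=6.05)
print(f"ICER: £{ratio:,.0f} per QALY gained")  # £6,000 per QALY in this example

# Conventionally, an intervention is deemed cost-effective if its ICER falls
# below a willingness-to-pay threshold (NICE has historically worked with a
# range of roughly £20,000-£30,000 per QALY).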
13.4 Benchmarking Quality of Care

13.4.1 Current Initiatives

The need for maintenance of high standards and the improvement of quality of care is well recognised. There are a number of existing programmes dedicated
Fig. 13.5 The 30-day post-operative mortality (a) and morbidity (b) for all major operations performed in the Department of Veterans Affairs hospitals throughout the duration of the National Surgical Quality Improvement Program data collection process. A 27% decrease in the mortality and a 45% decrease in the morbidity were observed in the face of no change in the patients’ risk profiles. FY indicates fiscal year (figure reproduced with permission from reference [36])
to the improvement of quality of care. The majority of these base their work on performance benchmarking. Performance benchmarking is a tool that allows organisations to evaluate their practice against accepted best practice. If any deficiencies exist, adjustments can be made with the aim of improving overall performance. This process must be continuous, as health care is a continually evolving entity. Currently, health care institutions are benchmarked either against national targets or against each other as a means of comparison. This approach identifies “good” and “bad” outliers and a cohort of “average” performers. It also serves to identify inequalities that exist, which can then be addressed. This method of benchmarking does help to maintain a nationwide drive to continuously improve services, although there are critics of any system that arbitrarily “ranks” performance without due consideration of underlying causative factors. The Health care Commission is an independent body that promotes improvements in quality of care in both the NHS and independent health sectors in England and Wales. Its role is to assess and report upon the performance of health care organisations to ensure high standards of care. It evaluates performance against targets set by the Department of Health. The Health care Commission also looks at clinical and financial efficiency, giving annual performance ratings for each NHS Trust. The areas examined are generalised, and include categories such as patient safety, clinical and cost effectiveness, governance and waiting times. The U.K. Quality Indicator Project (U.K. QIP) [34] is part of an international programme (the International Quality Indicator Project) that was started in the USA
in 1985. U.K. QIP is a voluntary exercise based upon the anonymous feedback of comparative data to encourage internal improvement within health care organisations. There is no system for publication of results or external judgement of data. By using performance indicators, the aim of the project is not to measure quality directly, but to identify areas that require further attention and investigation. Examples of surgical performance indicators include rates of hospital-acquired infections, surgical site infections, in-patient mortality and readmission rates. A similar project, the National Surgical Quality Improvement Program, exists in the USA [35]. This nationwide programme was started by the Department of Veterans Affairs (VA) to monitor and improve the standards of surgical care across all VA hospitals, and has been slowly introduced into the private sector since 1999 (Fig. 13.5). Performance benchmarking is a useful exercise for ensuring that the minimum expected standard of care is attained, but is far too vague and imprecise to tell us whether we are delivering high quality care. This is a problem that any generalisable quality assessment tool will experience, as it too will be unable to appreciate the intricacies of disease-specific high quality health care.
13.4.2 Pay for Performance Strategies

Pay-for-performance (P4P) programmes use financial reimbursements for clinical providers as a “reward”
for a positive change in performance measures. It is thought that this will help drive further improvements in quality of care. These programmes have gained popularity in recent years, with new initiatives in the USA [37, 38], U.K. [39], Australia [40] and Canada [41], based in both hospital and primary care. P4P programmes typically focus upon process measures, as these can detect suboptimal care in a timely manner while being directly under the control of the clinician. There are a multitude of variations of P4P programmes, with incentives being paid to individual clinicians, clinician groups, clinics, hospitals or multi-hospital collaborations. Similarly, the amount of incentive paid per measure can vary from $2 to $10,000, with incentives received for reaching absolute thresholds of care, relative thresholds (such as a 30% increase in performance) or even on a pay-per-case arrangement. Although studies have shown that P4P programmes can have positive effects on quality measures, these gains may be only modest [37, 42]. Cost-effectiveness is also unclear, with some studies showing massive savings [43] and others showing gross overspending [44]. Perhaps the most worrying aspect of P4P programmes is the unintended adverse consequences that can result. Examples include “gaming” strategies, where clinicians avoid sick or challenging patients, reclassify patient conditions, or even claim their incentive when care has not been provided. Similarly, patients may receive substantial “over-treatment” of their medical conditions. In fact, the NHS P4P programme in the U.K. found that the strongest predictor of improvement in achievement was the exclusion of patients from the programme [39]. On the other hand, clinicians and hospitals serving more disadvantaged populations may see their income fall as targets and thresholds become difficult to reach. Although P4P programmes have been shown to improve performance in key clinical areas, they can cause multiple problems if not subject to careful design and regular evaluation. To be successful, these programmes must be implemented with the involvement of clinicians from the very start to prevent unintended harm coming to the patient.
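As a concrete illustration of the threshold variants described above, the toy sketch below contrasts an absolute threshold with a relative-improvement rule; the thresholds and payment amounts are invented for demonstration.

def absolute_threshold_bonus(score, threshold=0.80, payment=5000):
    """Fixed bonus when the performance measure reaches a set level."""
    return payment if score >= threshold else 0

def relative_improvement_bonus(score, baseline, required_gain=0.30, payment=5000):
    """Bonus when performance improves by a required proportion over baseline."""
    return payment if score >= baseline * (1 + required_gain) else 0

# A provider improving compliance with a process measure from 50% to 70%:
print(absolute_threshold_bonus(0.70))          # 0 -- still below the 80% bar
print(relative_improvement_bonus(0.70, 0.50))  # 5000 -- a 40% relative gain

The two designs reward quite different providers: the absolute rule favours those already near the target, while the relative rule favours low-baseline providers who improve, which is one reason programme design matters so much.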
13.4.3 Future Direction

The greatest advances in the area of quality assessment will be in the realm of measurement of process and
overall performance. Inclusion of structural variables will also need to be considered. This should not be confused with set performance targets, or with blindly following clinical guidelines. Newly developed assessment scores should allow us to implement changes that will not only improve the score itself but, more importantly, improve the quality of care delivered to our patients. For example, a performance measure that tells us a certain proportion of the population underwent a particular desired process is not enough: it gives us no indication of how to improve quality. It is important to know why certain members of the population failed to achieve this goal. Was it through contra-indications to that process, lack of communication, lack of compliance or something else? This information can give further understanding of what changes are needed to improve the service that is provided. In short, knowing that improvement is needed is important, but more helpful is the knowledge of how to improve. Engagement in this process by institutions and clinicians is crucial. They have, to date, been reluctant, as the public reporting of performance data has had a “name-and-shame” style, inappropriately ranking providers against each other without duly considering institutional variations that cannot be adjusted for but which explain part of the variation in performance. Better methods of visually presenting performance data that avoid arbitrarily ranking health care providers, that are interpretable at face value by the lay person, and that still identify trusts needing special attention will help to engage all stakeholders in future performance benchmarking.
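One presentation method consistent with these aims is the funnel plot, which judges each provider's rate against statistical control limits around the overall average rather than producing a rank order. The sketch below uses hypothetical data and a simple binomial approximation to the 95% limits.

import math

NATIONAL_RATE = 0.04  # overall event (e.g. mortality) rate across providers

def control_limits(cases, z=1.96):
    """Approximate 95% binomial limits for a provider with this caseload."""
    se = math.sqrt(NATIONAL_RATE * (1 - NATIONAL_RATE) / cases)
    return max(0.0, NATIONAL_RATE - z * se), NATIONAL_RATE + z * se

# Hypothetical providers: name -> (events, cases)
providers = {"Trust A": (30, 900), "Trust B": (6, 150), "Trust C": (70, 1000)}

for name, (events, cases) in providers.items():
    rate = events / cases
    lower, upper = control_limits(cases)
    verdict = "outlier" if rate < lower or rate > upper else "within limits"
    print(f"{name}: rate {rate:.3f}, 95% limits ({lower:.3f}, {upper:.3f}) -> {verdict}")

Because the limits widen as caseload falls, a small unit with an unremarkable run of events is not unfairly flagged, while a genuinely divergent large unit is; no trust is assigned a league-table position.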
13.5 Public Health Implications

The assessment of quality of care is a public health issue that is becoming a dominant theme in structuring modern health care. The rigorous and accurate measurement of quality is an essential component of improving public health services and of meeting demands for public accountability. The methods by which quality is assessed have the potential to dictate health care policy well into the future, and as practising surgeons, we must all be well educated on this topic. Some examples of the current assessment of quality of care can be gathered from the internet sources listed in Table 13.1. The field of cardiothoracic surgery has long been aware of the push towards quality improvement and
Table 13.1 Internet resources for quality of health care
Organisation – URL
The Leapfrog Group – http://www.leapfroggroup.org/
The National Surgical Quality Improvement Program – https://acsnsqip.org/login/default.aspx
The International Quality Indicator Project – http://www.internationalqip.com/
The Healthcare Commission – http://2007ratings.healthcarecommission.org.uk/homepage.cfm
The Institute for Health care Improvement – http://www.ihi.org/ihi
Agency for Health care Research and Quality – http://www.ahrq.gov/qual/measurix.htm
The National Committee for Quality Assurance – http://web.ncqa.org/
The Institute of Medicine’s Health Care Quality Initiative – http://www.iom.edu/CMS/8089.aspx
The National Association for Health care Quality – http://www.nahq.org/
Quest for Quality and Improved Performance – www.health.org.uk/qquip
public accountability. In the 1980s, the Society of Thoracic Surgeons (STS) initiated one of the largest data collection operations in medicine, resulting in the STS National Adult Cardiac Surgery Database. It is now the largest and most comprehensive single-speciality database in the world. It not only allows surgeons and trusts to compare their results, but is also freely available to the public, so that patients can identify their own surgeon’s outcomes. With increasing emphasis placed upon quality and performance measurement, the STS set up the Quality Measurement Task Force to create a comprehensive quality measurement programme for cardiothoracic surgery. The results of this programme have recently been published [45, 46] and (at the time of press) may represent the most up-to-date and rigorous methods by which quality assessment can be performed. Undoubtedly, further investigation and reporting of the factors driving quality of care will uncover inequality of health care provision. No one doubts that all patients should receive the same quality of care, and achieving this can save lives. The Leapfrog Group in the U.S. now recommends that a
certified critical care specialist be available for ICUs, and has estimated that this restructuring could save more than 54,000 lives in the U.S. per year [47]. But can the current health care infrastructure manage the geographical fluxes in demand that may result from patients exercising their freedom of choice and seeking out “better care”? Often, institutions that are currently able to provide higher quality health care can do so only within the constraints of their current patient population demand. Any sizeable increase in this demand can have a negative impact and subsequently lead to a worsening of the quality of their health care provision.
13.6 How to Design Health Care Quality Reforms

Objectives to improve quality of care are well recognised, but the implementation of systems to meet these objectives is far from straightforward. Translating research evidence into the everyday structure and processes of a health care system is feasible, but made difficult by the variation that exists across health care systems and between health care providers. Leatherman and Sutherland [48] describe a conceptual framework to facilitate the design of health system reforms that consists of three aspects:

• A taxonomy to organise the available evidence on potential quality enhancing interventions (known as the QEI project)
• A multi-tiered approach to select and implement interventions in a health care system at four levels: national, regional, institutional and the patient–clinician encounter
• A model to guide the adoption of a balanced portfolio approach to quality improvement, recognising the prudence of simultaneously employing professional, governmental and market levers for change.

The QEI project encompasses several aspects of quality improvement, such as effectiveness, equity, patient responsiveness and safety. It itself forms part of a wider initiative called the Quest for Quality and Improved Performance, a 5-year international collaborative research project between the University of North Carolina School of Public Health, the London School of Economics, the University of York and the University of
Cambridge. The limitations of using an evidence base to bring about health reform are recognised, such as publication bias and difficulties in translating evidence from one health care system to another, but early results of the QEI project are generating some good examples of focused quality interventions. The integration of evidence-based interventions needs to occur across all levels of a health care system in order that predictable systemic improvement in quality arises. This “multi-tiered approach to building predictable systemic capacity for improvement” describes three key factors: “horizontal coherence”, the interaction of several different types of quality intervention; “vertical coherence”, the interaction of a quality improvement intervention across the multiple levels of the health care system; and “coherence in accountability”, the balance between professionalism and professional accountability, centralised governmental control and market forces. Coherence in accountability forms the basis of a “balanced portfolio approach to quality improvement”, which recognises that professionalism, government or market factors cannot individually generate sustainable quality change.
13.7 How to Achieve Health Care Quality Improvement

As highlighted by Glickman et al. [49], proportionally more attention has been directed towards the “process” and “outcome” components of Donabedian’s structure, process, outcome framework for quality. In today’s modern health care system, “structure” consists of important organisational and managerial components that are the enablers for driving forward a multi-dimensional quality improvement agenda. Glickman et al. describe these organisational characteristics from a management perspective: executive management (including senior leadership and board responsibilities), culture, organisational design, incentive structures, and information management and technology. The distinctive aspect of this work is the combination of business and medical viewpoints to provide a contemporary operational definition of structure that updates Donabedian’s “physical characteristics” description. This framework engages managerial capabilities crucial to achieving health care quality improvement.
13.8 Conclusions

Assessing quality of care in surgery is an important and essential part of maintaining and improving patient care. The very act of measurement serves to determine current standards and provides a baseline against which improvement can be made locally and/or nationwide. Benchmarking between providers will assist in identifying inequalities at a provider or regional level. Further investigation of the causative factors will discover pockets of best practice and local innovation that can then be disseminated more widely. Although traditional assessments of quality have been heavily influenced by a number of clinical outcome measures such as mortality and morbidity, we have shown that these are clearly inadequate in isolation and do not provide a reliable assessment of the quality of a surgical service. As described by Donabedian some 40 years ago, quality of care can be explained by three key elements: structure, process and outcome. Treatment outcome measures will still form an important part of quality assessment, as they are easily understandable to clinician and patient alike, and outcomes such as postoperative mortality remain an important endpoint. We will see expansion of the use of patient reported outcomes such as quality of life and current health status, in order to achieve a well-rounded view of quality of care. Measurement of structural and process-of-care variables must be used in combination with outcome measurements; such variables have the significant advantage that they are less influenced by factors such as case-mix and patient co-morbidities. This will help to overcome the methodological difficulties of producing suitably adjusted data. These structural variables and process measures are not, however, currently widely or routinely collected; changing this will be a labour-intensive undertaking and will undoubtedly require additional resources. Health care economics is a further key element in assessing quality of care and has a significant impact upon modern surgical care, which is heavily influenced by continually evolving technological and biosurgical innovation. A lack of available finances will always act as an inhibitor to delivering the highest quality of care. The combination of surgical innovation, making more people eligible for treatment, with an ageing and increasingly demanding public means that financial constraints will remain a considerable factor for the foreseeable future.
A quality of care assessment tool should be multifactorial, taking into account the entire patient treatment episode. It should include up-to-date process measurements gleaned from evidence-based medicine and national guidelines. It should consider patient-centred as well as disease-specific clinical outcome measures, and incorporate structural variables indicating effective and efficient health care delivery. In this way, we can be confident of obtaining the most accurate and valid assessment of quality of care in surgery.
References
1. Chassin MR, Galvin RW (1998) The urgent need to improve health care quality. Institute of Medicine National Roundtable on Health Care Quality. JAMA 280:1000–1005
2. Anonymous (1986) Quality of care. Council on Medical Service. JAMA 256:1032–1034
3. BUPA Hospitals. Available at http://www.bupa-hospitals.co.uk/asp/patientcare/quality.asp#2. Accessed July 2007
4. Donabedian A (1966) Evaluating the quality of medical care. The Milbank Memorial Fund Quarterly 44:166–203
5. Donabedian A (1990) The seven pillars of quality. Arch Pathol Lab Med 114:1115–1118
6. Schiff GD, Rucker TD (2001) Beyond structure-process-outcome: Donabedian’s seven pillars and eleven buttresses of quality. Jt Comm J Qual Improv 27:169–174
7. Brook RH, Park RE, Chassin MR et al (1990) Predicting the appropriate use of carotid endarterectomy, upper gastrointestinal endoscopy, and coronary angiography. N Engl J Med 323:1173–1177
8. Brook RH, Park RE, Chassin MR et al (1990) Carotid endarterectomy for elderly patients: predicting complications. Ann Intern Med 113:747–753
9. The Leapfrog Group. Available at http://www.leapfroggroup.org. Accessed June 2007
10. Khuri SF, Daley J, Henderson W et al (1999) Relation of surgical volume to outcome in eight common operations: results from the VA National Surgical Quality Improvement Program. Ann Surg 230:414–429; discussion 429–432
11. Elixhauser A, Steiner C, Fraser I (2003) Volume thresholds and hospital characteristics in the United States. Health Aff (Millwood) 22:167–177
12. Pronovost PJ, Angus DC, Dorman T et al (2002) Physician staffing patterns and clinical outcomes in critically ill patients: a systematic review. JAMA 288:2151–2162
13. Treggiari MM, Martin DP, Yanez ND et al (2007) Effect of intensive care unit organizational model and structure on outcomes in patients with acute lung injury. Am J Respir Crit Care Med 176:685–690
14. Malin JL, Schneider EC, Epstein AM et al (2006) Results of the National Initiative for Cancer Care Quality: how can we improve the quality of cancer care in the United States? J Clin Oncol 24:626–634
15. Lilford RJ, Brown CA, Nicholl J (2007) Use of process measures to monitor the quality of clinical practice. BMJ 335:648–650
16. Chassin MR, Kosecoff J, Park RE et al (1987) Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA 258:2533–2537
17. Brook RH (1989) Practice guidelines and practicing medicine. Are they compatible? JAMA 262:3027–3030
18. Jamtvedt G, Young JM, Kristoffersen DT et al (2006) Audit and feedback: effects on professional practice and health care outcomes. Cochrane Database Syst Rev CD000259
19. Al-Ruzzeh S, Athanasiou T, Mangoush O et al (2005) Predictors of poor mid-term health related quality of life after primary isolated coronary artery bypass grafting surgery. Heart 91:1557–1562
20. Dr Foster Good Hospital Guide. Available at http://www.drfoster.co.uk/ghg. Accessed May 2007
21. The Healthcare Commission. Available at http://www.healthcarecommission.org.uk/homepage.cfm. Accessed May 2007
22. Pitches DW, Mohammed MA, Lilford RJ (2007) What is the empirical evidence that hospitals with higher-risk adjusted mortality rates provide poorer quality care? A systematic review of the literature. BMC Health Serv Res 7:91
23. Hofer TP, Hayward RA (1996) Identifying poor-quality hospitals. Can hospital mortality rates detect quality problems for medical diagnoses? Med Care 34:737–753
24. Zalkind DL, Eastaugh SR (1997) Mortality rates as an indicator of hospital quality. Hosp Health Serv Adm 42:3–15
25. Thomas JW, Hofer TP (1999) Accuracy of risk-adjusted mortality rate as a measure of hospital quality of care. Med Care 37:83–92
26. NICE (2006) Update on Herceptin appraisal. National Institute for Health and Clinical Excellence, London
27. DH (2005) Healthcare output and productivity: accounting for quality change. Department of Health, London
28. National Programme Budget project. Available at http://www.dh.gov.uk/en/Managingyourorganisation/Financeandplanning/Programmebudgeting/index.htm. Accessed September 2007
29. Office for National Statistics (2006) Public service productivity: health. Econ Trends 628:26–57
30. Atkinson T (2005) Atkinson review of government output and productivity for the national accounts: final report. HMSO, London
31. Hayes JL, Hansen P (2007) Is laparoscopic colectomy for cancer cost-effective relative to open colectomy? ANZ J Surg 77:782–786
32. Topal B, Peeters G, Verbert A et al (2007) Outpatient laparoscopic cholecystectomy: clinical pathway implementation is efficient and cost effective and increases hospital bed capacity. Surg Endosc 21:1142–1146
33. Fisher JA, Fikry C, Troxel AB (2006) Cutting cost and increasing access to colorectal cancer screening: another approach to following the guidelines. Cancer Epidemiol Biomarkers Prev 15:108–113
34. Thomson R, Taber S, Lally J et al (2004) UK Quality Indicator Project (UK QIP) and the UK independent health care sector: a new development. Int J Qual Health Care 16(Suppl 1):i51–i56
35. National Surgical Quality Improvement Program. Available at https://acsnsqip.org/main/about_history.asp. Accessed July 2007
36. Khuri SF, Daley J, Henderson WG (2002) The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg 137:20–27
37. Lindenauer PK, Remus D, Roman S et al (2007) Public reporting and pay for performance in hospital quality improvement. N Engl J Med 356:486–496
38. Park SM, Park MH, Won JH et al (2006) EuroQol and survival prediction in terminal cancer patients: a multicenter prospective study in hospice-palliative care units. Support Care Cancer 14:329–333
39. Doran T, Fullwood C, Gravelle H et al (2006) Pay-for-performance programs in family practices in the United Kingdom. N Engl J Med 355:375–384
40. Fourth National Vascular Dataset Report. The Vascular Society of Great Britain and Ireland. Available at http://www.vascularsociety.org.uk/committees/audit.asp. Accessed May 2007
41. Pink GH, Brown AD, Studer ML et al (2006) Pay-for-performance in publicly financed healthcare: some international experience and considerations for Canada. Healthc Pap 6:8–26
42. Petersen LA, Woodard LD, Urech T et al (2006) Does pay-for-performance improve the quality of health care? Ann Intern Med 145:265–272
43. Curtin K, Beckman H, Pankow G et al (2006) Return on investment in pay for performance: a diabetes case study. J Healthc Manag 51:365–374; discussion 375–376
44. Roland M (2006) Pay-for-performance: too much of a good thing? A conversation with Martin Roland. Interview by Robert Galvin. Health Aff (Millwood) 25:w412–w419
45. O’Brien SM, Shahian DM, DeLong ER et al (2007) Quality measurement in adult cardiac surgery: part 2 – statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg 83:S13–S26
46. Shahian DM, Edwards FH, Ferraris VA et al (2007) Quality measurement in adult cardiac surgery: part 1 – conceptual framework and measure selection. Ann Thorac Surg 83:S3–S12
47. Birkmeyer JD, Birkmeyer CM, Wennberg DE et al (2000) Leapfrog patient safety standards: the potential benefits of universal adoption. Leapfrog Group, Washington
48. Leatherman S, Sutherland K (2007) Designing national quality reforms: a framework for action. Int J Qual Health Care 19(6):334–340
49. Glickman SW, Baggett KA, Krubert CG et al (2007) Promoting quality: the health-care organization from a management perspective. Int J Qual Health Care
14 Patient Satisfaction in Surgery

Andre Chow, Erik Mayer, Lord Ara Darzi, and Thanos Athanasiou
Abbreviations

PRO Patient reported outcomes
U.K. United Kingdom
U.S. United States

Contents

Abbreviations .......................................................... 165
14.1 Introduction ...................................................... 166
14.2 The Patient’s Perspective of Health Care ............ 167
14.3 Patient Satisfaction ............................................. 167
14.3.1 The Meaning of Satisfaction ............................ 167
14.3.2 Determinants of Satisfaction: Patient Expectations ... 167
14.3.3 Determinants of Satisfaction: Patient Characteristics ... 167
14.3.4 Determinants of Satisfaction: Psychosocial Factors ... 168
14.3.5 Components of Satisfaction ............................. 168
14.3.6 Patient Dissatisfaction ..................................... 169
14.3.7 The Importance of Measuring Patient Satisfaction ... 170
14.4 Measurement of Satisfaction ............................... 170
14.4.1 How can we Measure Satisfaction? .................. 170
14.4.2 The “Overall Satisfaction” Score ...................... 170
14.4.3 Satisfaction Survey Design .............................. 171
14.4.4 Guidance for Satisfaction Measurement ........... 172
14.5 Conclusions ....................................................... 172
References ................................................................ 172
Abstract Patient satisfaction is one of the most important patient reported outcomes and can be thought of as an ultimate endpoint in the assessment of health care quality. Although patient satisfaction has been studied for many years, a lack of understanding and the absence of a precise definition of satisfaction have been flaws in the majority of research to date. Persistently high patient satisfaction ratings over many years may in fact reflect poorly constructed measurement tools, as opposed to high quality care. This chapter explores the meaning of patient satisfaction, including the analysis of satisfaction determinants and satisfaction components. The importance of satisfaction measurement is also discussed, and guidance on creating satisfaction measurement tools is proposed.
A. Chow (corresponding author), Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital Campus, Praed Street, London, W2 1NY, UK; e-mail: [email protected]

14.1 Introduction
In the past, the goals of medicine were to reduce the morbidity and mortality of diseases that affected patients. While these are still valid and noble goals, the aims of modern health care have now evolved beyond this. As well as improving morbidity and mortality statistics, we now aim to improve aspects such as functional and cognitive status, quality of life (QOL) and productivity [7]. This enables us to ensure the highest quality of care.
Measuring the quality of health care is an essential part of modern medicine. As mentioned in other chapters, being able to define and measure quality of care has clinical benefits, in terms of ensuring high quality care and driving continuing improvement, as well as economic benefits. Outcome measures such as morbidity and mortality have traditionally been used as surrogate markers of quality care. However, these traditional outcome measures give us only a one-sided view and are inadequate for evaluating the multi-dimensional goals of modern health care. The future assessment of the quality of care should also focus on more patient-centred measures, including the measurement of patient satisfaction.
14.2 The Patient’s Perspective of Health Care

The measurement of health care quality can be taken from two perspectives: that of the health care provider and that of the patient. As doctors, there is a tendency to view outcomes from the health care provider’s point of view. However, it must be remembered that the patient is the most important individual in the health care system, and thus the patient should be central to all that we do as health care professionals. The patient’s viewpoint on treatment outcomes may be completely different from that of a health care professional. For example, the impact of treatment side effects such as impotence following radical prostatectomy, or the need for a stoma following colonic resection, may be given less weight by a health care provider, who may be more interested in outcomes such as blood loss, postoperative infection and 30-day mortality rates. From the patient’s perspective, however, long-term sequelae such as impotence and the need for a stoma would be at the forefront of their minds. The use of patient reported outcomes (PROs) can greatly enhance the assessment of quality of care. PROs provide us with the patient’s view of their health condition and their treatment. They show us the patient’s perspective and assessment of their symptoms, functional status, productivity and satisfaction. In an ideal system, this viewpoint should be integral to our decision-making process and to our assessment of the care that we provide. This is especially important for fields such as oncology, where several different treatment options may exist, and where survival gains can be small with significant treatment side effects [12].
The patient’s perspective is multi-dimensional, but can be broadly split into three components: QOL, current health state and patient satisfaction with care. These three components are the most important PROs that can be used to measure patient-orientated quality of care. The current health state of a patient can be thought of as the symptoms and overall well-being of a patient as a result of a disease or its treatments. Knowing and understanding the patient’s health state allows us to see the position that the patient is coming from. The health state of patients can directly affect their QOL as well as their satisfaction with health care. QOL has been extensively researched and the concept has evolved significantly over the past 30 years [25]. As health care professionals, we tend to concentrate most on health-related QOL, which is a multi-dimensional assessment of the physical, psychological and social aspects of life that can be affected by a disease process and its treatment [6]. Physical function is the ability to perform a range of activities of daily living, and includes physical symptoms resulting from a disease or its treatments. Psychological function encompasses the emotional aspect caused by a disease or its treatments, and may vary from severe distress to a sense of well-being; it may also include cognitive function. Social function refers to aspects of social relationships and integration. In addition, there may be supplementary issues which are specific to a particular disease. The QOL of patients can directly influence their overall satisfaction with care. Patient satisfaction can provide an ultimate endpoint for the assessment of health care quality. It is jointly affected by current health state and QOL, and helps to give us a balance against a provider-biased perspective. It is thus an essential part of quality assessment [25] (Fig. 14.1).
Fig. 14.1 Patient reported outcomes of health care provision
14.3 Patient Satisfaction

14.3.1 The Meaning of Satisfaction

Patient satisfaction is a multi-faceted construct. Although the concept of patient satisfaction and its measurement has been studied by health care professionals for years, there has been a distinct lack of attention to the meaning of patient satisfaction. This has been described as the greatest flaw in patient satisfaction research [34]. As described by Ware et al. [32], patient satisfaction can be split into two distinct areas. First, there are satisfaction determinants, which reflect patient variables that can affect satisfaction, such as patient expectations, patient characteristics and psychosocial determinants. Second, there are satisfaction components, which refer to a measure of the care actually received.
14.3.2 Determinants of Satisfaction: Patient Expectations

Patient satisfaction is now a recognised part of quality assessment. It is easy to imagine that high levels of reported satisfaction will correlate with high quality care. The underlying truth is, however, much more complex. The use of satisfaction as a measure of quality should not be taken at face value; it should always be interpreted with an understanding of the rationale that underlies those expressions of satisfaction [21]. A patient’s expectations of health care can greatly colour their perception of, and thus satisfaction with, care. Different patients can hold differing expectations for different aspects of care, and these have been shown to predict overall patient satisfaction [1]. For example, take the young gentleman who visits his family doctor complaining of a sore throat. The doctor correctly diagnoses a viral illness, informs the patient that there is no requirement for antibiotics and sends the patient home. The satisfaction that the patient gets from this encounter may depend significantly on his preformed expectations. He may have expected to receive antibiotics to treat his symptoms because this is how he has been treated in the past. In this case the reality of the situation would not have met his expectations. Unless the doctor does a very good job of explaining to the patient that antibiotics were not
necessary, the overall satisfaction from the encounter may be poor. Alternatively, the patient may not have had any expectations of treatment. This scenario may lead to the patient being entirely satisfied by the consultation. The same clinical encounter can therefore lead to completely different levels of satisfaction depending upon the patient’s expectations. The idea that patient satisfaction is related to a patient’s perception of care and how that meets their expectations was first explored by Stimson and Webb in 1975 [27]. They divided expectation into three categories: background, interaction and action. Background expectations are explicit expectations formed from accumulated knowledge of the doctor–patient interaction and the consultation–treatment process. Background expectations will vary according to the illness and individual circumstances, but there are certain routines and practices that are expected, and variance from these often leads to dissatisfaction. Interaction expectations refer to patients’ expectations of how they will interact with their doctor, e.g. the doctor’s ‘bedside manner’ and the form of questioning and examination. Action expectations refer to a patient’s expectations of what action the doctor will take, e.g. the prescription of medications, referral to a specialist or advice. The concept of expectations indicates that satisfaction is associated with the fulfilment of positive expectations. However, expectations change with time and accumulating knowledge. It has been noted that increasing quality of care synchronously increases levels of expectation. As a result, it is possible that increasing quality of care may lead to a paradoxical lowering of satisfaction.
14.3.3 Determinants of Satisfaction: Patient Characteristics

If patient satisfaction is a subjective measure, it is only logical that satisfaction may depend upon patient characteristics such as age, gender, socio-economic class, education, religion and so forth. Age has repeatedly been shown to be the most constant socio-demographic determinant of patient satisfaction. Numerous studies have demonstrated that older people tend to be more satisfied with health care than the younger generation [3]. Elderly patients tend to demand less information from their doctors, are more satisfied with primary and hospital care, and
are more likely to comply with medical advice [18]. Educational status is also often thought to influence satisfaction with care. Much data from the United States (U.S.) show that a higher level of education correlates with a lesser degree of satisfaction with care [13]. However, this has not been supported by data from the United Kingdom (U.K.); it is possible that other influences, such as income, have confounded the U.S. evidence. The relationship between satisfaction and social class is unclear. Although a meta-analysis by Hall and Dornan [13] did demonstrate that greater satisfaction was associated with higher social class, they found that social class was not assessed by many studies, and the contradictory results for social class and education leave some uncertainty. Data on the role of gender in patient satisfaction are also conflicting. In general, it has been found that a patient’s gender does not have a bearing on satisfaction ratings [13, 16], although there have been studies showing reduced satisfaction ratings from female patients [18]. Ethnicity, by its diverse nature, may also have a complex influence on satisfaction scores. Studies from the U.S. show that, in general, the Caucasian population tends to be more satisfied with care than the non-Caucasian population [23]. However, these data may be confounded by socio-economic status [8]. In the U.K., the majority of work has focused upon the Asian population. Jones and Maclean found that major problems were encountered with language difficulties, as well as with perceived attitudes towards Asian patients and hospital catering [17]. There was also particular distress caused by male physicians examining Asian female patients [22]. The relationship between socio-demographic variables and patient satisfaction is evidently not straightforward. Although many separate studies have shown how individual socio-demographic variables can affect satisfaction, these effects may be only a minor predictor of satisfaction overall [13].
14.3.4 Determinants of Satisfaction: Psychosocial Factors

A number of psychosocial factors may affect the way a patient expresses satisfaction [20]. In general, these tend to produce an overestimation of satisfaction ratings. The cognitive consistency theory implies that a patient
will respond positively to a satisfaction questionnaire in order to justify his or her own time and effort spent obtaining treatment. The Hawthorne effect describes how the very act of surveying for patient satisfaction increases the apparent concern of the health care programme, thereby improving satisfaction responses [26]. Indifference also plays a part: patients may feel that problems will not be resolved, either because they are too large or too trivial, making accurate reporting of their satisfaction levels seem irrelevant. Social desirability response bias causes patients to give positive responses to satisfaction surveys because they feel these are more acceptable to satisfaction researchers [26]. Ingratiating response bias occurs when patients use positive responses to try to ingratiate themselves with health care staff, especially when anonymity is in doubt. Self-interest bias implies that patients respond positively because they feel this will allow the health care programme to continue running, which is in their best interest. There has also been concern that patients may be reluctant to complain for fear of prejudice from their health care workers in the future [24]. Gratitude is also a common confounding factor; in the U.K., the effect of gratitude has been most associated with the elderly population [26].
14.3.5 Components of Satisfaction

The components of satisfaction refer to the patient’s perceptions of the actual care that they received. These can be either specific areas of care such as waiting times, communication and access to care, or more generalised assessments of overall quality of care. The patient’s responses to these components are (as explained earlier) influenced by their satisfaction determinants. There have been numerous attempts to classify the components of patient satisfaction. A commonly quoted classification was developed by Ware et al. in 1983 [32] and later adapted by Fitzpatrick [10] to suit the U.K. setting. This classification involved seven items, reflecting the most common factors included in satisfaction surveys:

• Interpersonal manner
• Technical quality of care
• Accessibility and convenience
• Efficacy and outcomes of care
• Continuity of care
• Physical environment of care
• Availability of care.

Interpersonal manner is often thought of as the principal component of patient satisfaction and consists of two predominant elements, communication and empathy [26]. Thus, successful interactions depend upon the social skills of the health care workers. Positive satisfaction ratings are known to be associated with non-verbal communication such as body positioning, head nodding and eye contact [19]. These aspects of care are often dismissed as unimportant by the medical profession [29]. Tishelman in 1994 [30] discovered that almost every encounter described by patients as “exceptionally good” focused upon aspects of interpersonal interaction such as kindness and empathy as opposed to technical competence. Technical quality of care is naturally important to the patient. But how can a lay person judge the technical skill of a doctor, nurse or other health care worker? In fact, patients seem to be more comfortable commenting upon the personal qualities of doctors and nurses than upon their technical skill [9]. It has also been demonstrated that patients’ perceptions of their doctors’ skills and abilities are mostly determined by personal qualities such as friendliness [2]. There is also a danger that patients view the process of technical interventions as evidence of quality, where higher levels of technical intervention correspond with higher satisfaction ratings [15]. One possible reason why patients do not seem to emphasise technical competence in comparison to interpersonal manner is that they assume a basic level of technical competence [26]. This may explain why other aspects, such as interpersonal manner, come to the forefront. Accessibility issues include aspects such as physical access to hospitals and clinics, appointment systems, waiting lists and home visits. Long waiting times (which are especially prevalent in the U.K. [31]), parking problems and even public transport have all led to reduced satisfaction ratings [1]. The components of satisfaction can be quite succinctly summarised in the ‘three A’s’ of medical consultation: accessibility, affability and ability (Fig. 14.2). As doctors, we tend to concentrate mostly on the last ‘A’: ability. We feel that improving our knowledge and skill, and thus our ability to treat patients, makes us the best doctor we can be. However, from the patient’s perspective,
Fig. 14.2 The three A’s of medical consultation. Patients tend to concentrate on accessibility and affability, while medical professionals concentrate on ability
accessibility to health care services and affability of health care workers dominate their thoughts. A doctor’s ability may not even be considered by the patient if the doctor is neither accessible nor affable.
14.3.6 Patient Dissatisfaction

There is a stark lack of variability in the majority of satisfaction surveys. Only a small minority of patients express dissatisfaction or criticism of their care [1]. In the U.K., stable overall satisfaction rates of greater than 90% have been demonstrated for many years in primary care [18], and of 80% or more in hospital settings [35]. These good results may seem ideal for the health care system, but the lack of variability causes problems for health care researchers, who find it difficult to compare positive with more positive results. If we are to use patient satisfaction as an indicator of high quality care, the assessment of satisfaction must be sensitive enough to detect changes in health care quality. With current satisfaction surveys giving so little variability in results, the way in which we assess satisfaction must surely be questioned. Not only are our questionnaires evidently not sensitive enough to detect change, but they also do not help us implement change in order to improve our services. If the focus is taken away from overall satisfaction and directed towards specific aspects of care, more variability can be found. Questions of a more detailed nature elicit greater levels of dissatisfaction than generalised questions [35]. Similarly, different questioning procedures and types of scale can also affect the degree of dissatisfaction
expressed by patients [33]. It has also been shown that the volume of comment can be a more sensitive indicator of satisfaction than overall ratings [5]. An alternative and commonly used satisfaction model is the “discrepancy model” [4, 26]. This model states that the lack of variability seen in satisfaction research should steer us away from aiming for consistently high satisfaction scores. Instead, we should concentrate on dissatisfaction and on where results are discrepant. In other words, we need to know what is wrong, not what is right. Understanding the situations that lead to discrepant findings should be more important than attaining high satisfaction results.
14.3.7 The Importance of Measuring Patient Satisfaction

Understanding the concept of patient satisfaction is important on many levels. The measurement of satisfaction allows us to change our practice to improve the quality of care that we provide to our patients. This will not only directly improve health outcomes, but will have many other beneficial effects. At a national level, attention to patient satisfaction and a drive to increase satisfaction with services will help to improve the public’s perception of the health care system. Along with greater pride, there may be a greater willingness to invest in the health care system. Increasing satisfaction with health care will also lead to increasing trust in those who create health care policy. At a more individual level, improved satisfaction with care could result in greater compliance with care. Patients may be more likely to follow their doctors’ instructions regarding lifestyle modifications, as well as medications, if they are more satisfied with the care that they receive. This would lead to beneficial changes in the health of the population. Similarly, there may be less misuse of health care services. Together with a boost in morale and the provision of positive feedback to health care workers, the efficiency and productivity of the health care system as a whole could improve. Thus, the incorporation of patient satisfaction as a part of health care quality may have benefits not just at the level of a single patient’s perceptions, but can also affect the wider interests of the population’s health and health care policy. It is not untenable to think that
improved patient satisfaction could have benefit in terms of cost saving as a result of fewer complaints, “second opinions” and repeated investigations. Patient satisfaction could ultimately form a component of a world class commissioning process and the associated payment by results. The patient experience is already being taken into account by the use of nationwide patient surveys in the U.K.’s NHS. These are carried out jointly by the Department of Health and the Health care Commission. Further information on these surveys can be found by following the links in Table 14.1.

Table 14.1 Web links for further information on national patient surveys
NHS Surveys – http://www.nhssurveys.org/
Department of Health – www.dh.gov.uk/en/Publicationsandstatistics/PublishedSurvey/NationalsurveyofNHSpatients/Nationalsurveyinpatients/index.htm
Health care Commission – www.healthcarecommission.org.uk/nationalfindings/surveys/healthcareprofessionals/surveysofnhspatients.cfm
Picker Institute – www.pickereurope.org/index.php
14.4 Measurement of Satisfaction

14.4.1 How can we Measure Satisfaction?

As we have seen, the concept of satisfaction is a complex and multi-faceted one. The scope for manipulating the design, and thus the results, of satisfaction surveys is boundless. If we are to reliably use satisfaction to aid our assessment of health care quality, satisfaction surveys must be carefully planned and tested. We must pay careful attention to the aspects of satisfaction that are measured, to the type of questionnaire used, and to the timing of the measurements.
14.4.2 The “Overall Satisfaction” Score

We must first consider whether an “overall satisfaction” rating is a valid concept. We have seen that for
many years, overall satisfaction ratings within the NHS have remained high. Yet when more specific questions are asked about exact areas of care, more dissatisfaction is generally elicited. We must therefore conclude that the overall satisfaction score is in fact masking varying levels of dissatisfaction with care. Can we still, therefore, use an overall satisfaction score to assess our quality of care? It has been claimed that there are six dimensions that determine patient satisfaction [14]: medical care and information, food and physical facilities, non-tangible environment, quantity of food, nursing care and visiting arrangements. Even if this is true, the question is whether these dimensions can be combined to provide an overall satisfaction score. In order to do this, we must initially calculate a ‘weighting’ for each dimension. This, in turn, is complicated by the fact that for each patient, with his or her individual expectations and experiences, the ‘weight’ given to each dimension will differ. Two methods of eliciting weights have been recognised: the direct and indirect methods [4]. The direct method asks patients to assign a numerical score to each of these dimensions in terms of importance. It has been argued that the main limitation of this method is the tendency of “raters” to assign equal weights to each dimension [28]. The indirect method studies patient responses to a range of questions, such as scenarios. This method has also been found to be suspect, as it is difficult to extract information about how patients individually weigh and combine different dimensions to give their response [11]. Thus, we can see that the combination of specific satisfaction scores to produce an overall satisfaction score is fraught with problems. There is very little evidence to suggest that we can reliably produce a unified satisfaction score, and thus its use to assess quality of care is undoubtedly in question. Such a construct would be unable to identify changes in quality, or specific areas that require improvement. More useful would be satisfaction measures that are specific to a particular clinical situation or health care area.
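To see why a combined score can mislead, the sketch below applies the direct weighting method to hypothetical dimension-level scores and patient-assigned weights: a respectable overall figure emerges even though one dimension is rated poorly.

# Hypothetical dimension-level satisfaction scores (0-100 scale) and
# patient-assigned importance weights (direct method; weights sum to 1).
scores = {
    "medical care and information": 90,
    "food and physical facilities": 40,
    "nursing care": 85,
    "visiting arrangements": 80,
}
weights = {
    "medical care and information": 0.5,
    "food and physical facilities": 0.1,
    "nursing care": 0.3,
    "visiting arrangements": 0.1,
}

overall = sum(scores[d] * weights[d] for d in scores)
# The combined score looks healthy (82.5/100) even though satisfaction with
# food and facilities is poor -- the kind of masking discussed above.
print(f"Overall satisfaction: {overall:.1f}/100")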
14.4.3 Satisfaction Survey Design

The results of satisfaction surveys are particularly sensitive to their design. There are four crucial parameters
that can influence results: choice of population, timing, type of questionnaire and the rating of satisfaction [4]. The choice of population surveyed has a huge effect upon results. One can either survey the public as a whole (as all are entitled to use the health service) or restrict the survey to current or recent users of the health service. Even if surveys are restricted to current users, there can still be large variations in the type of patients interviewed, according to their characteristics and demographics. The timing of surveys is also important. The greater the length of time between use of the health care service and interview or questionnaire, the greater the chance of recall bias, of changes in perceptions or appreciation of care and of patients overlooking aspects that bothered them at the time of care. The type of questionnaire is probably the most important methodological consideration. It is crucial that the form of question should not distort the patient's view, but this can be difficult to achieve. There are two main types of question: "open-ended" questions, where a patient is asked to comment on an area of care, and "closed" questions, where direct questions are asked about satisfaction with services. With 'open' questioning, the patient is free to comment on areas of care from which we can infer satisfaction. "Closed" questioning gives us quantitative evaluations, but does not tell us which situation the patient is referring to. Direct questions act well as probes to discover dissatisfaction with areas of care that may not be mentioned in response to an 'open-ended' question. The ideal survey includes both types of question, to avoid under-reporting of problems and to identify areas for change. There are numerous ways to rate satisfaction. The most common is a categorical score following a question: the patient chooses from a range of responses from "very satisfied" and "satisfied" to "dissatisfied" and "very dissatisfied". This simple method has its benefits, but can also be problematic. For example, a change from "satisfied" to "dissatisfied" may represent either an accumulation of small shifts in separate component areas or a large shift in a single component. Responses to this form of rating system often fall into two narrow bands, being only superficially indicative of high satisfaction levels [4]. Where there is a substantial change in satisfaction ratings, the cause is usually an obvious one.
14.4.4 Guidance for Satisfaction Measurement

As demonstrated earlier, the measurement of patient satisfaction is riddled with obstacles. The authors feel that it is beyond the scope of this chapter to give a detailed guide to the assessment of satisfaction for each clinical setting. In fact, it may be near impossible to produce such a guide. We do feel, however, that there are some basic principles that should be considered before embarking on measuring satisfaction. We must remember that there are satisfaction determinants as well as components. A satisfaction rating on its own is of little value without information about the patient's perspective. Gathering detailed socio-economic data is therefore important. It would also be ideal to gather information regarding the expectations of the patient. As satisfaction can be significantly affected by how reality meets expectation, assessment of expectation would be invaluable. The authors feel that the use of an overall satisfaction score may be severely limited. An overall score can mask underlying variability and provide a false sense of quality. Instead, we feel that satisfaction should be assessed for individual dimensions of health care, without the need to combine them into a unified satisfaction score. As well as satisfaction, more attention should be paid to the expression of dissatisfaction. Dissatisfaction may give us more helpful information as to where there are problems and what needs to be improved. Once we are able to measure satisfaction reliably, another issue needs to be addressed: what do we do with the information? An individual satisfaction rating on its own is of limited value; the real interest lies in comparison. We can either benchmark current scores against historical scores to determine progress, or benchmark scores against other departments, units or institutions as a method of performance measurement.
14.5 Conclusions

The rise of a more consumerist society has had a significant impact on modern medicine, with more emphasis on a patient-led health service. Patients are now better informed, more demanding and wish to exert more
influence upon their health care. This increasingly patient-centred approach has seen a drive to encompass the patient's perspective in the assessment of health care quality. One of the major ways in which we can do this is by measuring patient satisfaction with health care. However, many have rushed to implement satisfaction surveys and measurements without proper consideration of the meaning of satisfaction and the interpretation of the data. As a result, satisfaction ratings have remained stable at a high level for many years. Although this may make our managers happy, it does not provide us with any useful information on how to improve our service. Further attention now needs to be paid to the meaning of satisfaction, its determinants and its components. By collecting data on patients' characteristics and expectations, as well as hearing their viewpoints on varying aspects of health care, we can continue to identify problem areas and thus implement changes for improvement. Ultimately, if we are able to assess patient satisfaction reliably, we can truly mould our health care system around the person who matters most: our patient.
References 1. Abramowitz S, Cote AA, Berry E (1987) Analyzing patient satisfaction: a multianalytic approach. QRB Qual Rev Bull 13:122–130 2. Ben-Sira Z (1976) The function of the professional’s affective behavior in client satisfaction: a revised approach to social interaction theory. J Health Soc Behav 17:3–11 3. Blanchard CG, Labrecque MS, Ruckdeschel JC et al (1990) Physician behaviors, patient perceptions, and patient characteristics as predictors of satisfaction of hospitalized adult cancer patients. Cancer 65:186–192 4. Carr-Hill RA (1992) The measurement of patient satisfaction. J Public Health Med 14:236–249 5. Carstairs V (1970) Channels of communication. In: Scottish Health Service Studies 11. Scottish Home and Health Department, Edinburgh 6. Cella DF, Tulsky DS (1990) Measuring quality of life today: methodological aspects. Oncology (Williston Park) 4:29–38; discussion 69 7. Deyo RA (1991) The quality of life, research, and care. Ann Intern Med 114:695–697 8. Doering ER (1983) Factors influencing inpatient satisfaction with care. QRB Qual Rev Bull 9:291–299 9. Fitzpatrick R (1984) Satisfaction with health care. In: Fitzpatrick R (ed) The experience of illness. Tavistock, London 10. Fitzpatrick R (1990) Measurement of patient satisfaction. In: Hopkins D, Costain D (eds) Measuring the outcomes of
medical care. Royal College of Physicians and King's Fund Centre, London 11. Froberg DG, Kane RL (1989) Methodology for measuring health-state preferences – I: measurement strategies. J Clin Epidemiol 42:345–354 12. Ganz PA (1995) Impact of quality of life outcomes on clinical practice. Oncology (Williston Park) 9:61–65 13. Hall JA, Dornan MC (1990) Patient sociodemographic characteristics as predictors of satisfaction with medical care: a meta-analysis. Soc Sci Med 30:811–818 14. Health Policy Advisory Unit (1989) The patient satisfaction questionnaire. HPAU, Sheffield 15. Hopkins A (1990) Measuring the quality of medical care. Royal College of Physicians, London 16. Hopton JL, Howie JG, Porter AM (1993) The need for another look at the patient in general practice satisfaction surveys. Fam Pract 10:82–87 17. Jones L, Maclean U (1987) Consumer feedback for the NHS. King Edward's Hospital Fund for London, London 18. Khayat K, Salter B (1994) Patient satisfaction surveys as a market research tool for general practices. Br J Gen Pract 44:215–219 19. Larsen KM, Smith CK (1981) Assessment of nonverbal communication in the patient-physician interview. J Fam Pract 12:481–488 20. LeVois M, Nguyen TD, Attkisson CC (1981) Artifact in client satisfaction assessment: experience in community mental health settings. Eval Program Plann 4:139–150 21. Locker D, Dunt D (1978) Theoretical and methodological issues in sociological studies of consumer satisfaction with medical care. Soc Sci Med 12:283–292 22. Madhok R, Bhopal RS, Ramaiah RS (1992) Quality of hospital service: a study comparing 'Asian' and 'non-Asian' patients in Middlesbrough. J Public Health Med 14:271–279 23. Pascoe GC, Attkisson CC (1983) The evaluation ranking scale: a new methodology for assessing satisfaction. Eval Program Plann 6:335–347
24. Raphael W (1967) Do we know what the patients think? A survey comparing the views of patients, staff and committee members. Int J Nurs Stud 4:209–223 25. Schwartz CE, Sprangers MA (2002) An introduction to quality of life assessment in oncology: the value of measuring patient-reported outcomes. Am J Manag Care 8:S550–S559 26. Sitzia J, Wood N (1997) Patient satisfaction: a review of issues and concepts. Soc Sci Med 45:1829–1843 27. Stimson G, Webb B (1975) Going to see the doctor: the consultation process in general practice. Routledge and Kegan Paul, London 28. Sutherland HJ, Lockwood GA, Minkin S et al (1989) Measuring satisfaction with health care: a comparison of single with paired rating strategies. Soc Sci Med 28:53–58 29. Thompson J (1984) Communicating with patients. In: Fitzpatrick R (ed) The experience of illness. Tavistock, London 30. Tishelman C (1994) Cancer patients' hopes and expectations of nursing practice in Stockholm – patients' descriptions and nursing discourse. Scand J Caring Sci 8:213–222 31. Wardle S (1994) The Mid-Staffordshire survey. Getting consumers' views of maternity services. Prof Care Mother Child 4:170–174 32. Ware JE Jr, Snyder MK, Wright WR et al (1983) Defining and measuring patient satisfaction with medical care. Eval Program Plann 6:247–263 33. Wensing M, Grol R, Smits A (1994) Quality judgements by patients on general practice care: a literature analysis. Soc Sci Med 38:45–53 34. Williams B (1994) Patient satisfaction: a valid concept? Soc Sci Med 38:509–516 35. Williams SJ, Calnan M (1991) Convergence and divergence: assessing criteria of consumer satisfaction across general practice, dental and hospital care settings. Soc Sci Med 33:707–716
15 How to Measure Inequality in Health Care Delivery

Erik Mayer and Julian Flowers
Abstract There has been an increased focus on health inequalities and equity even in developed countries over the last decade. Reducing health inequalities is an important policy objective. The origin of health inequalities and their development is a complex interplay between structural, social and individual factors influencing both population health and individual health. The study of health inequality is a vast, complex and rapidly developing field. The focus of this chapter will relate to our current understanding of the measures and dimensions of inequality in health care delivery. For areas not covered in depth in this chapter, suitable references for further reading will be provided where applicable.
Contents
15.1 Introduction
15.1.1 Access to Health Care
15.2 Dimensions of Inequality in Health Care Delivery
15.2.1 Patient-Level Characteristics
15.2.2 Primary (Community) Care Characteristics
15.2.3 Secondary (Hospital) Care Characteristics
15.2.4 Limitations of Inequality Research
15.3 Measuring Inequality in Health Care Delivery
15.3.1 What to Measure – Measuring Health Care
15.3.2 What to Measure – Measuring Health Inequality and Inequity
15.3.3 Data Sources
15.3.4 Limitations of Data Sources
15.4 Methods for Measuring Health Care Inequality
15.4.1 Health Gap Analysis
15.4.2 Share-Based Measures
15.4.3 Methodological Limitations
15.5 Conclusions
References

E. Mayer () Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St. Mary's Hospital Campus, Praed Street, London, W2 1NY, UK e-mail: [email protected]

15.1 Introduction
There has been an increased focus on health inequalities and equity, even in developed countries, over the last decade. For example, reducing health inequalities is an important policy objective of the British government. Reflecting this renewed focus, several definitions of health inequality have been proposed, including:
• Differences in health that are avoidable, unjust and unfair [1]
• Differences in health status or in the distribution of health determinants between different population groups [2]
• Systematic and potentially remediable differences in one or more aspects of health across populations or population groups defined socially, economically, demographically or geographically [3]
The strengths and weaknesses of these and other definitions are discussed in detail by Braveman [4]. There is general consensus that health inequality exists when there
are potentially avoidable differences in health status or outcomes between disadvantaged people or populations and those less disadvantaged. Disadvantage often encapsulates notions of social exclusion, deprivation or other forms of social or economic discrimination. It is apparent that contemporary definitions of health inequality incorporate the concept that there is a systematic trend to health inequality; inequalities in health do not occur randomly [5]. Having clear definitions aids measurement and understanding and can, therefore, direct appropriate action. Not all health inequality results from factors external to the patient, and not all inequalities, therefore, are inequitable. The terms inequality and inequity tend to be used interchangeably, although their meanings are distinct; inequality measures factual differences, whereas inequity incorporates a moral judgement. Health inequality generally implies a degree of unfairness and avoidability, and inequity tends to refer to the distribution of health or health care according to need. Often we investigate inequality because it can point towards potential inequity. For example, on the one hand, there are large variations in coronary heart disease death rates between geographical areas in the UK (and indeed between countries across Europe), i.e. there are inequalities in health outcomes; on the other, there is wide variation in the prescription of beta blockers as secondary prevention for people with known coronary heart disease – there are inequities in health care provision for these patients.
The origin of health inequalities and their development is a complex interplay between structural, social and individual factors influencing both population health and individual health (Figs. 15.1, 15.2). Health care and health care systems can contribute both to individual and population health improvement and to reducing health inequality and health inequity, although the relationship between improving equity of access, for example, and reducing inequality of health outcomes may be complex and is often not well understood. It is evident from Figs. 15.1 and 15.2 that health system characteristics have potential for a targeted approach to tackling inequality. Although it may not be possible to eradicate health inequality entirely, the structure and processes of health care delivery can influence the degree to which it exists. In particular, the greatest impact may be on the systematic relationship with characteristics such as geography, ethnicity and socio-economic group [6]. Inequality in health care delivery and the corresponding equity in health care have been defined as follows: "Health care is equitable when resource allocation and access are determined by health needs" [7].
Researchers now distinguish horizontal equity, which exists when individuals or groups with equal need consume equal amounts of health care, from vertical equity, which exists when individuals with varying levels of need consume appropriately different amounts of health care.
Fig. 15.1 Factors that influence health at the individual level (the original figure links, across the ecological, aggregated individual and individual levels, the political and policy context, occupational/environmental exposures, material and social resources, socio-economic, behavioural/cultural, psychosocial and genetic/biological characteristics, health system characteristics, stress and health services received to health). Reproduced with permission from [5] and the Pan American Health Organisation (PAHO)
Fig. 15.2 Factors that influence health at the population level (the original figure links the political context, social, economic, health and occupational/environmental policy, demographic structure, environmental, social and behavioural/cultural characteristics, economic development and health system characteristics to health and equity in health). Reproduced with permission from [5] and the Pan American Health Organisation (PAHO)
Indeed, horizontal equity is a guiding principle of the UK National Health Service (NHS): to provide health care on the basis of clinical need. Nevertheless, there is mounting evidence of the "inverse care law" – provision inversely related to need. Examples include emergency procedures (appendicectomy and cholecystectomy), which are more common in deprived populations, as are tonsillectomies, whereas varicose vein surgery tends to be more common in less deprived areas (Fig. 15.3). Equally, we see inequality in surgical outpatient attendances between Primary Care Trusts across England (Fig. 15.4). The importance of the distinction between horizontal and vertical equity lies in the appreciation that although clinicians may treat the patients they see according to their individual needs, other factors may deter or encourage referral or attendance. Vertical inequity is really only detectable at a population level through comparison between populations or areas. For example, imagine two populations with an equal "need" for hip replacement based on a similar prevalence of disabling hip osteoarthritis. Imagine that one has greater social deprivation than the other, such that people are less able to access health care (e.g. lack of affordable transport, inability to take time off work, a less demanding population). The patients referred and
who attend orthopaedic clinics may have equivalent clinical need in both areas, but at population level, the rates of hip replacement may vary, or the proportion of patients with osteoarthritis receiving a hip replacement may be lower in the more deprived population. The level of inequity can, therefore, only be determined once "need variables" have been distinguished from "non-need variables", as need should affect consumption of health care and non-need variables should not. More recently, Culyer and Wagstaff have provided four definitions of equity in health care: equality of utilisation, distribution according to need, equality of access and equality of health. They conclude that, in general, these four definitions are mutually exclusive and practically incompatible, but correctly identify that each of these components needs to be aligned within the distribution of a health care service so as to get as close as is feasibly possible to an equal distribution of health [9].
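The deprivation–utilisation relationships summarised in Fig. 15.3 below are simple Pearson correlations between an area-level deprivation index and standardised procedure rates. A minimal sketch of that style of analysis, using invented figures (the real analysis has one row per area, n = 48):

```python
# Sketch of the analysis behind Fig. 15.3: Pearson correlation between
# area deprivation and a standardised procedure rate. Data invented.
from scipy.stats import pearsonr

imd_2007 = [12.1, 25.3, 31.8, 8.4, 40.2, 19.7, 27.5, 15.0]  # deprivation
appendicectomy_rate = [80, 95, 102, 75, 118, 88, 99, 84]    # per 100,000

r, p = pearsonr(imd_2007, appendicectomy_rate)
print(f"Pearson r = {r:.3f}, two-tailed p = {p:.4f}")
# A significant positive r (as for appendicectomy or tonsillectomy in
# Fig. 15.3) indicates higher rates in more deprived areas; varicose
# vein surgery shows the opposite pattern.
```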
Fig. 15.3 Relationship between deprivation and area admission rates (European standardised, 2005/6) for a range of common surgical procedures in a single region of England. The original shows a matrix of Pearson correlations (n = 48 areas) between the index of multiple deprivation 2007 and rates of cholecystectomy, grommets, hernia repair, knee replacement, hip replacement, tonsillectomy, varicose vein surgery and appendicectomy (* correlation significant at the 0.05 level, ** at the 0.01 level, two-tailed). Source: Hospital Episode Statistics [8]

15.1.1 Access to Health Care

A component of health care delivery is, therefore, concerned with creating a health care system or
environment in which there exist fair and equal opportunities for all people to attain health, i.e. equity of access to health care. The term "access" itself needs to be carefully defined, as health care delivery occurs in a three-tier system: hospital care, primary (community) care and public health [10]. For a service or health care system to be accessed, it needs at least to be:
• Available – the service needs to be present
• Accessible – e.g. the service can be reached by public transport, is open when people can use it, is culturally sensitive to the population of patients likely to use it and so on
• Appropriate – the service needs to be relevant to the needs of the population.
Research into inconsistency of health care access can be divided into three domains: patient-level variables, the characteristics and practices of health care professionals and the system of health care delivery [11].
Measures of health status and health inequality have traditionally focused on indicators of health outcome, such as mortality, rather than direct measures of health status, although there are strong correlations between the two [12]. Frameworks for understanding health care structure, processes and patient outcomes have increased awareness of the need to study measures of both the delivery and the distribution of delivery of health care [13]. The understanding that health care delivery is linked to patient outcome, or health, was set out by Donabedian in his 3 × 4 matrix over 25 years ago [14] (Fig. 15.5). Furthermore, equity and lack of variation in health care delivery (consistency in process) form one of the "seven pillars of quality" as defined by Donabedian [15] (Table 15.1). Health care inequality research is, therefore, critical to further our progress in improving the quality of care that our patients receive.
Fig. 15.4 Lorenz curve showing inequality in surgical outpatient attendances across PCTs in England 2007/8. Source: http://nww.nhscomparators.nhs.uk/
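A Lorenz curve such as Fig. 15.4 is constructed by ranking units (here PCTs) by attendance rate and plotting the cumulative share of the population against the cumulative share of attendances; the Gini coefficient summarises the gap between the curve and the diagonal of perfect equality. A minimal sketch with invented data:

```python
# Sketch of a Lorenz curve and Gini coefficient for surgical outpatient
# attendances across PCTs (all figures invented).
import numpy as np

population = np.array([120_000, 250_000, 90_000, 310_000, 180_000])
attendances = np.array([9_000, 30_000, 5_000, 45_000, 17_000])

order = np.argsort(attendances / population)   # rank PCTs by rate
cum_pop = np.cumsum(population[order]) / population.sum()
cum_att = np.cumsum(attendances[order]) / attendances.sum()

# Gini = 1 - twice the area under the Lorenz curve (trapezium rule).
x = np.concatenate(([0.0], cum_pop))
y = np.concatenate(([0.0], cum_att))
gini = 1 - np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]))
print(f"Gini coefficient: {gini:.3f}")   # 0 = perfect equality
```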
Fig. 15.5 Donabedian's 3 × 4 quality matrix linking health care delivery with patient outcome (resources, process and outcome set against access, technical quality, affect/relationship quality and continuity of care)

Table 15.1 Equity in health care distribution forms one of Donabedian's seven pillars of quality health care. Table adapted from figure in reference [16]
Efficacy – the ability of care, at its best, to improve health
Effectiveness – the degree to which attainable health improvements are realised
Efficiency – the ability to obtain the greatest health improvement at the lowest cost
Optimality – the most advantageous balancing of costs and benefits
Acceptability – conformity to patient preferences regarding accessibility, patient-practitioner relationships, amenities, effects of care and cost of care
Legitimacy – conformity to social preferences concerning all of the above
Equity – fairness in the distribution of care and its effects on health
"The notion of health care quality implies that resources are allocated according to medical need, risk and benefit", yet quality assessment tools have failed to adequately address health care inequality across socio-economic groups [17]. Understanding and uncovering inequality in health care delivery therefore aims to reduce inequality and inequity in health and simultaneously to increase the quality of care for a greater proportion of the population. Few, however, have looked at the impact of health care access on health inequality [18]. Improving the equality of delivery can also bring financial gains by decreasing inequality in health outcomes, and therefore health care demand, through fewer complications, fewer exacerbations of chronic illness, lower numbers of emergency re-admissions, etc.
The study of health inequality is a vast, complex and rapidly developing field. The focus of this chapter will relate to our current understanding of the measures and dimensions of inequality in health care delivery. For areas not covered in depth in this chapter, suitable references for further reading will be provided where applicable.
15.2 Dimensions of Inequality in Health Care Delivery

As already discussed, there is a non-random distribution of health in the population; health problems tend to cluster systematically for some individuals and some groups of individuals [5]. The generation of average health statistics at the national or even local level will tend to ignore this clustering phenomenon and therefore hide important variations within the population of interest. This is one of the main arguments for categorising groups within a population and then exploring their relationship with measures of inequality in health care delivery. Random inequity can arise, however, because of factors such as variations in medical practice and historical accident [19]. The majority of studies assess horizontal equity rather than vertical equity, which is less commonly addressed and gives rise to more complex ethical issues, such as the conflict between trying to generate equality of delivery and the potential for removing patient-led choice and generating inefficiency of utilisation by equalising access. Many studies have been performed over the years, with some conflict in the reported results. We report here on the most contemporary studies, which tend to use the richest data sources and make better adjustment for concepts such as "need", thereby reflecting the most current position with regard to inequality in health care delivery.
15.2.1 Patient-Level Characteristics

15.2.1.1 Socio-Economic Characteristics

Socio-economic inequalities in health result from both social causation mechanisms (behavioural, structural, environmental and psycho-social factors associated with a low socio-economic status resulting in poor health) and social selection (poor health impacts negatively on education, employment, income, etc., further lowering socio-economic status) [20]. Across countries, there is a strong association between income inequality and health inequality, with inequality favouring the rich. This association is particularly true for the US and the UK [21]. Although
inequalities in health tend to originate outside of the health care system, it follows that mechanisms of health care delivery can act to either worsen or improve the inequality that might exist. Inequality in health care delivery can be measured across socio-economic factors such as income, education, occupational characteristics and social class. In the UK, as in other European countries, equity of availability of a broad range of health care services, regardless of income, has been achieved through a taxation-funded health care system. This has not, however, eliminated socio-economic inequality in health care access or "realised access" [19]. Income-related inequality in doctor utilisation, as measured by self-reported "number of doctor contacts" and adjusting for self-reported measures of need, has been demonstrated for a number of European countries using European Community Household Panel (ECHP) data. Wealthier and more highly educated individuals are more likely to have contact with a medical specialist than the poor after adjusting for need, despite the poor having greater needs. There appears to be little or no need-adjusted inequality for general practitioner (GP) services [22]. Interestingly, the most important socio-economic variables driving a higher need-unadjusted use of GP services by the poor were low education, retirement and unemployment. A more recent study using improved methodology for need adjustment confirms the pro-rich estimates for specialist care and lessens any pro-poor inequity in GP care that had previously been demonstrated [23]. Similar findings to those of the ECHP dataset have been found for the Organisation for Economic Co-operation and Development datasets, which now incorporate 30 member countries worldwide [24]. Variation across countries was seen in each of the studies. Looking specifically at the UK results, some contradictions between studies are seen; van Doorslaer et al. [22] showed a slight pro-rich inequity in the need-adjusted probability of a GP visit, with a more significant pro-rich inequity for specialist care, whereas van Doorslaer et al.'s alternative study, using the 2001 British Household Panel Survey data, showed no significant income inequity for GP, medical specialist (outpatient) or hospital (inpatient) care utilisation. Morris et al., using the 1998–2000 Health Survey for England, investigated inequality in the use of GPs, outpatient visits, day cases and inpatient stays, adjusting for local supply conditions and subjective and objective measures
of individuals' need. Individuals with low incomes were more likely to consult their GP, but less likely to have outpatient visits, day cases and inpatient stays. The same was true for individuals with lower levels of formal qualifications, except for outpatient visits, where no association was found [25, 26]. Socio-economic inequity for specific surgical treatments or services, such as total hip replacement, has been demonstrated [27], although there is evidence that this inequality decreased slightly over the ten-year period between 1991 and 2001 [28].
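Need-adjusted analyses of the kind reported by van Doorslaer et al. are typically built on the concentration index, which ranks individuals by income rather than by the health care variable itself. The sketch below shows the unadjusted index only, with invented data (the published work additionally standardises utilisation for need):

```python
# Sketch of a concentration index for GP contacts ranked by income:
# C = 2 * cov(use, fractional income rank) / mean(use).
# C < 0 indicates a pro-poor, C > 0 a pro-rich distribution. Data invented.
import numpy as np

income = np.array([8_000, 12_000, 18_000, 25_000, 40_000, 65_000])
gp_contacts = np.array([6, 5, 4, 4, 3, 2], dtype=float)

order = np.argsort(income)                    # poorest first
y = gp_contacts[order]
n = len(y)
rank = (np.arange(1, n + 1) - 0.5) / n        # fractional income rank

c_index = 2 * np.cov(y, rank, bias=True)[0, 1] / y.mean()
print(f"concentration index: {c_index:.3f}")  # negative here: pro-poor use
```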
15.2.1.2 Socio-Demographic Characteristics

Ethnicity

Using the General Household Survey dataset, Smaje and Le Grand [29] demonstrated a trend towards equivalent or higher GP use and lower outpatient use relative to Whites among most ethnic groups. Only for the Indian and Pakistani groups was there a significantly greater utilisation of GP services. The relationship was unaltered after adjusting for income or socio-economic group. The major limitation, however, was inadequate adjustment for varying degrees of "need" between ethnic groups. Perhaps the most interesting result was the consistent finding of lower levels of outpatient use by "non-sick" individuals across all the minority ethnic groups as compared with White patients. Morris et al., using more robust regression analysis, demonstrated similar results, with a trend towards non-White ethnic groups utilising GP services less than Whites, but outpatient services more than Whites. The only significant results were for the Indian group and the Pakistani, Bangladeshi and Chinese groups, respectively. There were no significant differences in the use of day case services or the probability of an inpatient stay, except for the Chinese, who were 45% less likely to have had an inpatient stay [26]. NHS Direct, a national telephone health advice service, was launched to help improve access to health care. It may, however, have inadvertently increased ethnic inequity in health care delivery, with lower utilisation levels by non-White patients and those born outside of the UK [30]. "Need", however, was defined only as limiting or non-limiting long-term illness; this limits extrapolation of the findings, as NHS Direct likely serves a significant population with acute illnesses.
Gender and Age

Summary studies have demonstrated that females, irrespective of age, are significantly less likely to utilise GP visits, day case treatment and inpatient stays [26]. Age was shown to have non-linear effects on the probability of use of the same health care services, as well as of outpatient visits. This incorporated both conditional (the effect of age on use, keeping all other factors constant) and unconditional (incorporating the effects of other variables that affect use and are correlated with age, such as morbidity) estimated relationships. Interestingly, Allin et al. [31] studied horizontal inequality in the utilisation of health care services in patients aged 65 and over in the UK, using British Household Panel Survey data (1997–2003). They found that those on a lower income were significantly less likely to visit a GP or a specialist in outpatients despite having the greater need, i.e. pro-rich horizontal inequity. Households with older residents are also less likely to use the UK's nationally available telephone advice service, NHS Direct [30]. In more specifically orientated studies, women were found to be twice as likely to need hip replacements, but equally likely to be receiving care as males, whereas older people, despite a greater need for total hip replacement, were less likely to be receiving care than younger patients [27]. Shaw et al., using Hospital Episode Statistics (HES) data, showed that women, in addition to older people in England, are probably receiving less revascularisation, in terms of coronary artery bypass grafting and percutaneous transluminal coronary angioplasty, than their need, as defined by admission rates for acute myocardial infarction, would indicate. The authors do, however, accept the limitations of the data in terms of being able to adjust adequately for clinical severity and indications for treatment [32].
Geography

The variation across countries in overall utilisation of health care services has already been described [22]. Within countries, geographically distinct groups have commonly been used as comparators to identify potential inequality. Equitable geographical distribution of health care carries obvious political incentive. Patients living in urban areas of the UK consult GPs more commonly than those in rural areas, whereas in the US
this imbalance is not seen [33]. Although the causes of the differences seen in the UK are complex, evidence suggests an influence of a range of personal factors that affect an individual's intention to consult a GP, irrespective of location; this may be particularly true for "out-of-hours" access [33]. In a study conducted in the north and northeast of Scotland, there was a significant trend between the presence of disseminated disease at presentation in patients with lung cancer and colorectal cancer and an increasing distance of residence from a cancer centre. This may not, however, be attributable to limitations in health care delivery, as women with breast cancer who lived further from cancer centres were treated more quickly, but only as the result of their receiving earlier treatment at non-cancer-centre hospitals [34]. No difference was seen for patients with colorectal cancer. In a study looking at the effects of rurality on need-adjusted use of health services for total hip replacement, it was shown that patients living in rural areas and in need of total hip replacement were as likely to access GP or hospital care, or be on a waiting list, as patients in urban areas [27]. More local-level analysis can demonstrate inequalities in health care delivery. Dixon et al. showed the presence of regional variation (across eight administrative NHS regions of England) in age- and sex-standardised hip and knee joint replacement rates [35]. There are, however, many patient and service-level factors that can explain the variation seen; the most important for total hip replacement was the proportion of older patients aged 65–84 years old (this was despite the data being age-standardised). For total knee replacement, over 50% of the regional variation could be explained by the number of centres in each region offering surgery. Returning to the principle of horizontal equity of health care provision, intervention rates (supply) for a defined disease process should appropriately increase as need (demand) increases. Analysis of the relationship between surgical intervention for lung cancer and lung cancer incidence across the Primary Care Trusts within a Strategic Health Authority has shown little correlation [36]. Furthermore, it has shown the presence of inequity, with only 50% of the interventions being performed in the 40% of the areas with the highest lung cancer incidence (Fig. 15.6). The exact reasons for the inequality seen between socio-economic and socio-demographic groups are not known, but may reflect factors such as disadvantaged individuals having more and/or multiple co-morbidities, thereby restricting their suitability for, or benefit from, surgery and increasing the risk of post-operative complications and poorer long-term outcomes [28]. It may also be related to variations in "demand factors", where disadvantaged individuals could have lower overall expectations with regard to the treatment that is available to them, or for cultural or educational reasons be less able to access the correct level of care appropriately and understand the medical system and/or the information provided [28]. It is also possible that there are "supply factors" at play, with longer waiting lists and less capacity to undertake operations in hospitals located in more deprived areas. Variation in medical practitioners' clinical practice for diagnosis, investigations, timing of referrals and indications for operation can occur geographically or be affected by deprivation [28, 37]. These supply factors will now be discussed in more detail.

Fig. 15.6 Concentration curve for lung cancer incidence (cumulative % of interventions plotted against cumulative % of population ranked in order of decreasing lung cancer incidence). Red line shows expected curve if intervention rates were most concentrated in areas with highest need. Reproduced with permission from reference [36]
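The statistic quoted above – 50% of interventions delivered in the 40% of areas with the highest lung cancer incidence – comes from a concentration curve of the kind shown in Fig. 15.6. A sketch of the underlying calculation, with invented area-level data:

```python
# Sketch of the calculation behind Fig. 15.6: rank areas by decreasing
# lung cancer incidence, then compare cumulative population and
# intervention shares. All figures invented.
import numpy as np

incidence = np.array([95, 80, 72, 60, 45, 30])          # per 100,000
population = np.array([1.0, 1.2, 0.8, 1.5, 1.1, 0.9])   # millions
interventions = np.array([40, 35, 50, 55, 30, 25])      # resections

order = np.argsort(-incidence)                # highest need first
cum_pop = np.cumsum(population[order]) / population.sum()
cum_int = np.cumsum(interventions[order]) / interventions.sum()

for pop_share, int_share in zip(cum_pop, cum_int):
    print(f"top {pop_share:6.1%} of population by need "
          f"-> {int_share:6.1%} of interventions")
```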
15.2.2 Primary (Community) Care Characteristics

Hippisley-Cox and Pringle looked at several characteristics of primary care facilities that may affect patients' access to coronary angiography and revascularisation. Factors that lowered angiography and revascularisation rates were primary care practices that were more than 20 km from the revascularisation (referral) centre and
those treating a more deprived patient population. The same variables were also related to longer waiting times for angiography [38]. Interestingly, patients from fundholding practices had a higher admission rate for coronary angiography. Using the cardiovascular disease Quality and Outcomes Framework (QOF) indicators, Saxena et al. assessed the relationship between practice size and caseload and the quality of care provided, as defined by the 26 QOF indicators. They found that the quality of care was generally consistently high regardless of caseload or practice size, except for selected indicators related to early diagnostic investigations and management, where, among smaller practices with lower caseloads and in more deprived areas, quality of care was lower [39]. The authors suggested that this may reflect better access to resources among larger practices, possibly as a result of local planning and commissioning where higher caseloads exist. A related study looked at the quality of care provided by primary care practices based on their performance across 147 QOF indicators covering eleven chronic disease processes. It demonstrated that location in an area with less social deprivation, training practice status and group practice status were all independently associated with achieving higher QOF scores [40]. Both the Saxena et al. [39] and Ashworth et al. [40] studies demonstrate that primary care practices in less deprived areas can achieve high levels of QOF achievement by delivering resources to potentially low-risk patients, thereby overlooking those at higher risk and in most need of the resources.
15.2.3 Secondary (Hospital) Care Characteristics

There has, for many years, been a distinction between the "specialist" hospital and the "general" hospital, and an ongoing debate as to whether patient outcomes differ between them. Pertinent to this question are the characteristics of the health care that they deliver and whether there exists inequality in their differences, as this will clearly impact the subsequent quality of care that they can provide. Cram et al. reported an assessment of the rate of adverse outcomes between speciality orthopaedic hospitals and general hospitals for Medicare beneficiaries undergoing either hip or total knee replacement. Although speciality hospitals displayed better risk-adjusted patient outcomes, they were treating patients
who had less co-morbidity and resided in more affluent geographical areas [41]. Even within the speciality hospitals group, those that were physician-owned were found to be treating patients with fewer co-morbidities than non-physician-owned centres, and patients who underwent major joint replacement in physician-owned speciality hospitals were less likely to be Black, despite the hospitals being located in neighbourhoods with a higher proportion of Black residents [42]. The lower mortality following cardiac revascularisation procedures seen in speciality cardiac hospitals as compared with general hospitals has been shown to be partly explainable by the better health of the patients they treat [43]. Inequality of health care delivery as a result of hospital characteristics such as these can have a significant impact on patient outcomes. Patient outcomes are not determined solely by "patient-based risk", but also indirectly by the improved hospital processes that can result from a more efficient and productive patient "throughput": fewer intensive care admissions, shorter lengths of stay, fewer complications, etc. As these factors are partly outside the patient's control, they could be said to be inequitable. Indeed, there is some evidence to suggest that supply incentives could further worsen inequity as a result of "physician-induced demand", which can increase utilisation more than would be explained simply by providing facilities for increased capacity [44]. In the UK health care system, a significant secondary care characteristic that needs to be considered is independent-sector health care access and utilisation. For surgery, "private" operations can account for a significant proportion of the total undertaken; private hip replacements, for example, account for about a quarter of England's total caseload. Surgery undertaken in the private sector occurs as a result of long NHS waiting times and therefore tends to be concentrated in more affluent areas of the UK, such as the Southeast of England. Aside from this causing inequality in itself, it leads to underestimation of socio-economic inequality in the NHS sector because of unobserved activity in less socio-economically deprived areas [28].
15.2.4 Limitations of Inequality Research

Inequality research reveals associations between dimensions of inequality and health care delivery. It does not, however, reveal causal relationships. It is unlikely that a
single association, if reversed, will eliminate inequity, given the complex interaction of numerous factors in the generation of inequality and inequity. That said, the demonstration of associations will allow more focused research to identify the most likely causal relationships to be acted upon. Although the research to date has produced many interesting results, it has also caused some confusion, as there are often contradictions between studies in terms of the direction of association, or whether a true association exists at all. This contradiction is often the result of methodological differences between studies; these differences are described in detail by Dixon et al. [45], but include factors such as the use of self-reported morbidity over objective measures, using only inpatient stay data and not including day case data, a focus on only emergency admissions and not addressing issues of appropriateness of care. Differences in reported results between studies over time could additionally be the result of sampling errors, changes to the geographical distribution of health care resources and the growth and use of the private sector. Dixon et al. suggest that, in terms of these methodological limitations, a hierarchy of evidence quality results: micro-studies are best of all, followed by macro-studies with more disaggregated indices of need and utilisation, and then by the remaining macro-studies. Goddard and Smith also identify methodological limitations of studies, which in turn make it difficult to draw firm conclusions about inequities in access to health care, identify potential causes and recommend appropriate policies to reduce them [19]. These methodological limitations were highlighted after the authors had set out a general theoretical framework within which the equity of health care access can be researched [19].
Table 15.2 Internet resources for inequality in health and health care
International Society for Equity in Health – www.iseqh.org
World Health Organisation – Health Systems Performance – http://www.who.int/health-systems-performance/
European Public Health Alliance – Health inequalities – http://www.epha.org/r/50
Determine – European Portal for Action on Health Equity – http://www.healthinequalities.org/
This framework draws together several ambiguous concepts such as "need", "access", "utilisation", "demand" and "supply", and provides a useful platform for the alignment of future research in this area. Table 15.2 provides links to organisations and their websites that have a focus on health and health care inequality.
15.3 Measuring Inequality in Health Care Delivery

There are two parts to any health care inequality measurement:
• The aspect of health care of interest, e.g. structure, process, outcome, utilisation or access
• The precise metric used to assess the level of inequality or inequity
To facilitate understanding, it helps to be precise about both aspects.
15.3.1 What to Measure – Measuring Health Care

Following the approach of Donabedian or others, we can identify a number of ways of measuring aspects of health care:
• Structure – provision of services, staffing levels, numbers of beds and so on
• Utilisation – referral and attendance rates, procedure rates, admission rates
• Outcomes – hospital mortality rates, adverse event rates, mortality rates, survival rates
Two main approaches have been described for measuring inequality in health care delivery: summary "macro" studies that measure utilisation of NHS services at a "high level", such as numbers of GP visits, specialist visits and A&E attendances, and more specific "micro" studies that focus on particular diagnoses or treatments [45]. Before such studies can be performed, more fundamental decisions need to be taken in choosing the methodology employed. The following section discusses features important to the assessment of inequality in health, the area in which they were first developed, though they are equally applicable to measuring inequality in health care delivery.
15.3.2 What to Measure – Measuring Health Inequality and Inequity
Health, or a measure of health, is distributed throughout a study population. The distribution of health can be described in two ways: by its location (a measure of central tendency) or by its dispersion (variability in the distribution) [46]. Measures of dispersion can be considered in either a univariate or a bivariate fashion. Univariate measures of dispersion solely assess the variability in the distribution of a health measure across the study population. Bivariate measures of dispersion assess the variability of a health measure in conjunction with a secondary variable, such as a socio-demographic factor. For example, assessing the screening rate for breast cancer across Primary Care Trusts in England and Wales is a univariate measure of dispersion of health care delivery, whereas assessing the relationship between screening rates for breast cancer and social class across Primary Care Trusts is a bivariate measure of dispersion. When measuring dispersion, the unit of interest within the study population can either be the individual or, as in the above example, groups. There are two groups of researchers, split by their differing opinions of what should be measured in terms of health inequality. Those who believe in measuring health inequality in a univariate fashion, also known as "pure inequalities in health", criticise the bivariate approach of "evaluating health differences between categories or values of a socio-economic characteristic because they involve a moral judgement". There is also the concern that the bivariate approach may not reflect inequalities across individuals in the population [47]. Indeed, these socio-economic characteristics should be "part of the process of explaining health inequalities, not part of the process of measuring them" [6]. Proponents of the bivariate approach believe, to the contrary, that health inequality should be assessed at the socio-economic level, as it forms a fundamental component of determining inequity where it exists. It is argued that the "higher level" approach of univariate measurement can fail to identify inequality where it exists at a more local level, between socio-economic groups. Both arguments prevail, and whether a univariate or bivariate approach is used will be determined by the question posed. Figure 15.7 illustrates the additional information adduced by analysing the data by socio-economic status. When trying to identify the potential for inequality in a study population of interest, it is useful to first use a summary statistic approach.
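Using the screening example, the distinction can be made concrete: a univariate measure summarises the spread of the rates alone, whereas a bivariate measure relates them to a second variable. A minimal sketch with invented PCT-level data (the choice of coefficient of variation and a least-squares gradient here is illustrative, not prescriptive):

```python
# Univariate vs bivariate dispersion of breast screening rates across
# PCTs (all figures invented).
import numpy as np

screening_rate = np.array([78.0, 74.5, 69.0, 81.2, 65.3, 72.8])  # %
deprivation = np.array([15.0, 22.0, 34.0, 10.0, 41.0, 27.0])

# Univariate: variability of the rates themselves.
cv = screening_rate.std() / screening_rate.mean()
print(f"coefficient of variation: {cv:.3f}")

# Bivariate: how the rates change with a second variable (deprivation).
slope, intercept = np.polyfit(deprivation, screening_rate, 1)
print(f"screening rate change per unit deprivation: {slope:+.2f}%")
```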
Fig. 15.7 Map showing small area variation in life expectancy at birth within a county area in England (most deprived areas are outlined) (left), and chart showing trend in life expectancy of the county as a whole and of the most and least deprived fifths of areas (right). Life expectancy in the most deprived areas is significantly worse than in the area as a whole, a fact not obvious from the univariate analysis represented by the map. Source: Eastern Region Public Health Observatory. http://www.erpho.org.uk/Download/Public/17088/1/2007%20HIP%20Cambridgeshire%20PCT%20county.pdf. Accessed July 2008
Summary statistics can be generated using comparator groups as categorical or continuous variables, and can additionally use the univariate and bivariate approaches described above. The method by which the data are handled determines the statistical approach used to identify inequality, where it is present, and how the results are displayed. As identified by Carr-Hill and Chalmers-Dixon [48], three questions first need to be answered when devising a method to measure inequalities:
1. Which units of interest are to be compared?
2. What type of inequality is it that you are interested in?
3. What is the intended purpose of the results generated?
When the data are handled in a categorical fashion, differences in a measure of health between defined groups within a population are sought; this is known as "health gap" analysis. The differences can be absolute or relative, or used in combination, depending on the question being answered (see the sketch after this paragraph for the basic arithmetic). Health inequality can also be measured across individuals, thereby treating the unit of interest as a continuous variable. These are known as share-based measures and are generally considered more complicated than health gap analysis. The applicability of these different approaches to measuring health inequality will be explored in further detail later in the chapter. In thinking specifically about health care delivery, this includes both provision (access), which can be measured across geographically defined groups acting as a univariate measure, and patient uptake. In line with definitions of horizontal equity, there should be equality of access to health care services for individuals with equal need. Access should, therefore, be independent of dimensions of inequality such as socio-economic status, ethnicity, etc., except where these affect need [45]. Patient uptake (utilisation) of health care services may be more directly influenced by socio-economic group, gender, ethnicity, etc.; this is more suited to bivariate analysis. There is no point equalising access if uptake is inequitable, as inequality in health will still prevail. However, inequality of utilisation cannot always be said to be inequitable, as individuals or groups of individuals may have equal need and equal access, but choose not to utilise health care services for whatever reason [45]. Distributions that are the outcome of factors beyond individual control are generally considered inequitable; distributions that are the outcome of individual choices are not [49].
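As a concrete illustration of the health gap arithmetic referred to above, absolute and relative gaps between two defined groups might be computed as follows (figures invented):

```python
# Sketch of "health gap" analysis between two defined groups
# (invented figures, e.g. premature deaths per 100,000).
most_deprived_rate = 142.0
least_deprived_rate = 88.0

absolute_gap = most_deprived_rate - least_deprived_rate   # difference
relative_gap = most_deprived_rate / least_deprived_rate   # ratio

print(f"absolute gap: {absolute_gap:.0f} per 100,000")
print(f"relative gap: {relative_gap:.2f}")
```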
Health care delivery is thus composed of two elements, access and utilisation: does inequality in health
care delivery indicate a resource issue, or does it represent variation in demand? To measure true inequality in delivery, one needs to identify a shortfall in the supply/demand ratio. As identified by Goddard and Smith, variation in the supply (access) of health care can arise for reasons of availability, quality, costs, and information and communication [19]. Data sources, and the health outcomes that they record, do not generally indicate to what extent utilisation, or the lack of it, results from the individual's choice. Researchers generally accept that any demonstrated inequality in health care delivery results from inequality in access, which is, therefore, inequitable [45]. The second element in the definition of horizontal equity that is far from obvious is "need". Two definitions of need are commonly presented: current ill-health or severity of illness, and "capacity to benefit". Although the latter incorporates the concept that individuals who are not currently ill can still benefit in terms of their future health from preventative health care, it measures the entity that care will affect (health), rather than the entity that is needed (health care) [6]. Indeed, if need is defined as capacity to benefit, individuals presenting early in the course of illness have the greater need, whereas if need is defined as current ill-health or severity of illness, individuals presenting later will have the greater need [45]. Need is most often defined as current ill-health because this is more readily available as a data source.
15.3.3 Data Sources

Datasets that are intended for mapping inequality, or that can be used for this purpose, are collected and collated at the international, national or local level. A limitation of national datasets is that they are intended to allow comparison between aggregated groups and may, therefore, be ineffective for lower-level analysis. Equally, local-level datasets are often restricted for use in more extensive comparison because of heterogeneity in data definitions and in the methodology of collection and formatting. Researchers may, therefore, be limited in their analysis by the data available to them. There are data sources encompassing many dimensions of health inequality, such as social care, deprivation, education, unemployment, environment, crime, and income and benefits. These are covered extensively in
The Public Health Observatory Handbook of Health Inequalities Measurement [48], and the reader is directed to this reference for further reading, as they fall outside the scope of this section. The choice of data sources will be governed partly by the type of study being carried out. Broader "macro" studies will typically use large-sample household surveys, such as the UK's General Household Survey carried out by the Office for National Statistics, or the British Household Panel Survey carried out by the UK Longitudinal Studies Centre at the University of Essex. Both surveys collect and collate information at an individual and household level on a range of socio-economic and socio-demographic variables, and include information on health status and use of health services. An international example of such a survey is the ECHP (European Community Household Panel), which, using a standardised questionnaire, annually interviews a representative sample of households and individuals in each of 14 member countries. More specific "micro" studies use one of three possible types of data, as defined by Cookson et al. [28]:

• Specialist small-sample patient survey data, with detailed information on condition-specific need
• Administrative data on the use of specific procedures linked at individual level to detailed information on socio-economic status and need from patient records or specialist surveys
• Administrative data linked at small area level to socio-economic, demographic and other small area statistics – often with no information about need other than population demographics

An example of specialist small-sample patient survey data is the study by Malin et al. [50]. This study analysed adherence to quality measures for cancer care, including delivery of care, specifically for patients with a new diagnosis of either stage I–III breast cancer or stage II or III colorectal cancer. The incorporation of clinical domains representative of the entire patient episode is unique: diagnostic evaluation, surgery, adjuvant therapy, management of treatment toxicity and post-treatment surveillance. Further examination was made of eight components of care integral to these clinical domains: testing, pathology, documentation, referral, timing, receipt of treatment, technical quality and respect for patient preferences. In all, 36 and 25 explicit quality measures with clinically detailed eligibility criteria, specific to the process of cancer care, were identified for breast and colorectal cancer, respectively. Overall
adherence to these quality measures was 86% (95% CI 86–87%) for breast cancer patients and 78% (95% CI 77–79%) for colorectal cancer patients. Subgroup analysis across the clinical domains and components of care did, however, identify significant variability in adherence: 13–97% for breast cancer and 50–93% for colorectal cancer. Detailed information on condition-specific need was determined from hospital cancer registries, patient surveys and relevant medical records. An example of the second type of data is the Health Survey for England, a series of specialist annual surveys. It contains a "core" component of information such as socio-economic and socio-demographic details, general health and psycho-social indicators, use of health services, and measurement of height, weight and blood pressure. In addition, each year there is a specialist module on a single topic, several topics or certain population groups, which are assessed using directed questionnaires, physical measurements and other relevant objective measures such as analysis of blood samples, echocardiogram readings and lung function tests. Specialist modules to date have looked at cardiovascular disease, asthma and accidents, and at children, the elderly and ethnic groups [51]. HES is a data warehouse that contains information about hospital admissions and outpatient attendances in England [8]. HES is representative of administrative data that can be linked to small area demographic statistics, but it does not contain specific data on need. Admitted patient care data are available from 1989–1990, and outpatient care data from 2003–2004 onwards. HES is derived from the patient administration systems of health care providers. Most recently, responsibility for the collation of HES was given to the National Programme for IT's Secondary Uses Service, a recently developed secure data environment which will facilitate activities including health care planning, public health, benchmarking and performance improvement. Although HES does not directly collect data on the delivery of health care, the patient-level record of admissions, operations and diagnoses that it does collate allows us to begin to look for areas of potential inequality which warrant further investigation. HES also uses a unique patient identifier and includes postcode data; this allows data within HES to be linked to other geographically defined variables such as deprivation scores.
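As a sketch of this kind of linkage, a record-level join in Python might look as follows; the extracts, identifiers, postcodes and deprivation scores below are invented purely for illustration:

import pandas as pd

# Hypothetical extracts: HES-style episodes and a postcode-level deprivation lookup.
episodes = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "postcode":   ["W2 1NY", "SW7 2AZ", "CB2 1TN"],
    "procedure":  ["hip_replacement", "cabg", "hip_replacement"],
})
deprivation = pd.DataFrame({
    "postcode":  ["W2 1NY", "SW7 2AZ", "CB2 1TN"],
    "imd_score": [28.4, 12.1, 9.7],   # illustrative area deprivation scores
})

# Link each episode to an area-level deprivation score via postcode.
linked = episodes.merge(deprivation, on="postcode", how="left")
print(linked)

Once linked in this way, procedure rates can be compared across deprivation strata, which is exactly the bivariate handling described earlier.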
In addition to HES, there also exist broader performance measures of hospital activity, collated by the Department of Health as part of their required returns. These Hospital Activity Statistics [52] look more directly at processes concerned with the delivery of health care and measure factors such as:

• Waiting times of patients with suspected cancer, and those subsequently diagnosed with cancer, at NHS Trusts in England
• Waiting times for operation, elective admission, diagnostics and first outpatient appointment, at both the provider and commissioner level
• Bed availability and occupancy rates
• Day care provision
• Cancelled operations
• Critical care facilities

Initiatives that are in the process of being fully implemented, such as Choose and Book [53] and the 18-week patient pathway programme [54], will provide further data relevant to identifying inequality in health care delivery. Many of the datasets that are accessible for research into inequality in health care delivery have arisen from governmental policy aimed either at tackling inequality or at improving the quality of health care by setting targets against which health care providers' performance is compared. An increasingly competitive NHS, under growing scrutiny from both government and the public to improve transparency and accountability, has generated large amounts of publicly available information, some of which will now be highlighted. In response to the government's national inequalities targets for life expectancy and infant mortality outlined in 2001 [55], the London Health Observatory, on behalf of the Association of Public Health Observatories, developed the local basket of inequalities indicators [56]. This incorporated an extensive review of hundreds of health inequality measures and indicators available in the public domain; these were then compared against explicit criteria targeted at improving their effectiveness in benchmarking, thereby assisting local areas with monitoring progress towards reducing health inequalities. Seventy indicators were grouped into 13 categories; two of these, "access to local health and other services" and "tackling the major killers", relate in whole or in part to inequality in health care delivery. The basket of indicators provides a framework against which inequality of health care delivery can be assessed, but the authors acknowledged that the indicators are weighted more towards health outcomes and determinants, and that there is a paucity of indicators closely related to delivery that can be expected to change in the short term.
The Quality and Outcomes Framework (QOF) [57] was first initiated in 2003 and contains a set of targets across four domains (clinical, organisational, patient experience and additional services), against which primary care practices are assessed and subsequently financially remunerated. The QOF data can be linked at practice level to secondary care data, and to local population socio-economic and demographic data at Primary Care Trust level, to establish differences between primary care practice performance and patient outcomes [58]. This data source may allow future analysis of the interaction between the primary and secondary care sectors and subsequent health care delivery. Perhaps the most important part of the QOF is the development of disease registers for a range of chronic diseases; these give a good estimate of disease prevalence and a useful proxy measure of need, which can assist health equity audits.
15.3.4 Limitations of Data Sources

Much of the limitation surrounding available data sources stems from the ambiguity surrounding definitions of "need", "access" and "utilisation", and the indicators that best represent these definitions; this is most pertinent for "need", which has both objective and subjective components. This inevitably leads to difficulties in comparing the results of studies, owing to the varying definitions and indicators used. The comparison over time of studies that use identical data sources is also complicated by changes in the geographical boundaries of health care sectors, such as those seen in the UK in 2006, when Strategic Health Authorities and Primary Care Trusts were reduced from 28 to 10 and from 303 to 152, respectively. As well as making it difficult to compare the results of data sources, and therefore of studies either side of these changes, this can also complicate the linkage of data sources. Even at a hospital level, changes in the commissioning process can "redefine" the catchment population of a Trust, and therefore change the socio-demographic groups that it provides for. There are two recognised concerns with the use of administrative databases for inequality research. The first is the potential for coding error, which can obviously affect the validity of results. Although this was a legitimate concern in previous years, recent years have seen much better coding completeness, as a result of a greater number of studies using the datasets and
so identifying inadequacies, and also of initiatives such as payment by results, which have incentivised health care providers to pay more attention to the coding process and its accuracy. The second limitation of administrative datasets is the paucity of data related to appropriateness of care or case-mix adjustment. Techniques to overcome the case-mix adjustment issue have been developed using the Charlson index, although it was not initially developed for this purpose. Appropriateness of care has been dealt with in some US studies by using the data of the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute linked with the administrative Medicare files database of health care service contacts. This allows adjustment of the data with respect to cancer grade and stage, and therefore exploration of whether disparities in health care delivery, such as mammographic screening, affect overall and stage-specific survival [59]. In the UK, 12 population-based cancer registries collect information about new cases of cancer and produce statistics about incidence, prevalence, survival and mortality. Better future linkage of these cancer registries with administrative datasets such as HES will help to further research into the effectiveness of local cancer service provision. There are acknowledged limitations to the data sources currently used for inequality research, but not all of them are a direct result of the process of data collection; some merely reflect the difficulties of agreeing consensus definitions of the endpoints to be measured. Resolution of this issue alone would significantly improve future research efforts.
15.4 Methods for Measuring Health Care Inequality

The methods developed and used for measuring inequality have been applied to the analysis of health outcomes or indexes of health, with little application to measuring health care delivery. They are, however, directly transferable, and the indications and limitations of the different statistical approaches hold true. Inequality can either be measured at a single point in time, or changes can be measured over time. It is important to remember that if a health indicator is measured over time, it must be done with reference to a comparator; otherwise health gain, and not inequality, is being assessed. The methodological approach used will be guided by whether the health measure of interest is handled in a continuous or categorical fashion and whether the dispersion of health care delivery is measured in a univariate or bivariate manner. Summary measures are a useful tool to display health inequality, its extent and variation. The more commonly used measures are described in more detail below. For a more detailed discussion and mathematical description of measures of health inequality, the reader is referred to references [47, 60].

15.4.1 Health Gap Analysis

Health gaps are measures of the relative or absolute difference in health status between different groups in the population. They can be reported in a number of ways, including a range (the difference between the "best" and "worst", or highest and lowest), relative rates (rate ratios), e.g. relative mortality rates between two health care providers for a specified operation, and absolute differences, with the national or population average used as the comparator. Figure 15.8 illustrates the various methods of reporting for health gap analysis.

Fig. 15.8 Distribution of a simulated health measure throughout the population. The range is the difference between highest and lowest (B–A), the rate ratio can be described by B/A, and the absolute difference with an average "national" comparator can be described as D–C. Source: Eastern Region Public Health Observatory. http://www.erpho.org.uk/Download/Public/6949/1/Measuring%20and%20monitoring%20health%20inequalities%20locally%20version%202.doc. Accessed February 2008

Health gap analysis in this form uses a categorical approach for the
health measure of interest, and the dispersion of health care is handled in a univariate manner. Health gap analysis can also be reported for a health measure categorised into groups in conjunction with a secondary variable, i.e. handled in a bivariate manner. Under these conditions, it is typical to report the ratio of, or difference between, the rates of each group. Bivariate handling of data can also be used to measure differences among groups by means of multilevel analysis, where the unit of observation is the person, but group-level variables are included [61]. A degree of caution needs to be exercised when interpreting health gap analysis. The absolute difference may vary even when the relative difference is constant. This can be illustrated by four trusts (W, X, Y and Z) that report on their readmission rates following total hip replacement. For illustration purposes, the readmission rates are as follows: Trust W 10%, Trust X 20%, Trust Y 2% and Trust Z 4%. The relative differences between Trusts W and X, and between Trusts Y and Z, are both 2 (20/10 and 4/2). However, in absolute terms, Trust Z has only a 2% higher readmission rate than Trust Y, whereas Trust X has a 10% higher rate than Trust W. In addition, situations can arise whereby, as the absolute difference decreases (such as when the frequency of the health outcome is low), the relative difference increases. Consider the same four trusts reporting on their mortality rate following coronary artery bypass surgery: reported rates are Trust W 0.5%, Trust X 0.25%, Trust Y 2% and Trust Z 3%. Comparing Trust Y with Trust Z, there is an absolute difference of 1% but a relative difference of 1.5. Trusts W and X, however, have an absolute difference of only 0.25%, but a relative difference of 2. Authors have suggested, therefore, that it might be preferable to report both absolute and relative differences [60].
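To make the arithmetic concrete, a minimal sketch in Python; the trust labels and rates are the illustrative figures from the example above, not real data:

# Hypothetical readmission rates (%) for the four illustrative trusts.
rates = {"W": 10.0, "X": 20.0, "Y": 2.0, "Z": 4.0}

def absolute_gap(a, b):
    # Absolute difference between two rates (percentage points).
    return b - a

def relative_gap(a, b):
    # Rate ratio between two rates.
    return b / a

for low, high in [("W", "X"), ("Y", "Z")]:
    a, b = rates[low], rates[high]
    print(low, high, absolute_gap(a, b), relative_gap(a, b))

Both pairs share a rate ratio of 2, while the absolute gaps differ fivefold (10 vs. 2 percentage points), which is precisely why reporting both measures together is advised.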
15.4.2 Share-Based Measures

Share-based measures are summary measures of the whole distribution of a resource, i.e. they treat the unit of interest as a continuous variable. They compare the cumulative proportion of the resource with the cumulative population among which that resource is shared. As for health gap analysis, share-based measures can either be univariate or linked to a secondary variable, such as socio-economic deprivation, to make them bivariate.

15.4.2.1 Lorenz Curve and Gini Coefficient

The Lorenz curve is derived by plotting cumulative health share against cumulative population share (Fig. 15.9), where the values are ranked in order of rate; it is, therefore, a univariate measure of inequality in a distribution of health. If resources were equally distributed throughout the population, the bottom 20% of the population would have 20% of the resource, 40% of the population 40% of the resource, and so on. This is represented by the diagonal line on the Lorenz curve. Unequal distributions will have a curve: the nearer the curve to the diagonal, the greater the degree of equality; the further away, the greater the inequality. If we are plotting, say, deaths against population, the slope of the Lorenz curve is the death rate. The Gini coefficient, denoted "G", is a numerical summary of the Lorenz curve. Looking at Fig. 15.9, the coefficient is calculated as A/(A + B). The resulting value is between 0 and 1, where 0 represents perfect equality and 1 perfect inequality.

Fig. 15.9 Lorenz curve generated by plotting cumulative health measure (y-axis) against cumulative population (x-axis). Gini coefficient derived with formula A/(A + B)

15.4.2.2 Concentration Curve and Concentration Index

The concentration curve is a variation on the Lorenz curve and likewise treats the unit of interest as a
continuous variable. It plots cumulative health share against cumulative population, with the values ranked by an external variable instead of in order of rate, as for the Lorenz curve. The external variable is usually, but not necessarily, decreasing deprivation or socio-economic status. This method of assessment, therefore, considers the measure of dispersion in a bivariate fashion. The concentration index (C), calculated in the same way as the Gini coefficient, can take values between −1 and +1. This index summarises the socio-economic (SE) (or whichever external variable is chosen) gradient in the health measure of interest. A value of −1 indicates that all the health/ill-health is concentrated in the worst off, +1 indicates an inverse SE gradient, and 0 indicates no SE gradient. A negative C value corresponds to the curve sitting above the diagonal, and vice versa. The further the curve is from the diagonal, the greater the degree of health inequality.
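A minimal sketch of both summary measures in Python, assuming individual-level data (all values illustrative). The Gini ranks individuals by the health value itself, whereas the concentration index ranks them by the external variable, here a deprivation rank with the worst off first:

import numpy as np

def gini(health):
    # Gini coefficient: individuals ranked by the health value itself.
    x = np.sort(np.asarray(health, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return (2.0 * np.sum(ranks * x)) / (n * np.sum(x)) - (n + 1.0) / n

def concentration_index(health, ses_rank):
    # Same construction, but ranked by the external variable
    # (lower ses_rank = more deprived, i.e. worst off first).
    x = np.asarray(health, dtype=float)[np.argsort(ses_rank)]
    n = x.size
    ranks = np.arange(1, n + 1)
    return (2.0 * np.sum(ranks * x)) / (n * np.sum(x)) - (n + 1.0) / n

ill_health = [5.0, 4.0, 3.0, 2.0, 1.0]   # invented individual ill-health scores
deprivation = [1, 2, 3, 4, 5]            # 1 = most deprived
print(gini(ill_health))                          # about 0.27
print(concentration_index(ill_health, deprivation))  # about -0.27: ill-health in the worst off

The negative concentration index here corresponds to the curve sitting above the diagonal, matching the interpretation given in the text.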
15.4.2.3 Slope and Relative Index of Inequality (SII and RII)

The slope index of inequality (SII) has been used to summarise SE gradients and is calculated from a regression line drawn through a health measure stratified by a measure of socio-economic status, e.g. social class. It is mathematically related to the concentration index described above. The SII differs from the concentration curve in its construction because it plots a measure of health (e.g. mortality rate) of defined population groups (e.g. primary care trusts) against the relative ranking of that group according to a deprivation indicator (the relative rank is calculated to be a value between 0 and 1). A regression line is then calculated and used to define the absolute difference (health gap) in the health measure chosen across all groups (the estimated SII). A relative gap (RII) can be calculated by dividing the absolute health gap (the SII) by the average level of health across all groups, and is usually expressed as a percentage [62].
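A minimal sketch of the SII and RII under simple assumptions: the group rates are invented, the relative rank of each group is taken as the cumulative population share at its midpoint, and the regression is weighted by population share:

import numpy as np

# Hypothetical groups ordered from most to least deprived.
pop_share = np.array([0.2, 0.2, 0.2, 0.2, 0.2])   # population share of each group
rate = np.array([120.0, 110.0, 95.0, 90.0, 80.0]) # e.g. mortality per 100,000

# Relative rank: cumulative population share at each group's midpoint (0-1).
cum = np.cumsum(pop_share)
rel_rank = cum - pop_share / 2.0

# Weighted least-squares regression of rate on relative rank.
W = np.diag(pop_share)
X = np.column_stack([np.ones_like(rel_rank), rel_rank])
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ rate)
sii = beta[1]                                     # slope = absolute health gap
rii = sii / np.average(rate, weights=pop_share)   # relative gap, often given as %
print(f"SII = {sii:.1f}, RII = {rii:.1%}")        # SII = -50.0, RII = -50.5%

The negative sign simply reflects that the invented rates fall as deprivation decreases; the magnitude is the estimated gap across the whole deprivation distribution.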
15.4.3 Methodological Limitations

As discussed for health gap analysis, it is common to compare the health of a group of patients, or of a subgroup of the population, with that of the overall population. Consideration needs to be given to the fact that the numerator (the population subgroup) is already incorporated into the denominator (the overall population). This has been described as overlap, and means that two independent quantities are not being compared [63]. As described by Hayes and Berry, ignoring this overlap between the subgroup and the overall population can have profound effects on results: "… the significance tests are conservative and correspondingly the confidence intervals are too wide. If the result is statistically significant ignoring overlap, it is even more significant after allowing for the overlap, whilst a non-significant result could become significant at a specified significance level". Adjusting for overlap is relatively simple if a complete dataset is available, by comparing the subgroup with the remainder of the population. If a complete dataset is not available, a correction factor can be used to adjust for the overlap effect. Under certain circumstances (the result is already significant, or the subgroup is <10% of the whole population), correction for overlap is less important [63]. The importance of differences among univariate measures of inequality has been discussed by Williams and Doessel [46], who concluded that "it is often important in measuring inequality to report several measures/indexes, within the constraints of the data available, and to examine the strengths and weaknesses of each measure. In so doing, the nature of the inequality is depicted more accurately, and one can weigh equity judgements more wisely, than is possible by emphasising any single measure of inequality". Drawing on lessons learnt from economic inequality research, Williams and Doessel discuss how, depending on the distribution measure used, different and sometimes conflicting rankings in terms of inequality can result. The use of only a single measure of inequality can, therefore, produce misleading results. In contrast, Manor et al. [64] had previously shown that three different bivariate measures of inequality produced similar results when analysing data across six health measures. The indicator of social position, however, appeared to be of more substantive importance, demonstrating consistent differences [64].
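A minimal sketch of the subgroup-versus-remainder adjustment, using invented counts and a simple two-proportion z-test; the point is only that the subgroup is compared with the rest of the population, not with the whole that contains it:

from math import sqrt

# Hypothetical counts of events and denominators.
events_sub, n_sub = 150, 1_000       # the subgroup of interest
events_all, n_all = 1_050, 10_000    # whole population (includes the subgroup)

# Adjusted comparison: subgroup vs. the REMAINDER of the population,
# rather than vs. the overlapping whole.
events_rest = events_all - events_sub
n_rest = n_all - n_sub

p_sub, p_rest = events_sub / n_sub, events_rest / n_rest
p_pool = events_all / n_all
se = sqrt(p_pool * (1 - p_pool) * (1 / n_sub + 1 / n_rest))
z = (p_sub - p_rest) / se
print(f"subgroup {p_sub:.1%} vs remainder {p_rest:.1%}, z = {z:.2f}")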
15.5 Conclusions

Research into health care inequality is an integral component of improving the quality of care that our patients receive, and the non-random distribution of health in the
population provides us with the opportunity to understand it better. Although inequalities in health tend to originate outside the health care system, mechanisms of health care delivery can either worsen or improve any inequality that might exist. Indeed, three of the four proposed defining components of equity in health care (equality of utilisation, distribution according to need and equality of access) are directly linked with health care delivery. Research needs to encompass all dimensions of inequality in health care delivery, namely patient, primary care and secondary care characteristics. Future research in this area needs to be conducted under the guidance of a framework that helps define several ambiguous concepts, such as "need", "access", "utilisation", "demand" and "supply", and that incorporates the methods for measuring health care inequality, including the indications and limitations of the different statistical approaches. In doing so, contradictions between existing studies that arise from methodological differences, in terms of the direction of association or whether a true association exists at all, will be reduced. This will allow more focused research to identify likely causal relationships, which can then be used to critically appraise health care delivery with the aim of reducing inequality and inequity in health while simultaneously increasing the quality of care for a greater proportion of the population.
References

1. Whitehead M (1992) The concepts and principles of equity and health. Int J Health Serv 22:429–445
2. World Health Organisation. Health impact assessment (HIA) – Glossary of terms. Available at http://www.who.int/hia/about/glos/en/index1.html
3. International Society for Equity in Health. Working definitions. Available at http://www.iseqh.org/workdef_en.htm
4. Braveman P (2006) Health disparities and health equity: concepts and measurement. Annu Rev Public Health 27:167–194
5. Starfield B (2002) Equity and health: a perspective on nonrandom distribution of health in the population. Rev Panam Salud Publica 12:384–387
6. Wagstaff A, van Doorslaer E (1998) Equity in health care finance and delivery. In: Culyer AJ, Newhouse JP (eds) Handbook of health economics. Elsevier
7. Aday LA FG, Anderson RM (1984) An overview of current access issues. In: Access to medical care in the US: who have it, who don't. Pluribus Press/University of Chicago, Chicago
8. Hospital Episode Statistics. Health and Social Care Information Centre. Available at www.hesonline.nhs.uk/Ease/servlet/DynamicPageBuild?siteID=1802&categoryID=62. Accessed April 2007
9. Culyer AJ, Wagstaff A (1993) Equity and equality in health and health care. J Health Econ 12:431–457
10. Robert Gordon University A. An introduction to social policy – Health Policy. Available at http://www2.rgu.ac.uk/publicpolicy/introduction/healthf.htm
11. Ibrahim SA, Thomas SB, Fine MJ (2003) Achieving health equity: an incremental journey. Am J Public Health 93:1619–1621
12. Kyffin RGE, Goldacre MJ, Gill M (2004) Mortality rates and self reported health: database analysis by English local authority area. BMJ 329:887–888
13. Donabedian A (1966) Evaluating the quality of medical care. Milbank Mem Fund Q 44:166–203
14. Donabedian A (1980) Exploration in quality assessment and monitoring: definition of quality and approaches to its assessment. Health Administration Press, Ann Arbor
15. Donabedian A (1990) The seven pillars of quality. Arch Pathol Lab Med 114:1115–1118
16. Schiff GD et al (2001) Beyond structure-process-outcome: Donabedian's seven pillars and eleven buttresses of quality. Jt Comm J Qual Improv 27:169–174
17. Fiscella K, Franks P, Gold MR et al (2000) Inequality in quality: addressing socioeconomic, racial, and ethnic disparities in health care. JAMA 283:2579–2584
18. European Public Health Alliance (2008) What role for health care systems in reducing health inequalities? Available at http://www.epha.org/a/1907. Accessed March 2008
19. Goddard M, Smith P (2001) Equity of access to health care services: theory and evidence from the UK. Soc Sci Med 53:1149–1162
20. Mackenbach JP (2003) An analysis of the role of health care in reducing socioeconomic inequalities in health: the case of the Netherlands. Int J Health Serv 33:523–541
21. van Doorslaer E, Wagstaff A, Bleichrodt H et al (1997) Income-related inequalities in health: some international comparisons. J Health Econ 16:93–112
22. van Doorslaer E, Koolman X, Jones AM (2004) Explaining income-related inequalities in doctor utilisation in Europe. Health Econ 13:629–647
23. d'Uva TB JA, van Doorslaer E (2007) Measurement of horizontal inequity in health care utilisation using European panel data. Tinbergen Institute Discussion Paper, Tinbergen Institute, pp 1–28
24. van Doorslaer E, Cristina M (2004) Income-related inequality in the use of medical care in 21 OECD countries. OECD Health Working Papers, No. 14. OECD Health, Paris
25. Morris S, Sutton M, Gravelle H (2003) Inequity and inequality in the use of health care in England: an empirical investigation. CHE Technical Paper Series 27, Centre of Health Economics
26. Morris S, Sutton M, Gravelle H (2005) Inequity and inequality in the use of health care in England: an empirical investigation. Soc Sci Med 60:1251–1266
27. Milner PC, Payne JN, Stanfield RC et al (2004) Inequalities in accessing hip joint replacement for people in need. Eur J Public Health 14:58–62
28. Cookson R, Dusheiko M, Hardman G (2006) Socio-economic inequality in small area use of elective total hip replacement in the English NHS in 1991 and 2001. Centre for Health Economics, University of York, CHE Research Paper 15
29. Smaje C, Grand JL (1997) Ethnicity, equity and the use of health services in the British NHS. Soc Sci Med 45:485–496
30. Shah SM, Cook DG (2008) Socio-economic determinants of casualty and NHS Direct use. J Public Health (Oxf) 30:75–81
31. Allin S, Masseria C, Mossialos E (2006) Inequality in health care use among older people in the United Kingdom: an analysis of panel data. LSE Health, London
32. Shaw M, Maxwell R, Rees K et al (2004) Gender and age inequity in the provision of coronary revascularisation in England in the 1990s: is it getting better? Soc Sci Med 59:2499–2507
33. Farmer J, Iversen L, Campbell NC et al (2006) Rural/urban differences in accounts of patients' initial decisions to consult primary care. Health Place 12:210–221
34. Robertson R, Campbell NC, Smith S et al (2004) Factors influencing time from presentation to treatment of colorectal and breast cancer in urban and rural areas. Br J Cancer 90:1479–1485
35. Dixon T, Shaw ME, Dieppe PA (2006) Analysis of regional variation in hip and knee joint replacement rates in England using Hospital Episodes Statistics. Public Health 120:83–90
36. Flowers J (2003) Measuring health inequalities II. Available at http://www.erpho.org.uk/viewResource.aspx?id=6949
37. Chaturvedi N, Ben-Shlomo Y (1995) From the surgery to the surgeon: does deprivation influence consultation and operation rates? Br J Gen Pract 45:127–131
38. Hippisley-Cox J, Pringle M (2000) Inequalities in access to coronary angiography and revascularisation: the association of deprivation and location of primary care services. Br J Gen Pract 50:449–454
39. Saxena S, Car J, Eldred D et al (2007) Practice size, caseload, deprivation and quality of care of patients with coronary heart disease, hypertension and stroke in primary care: national cross-sectional study. BMC Health Serv Res 7:96
40. Ashworth M, Armstrong D (2006) The relationship between general practice characteristics and quality of care: a national survey of quality indicators used in the UK Quality and Outcomes Framework, 2004–5. BMC Fam Pract 7:68
41. Cram P, Vaughan-Sarrazin MS, Wolf B et al (2007) A comparison of total hip and knee replacement in specialty and general hospitals. J Bone Joint Surg Am 89:1675–1684
42. Cram P, Vaughan-Sarrazin MS, Rosenthal GE (2007) Hospital characteristics and patient populations served by physician owned and non physician owned orthopedic specialty hospitals. BMC Health Serv Res 7:155
43. Cram P, Rosenthal GE, Vaughan-Sarrazin MS (2005) Cardiac revascularization in specialty and general hospitals. N Engl J Med 352:1454–1462
44. Nallamothu BK, Rogers MA, Chernew ME et al (2007) Opening of specialty cardiac hospitals and use of coronary revascularization in medicare beneficiaries. JAMA 297:962–968
45. Dixon A, Le Grand J, Henderson J et al (2003) Is the NHS equitable? A review of the evidence. Discussion Paper Number 11, LSE Health and Social Care
46. Williams RF, Doessel DP (2006) Measuring inequality: tools and an illustration. Int J Equity Health 5:5
47. Regidor E (2004) Measures of health inequalities: part 1. J Epidemiol Community Health 58:858–861
48. Carr-Hill R, Chalmers-Dixon P (2005) The public health observatory handbook of health inequalities measurement. Association of Public Health Observatories
49. Le Grand J (1991) Equity and choice: an essay in economics and applied philosophy. Harper Collins Academic, London
50. Malin JL, Schneider EC, Epstein AM et al (2006) Results of the National Initiative for Cancer Care Quality: how can we improve the quality of cancer care in the United States? J Clin Oncol 24:626–634
51. Department of Health. Health survey for England. Available at http://www.dh.gov.uk/en/Publicationsandstatistics/PublishedSurvey/HealthSurveyForEngland/index.htm. Accessed March 2008
52. Department of Health. Hospital activity statistics. Available at http://www.performance.doh.gov.uk
53. Connecting for Health. Choose and book. Available at http://www.chooseandbook.nhs.uk/
54. Department of Health. 18 weeks – delivering the 18 week patient pathway. Available at http://www.18weeks.nhs.uk/Content.aspx?path=/
55. Department of Health. Health inequalities. Available at http://www.dh.gov.uk/en/Publichealth/Healthinequalities/index.htm. Accessed February 2008
56. London Health Observatory. Health inequalities – Local basket of indicators data retrieval tool. Available at http://www.lho.org.uk/HEALTH_INEQUALITIES/Basket_Of_Indicators/BasketOfIndicatorsDataTool.aspx. Accessed February 2008
57. Department of Health. Quality and outcomes framework. Available at http://www.dh.gov.uk/en/Healthcare/Primarycare/Primarycarecontracting/QOF/index.htm. Accessed February 2008
58. Downing A, Rudge G, Cheng Y et al (2007) Do the UK government's new Quality and Outcomes Framework (QOF) scores adequately measure primary care performance? A cross-sectional survey of routine healthcare data. BMC Health Serv Res 7:166
59. Curtis E, Quale C, Haggstrom D et al (2008) Racial and ethnic differences in breast cancer survival: how much is explained by screening, tumor severity, biology, treatment, comorbidities, and demographics? Cancer 112:171–180
60. Regidor E (2004) Measures of health inequalities: part 2. J Epidemiol Community Health 58:900–903
61. Diez Roux AV, Merkin SS, Arnett D et al (2001) Neighborhood of residence and incidence of coronary heart disease. N Engl J Med 345:99–106
62. Low A, Low A (2004) Measuring the gap: quantifying and comparing local health inequalities. J Public Health (Oxf) 26:388–395
63. Hayes LJ, Berry G (2006) Comparing the part with the whole: should overlap be ignored in public health measures? J Public Health (Oxf) 28:278–282
64. Manor O, Matthews S, Power C (1997) Comparing measures of health inequality. Soc Sci Med 45:761–771
16 The Role of Volume–Outcome Relationship in Surgery

Erik Mayer, Lord Ara Darzi, and Thanos Athanasiou
Contents

Abbreviations
16.1 Introduction
16.2 Methodological Framework for Assessing Volume–Outcome Relationship
16.2.1 Data Sources and Data Quality
16.2.2 Data Presentation
16.2.3 Methodological Limitations
16.3 Outcome Measures
16.3.1 Morbidity/Mortality/Length of Stay/Re-Admission Rates
16.3.2 Limitations of Case-Mix Adjustment
16.3.3 Quality of Life Measures
16.4 The Influence of the Surgeon and/or Institution
16.4.1 The Role of the Surgeon
16.4.2 The Role of the Institution
16.5 Immeasurable Factors
16.6 Public Health Implications
16.6.1 Policy Change and Healthcare Restructuring
16.6.2 Research and Ethical Implications
16.7 Conclusion
References

Abbreviations

HES Hospital Episode Statistics
U.K. United Kingdom
U.S. United States

Abstract The chapter considers the complex interaction of numerous factors that determine the nature of any volume–outcome relationship within surgery. The methodological basis for assessing the volume–outcome relationship in surgery, and some of the limitations that surround it, are discussed. Commonly used outcome measurements are explored, along with their possible alternatives, and consideration is given to the interaction between the surgeon's volume and institutional volume and their effect on patient outcome. Finally, we examine the public health impact and health policy implications of incorporating volume–outcome relationship research into health service provision. The need for future research in this area to be conducted under the guidance of a methodological framework is justified.
E. Mayer, Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary's Hospital Campus, Praed Street, London, W2 1NY, UK. e-mail: [email protected]

16.1 Introduction
It has long been postulated that improved outcomes in Healthcare can result from treating greater numbers of patients, an effect explained by "practice makes perfect" [15]. Over the last 10 years, we have seen an acceleration of more formal research in this area, particularly within the surgical specialities as a result of their interventional nature. Numerous studies have reported a higher volume, better outcome association with
the implication that the quality of care that patients receive can be greatly influenced by this relationship. Despite the considerable evidence, many uncertainties remain about the true relationship. Some low-volume surgeons and/or institutions have excellent outcomes, and some high-volume surgeons and/or institutions have poor outcomes. An alternative explanation for the volume–outcome relationship may, therefore, be that Healthcare providers, at either an institutional or a surgeon level, that display better outcomes receive more referrals and consequently treat greater volumes of patients. This is known as the selective-referral hypothesis [15]. Volume, therefore, does not automatically result in better outcomes for patients and, as such, is an inexact indicator of the quality of care. On its own, volume acts as a surrogate marker for the numerous and complexly interacting factors within a patient's treatment episode that combine to determine their outcome, favourable or not (Fig. 16.1).

Fig. 16.1 Conceptual framework: how could volume affect quality? Surgeon skill, patient selection, patient co-morbidities/case-mix, processes of care, the skills of other clinicians and affiliated healthcare professionals, and institutional performance interact to determine patient outcomes. Adapted from reference [11]

We are seeing an increasing trend towards more detailed performance monitoring of surgeons and surgical Healthcare providers. Moreover, these collated data are not solely for "internal" consumption, but are being made publicly available to allow better-informed patients to exercise a degree of choice as to where they receive
their treatment. A number of organisations, including the Healthcare Commission and the Center for Medical Consumers, produce data at surgeon and institutional level that report on the number of operations performed and the outcomes of those operations, although a direct link between the two is not advertised. The internet resources for these organisations, and other useful resources on the volume–outcome relationship in surgery, can be found in Table 16.1.

Table 16.1 Internet resources for the volume–outcome relationship in surgery

Center for Medical Consumers: www.medicalconsumers.org
Centre for Multilevel Modelling: www.cmm.bristol.ac.uk
Department of Health – Patient Reported Outcome Measures: www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4124266
Healthcare Commission: www.healthcarecommission.org.uk
Institute of Medicine: www.iom.edu
Leapfrog Group: http://www.leapfroggroup.org/media/file/2007_Survey_Fact_Sheet_links.doc

There are vast implications for patients and Healthcare providers alike of acting upon the published evidence in this area. When clinicians appraise the volume–outcome relationship literature in order to practise evidence-based surgery, three fundamental questions need to be answered [13]:

1. Are the results of the study valid? Is the database accurate? How was the volume determined? Was volume analysed as a continuous or a categorical variable? If categorical, were volume groups determined a priori? Is the primary outcome appropriate? Are patients the same among volume groups? How were the data analysed? Was multivariable analysis used to adjust for important prognostic differences? Was a sensitivity analysis performed to test for statistical robustness?
2. What are the results? What was the magnitude of the results? How precise was the estimate of the treatment effect?
3. Will the results help me care for my patient?

In this chapter, we will discuss the methodological basis for assessing the volume–outcome relationship in surgery and some of the limitations that surround it. We
will explore the commonly used outcome measurements with their possible alternatives and consider the interaction between the surgeon’s volume and institutional volume and their effect on patient outcome. Finally, we will examine the health policy implications of incorporating volume–outcome relationship research into health service provision.
16.2 Methodological Framework for Assessing Volume–Outcome Relationship

16.2.1 Data Sources and Data Quality

The majority of research exploring the volume–outcome relationship in surgery has relied on large administrative databases for its source data, as extracting information directly from patient charts would be too time-consuming and expensive. The inferences that we make from this research are therefore wholly reliant on the accuracy of these databases, which comes back to the old adage "what you put in is what you get out". The nature of Healthcare funding in the United States (U.S.), with provider reimbursement through private health insurance companies, means that the size and comprehensiveness of the associated databases make them the most exploitable for volume–outcome research; as such, the majority of publications in this field originate from the U.S. Administrative databases such as the Medicare Provider Analysis and Review Files and the Healthcare Cost and Utilization Project Nationwide Inpatient Sample can be linked to epidemiological databases such as Surveillance, Epidemiology, and End Results to improve the sophistication of the data extraction, although the use of administrative data, as opposed to clinical data, inhibits the degree of risk adjustment that can be performed. Equally, some administrative databases derived from managed care plan enrolees, although encompassing thousands of patients, may not be representative of the general population by virtue of only including patients older than 65. In the United Kingdom (U.K.), Hospital Episode Statistics (HES) are data routinely collected within the health service for administrative purposes and not specifically for clinical audit. HES has contained admitted patient care data since its inception in 1986. Its use as a data source in health service research has
been limited by worries over the completeness and accuracy of data input at the patient coding level. Over time, however, the quality of the HES database has improved, particularly since the introduction of the U.K. government's "payment by results" initiative in 2002 as a means to provide a transparent, rules-based system for paying Healthcare providers, linked to activity and adjusted for case-mix. As a result, it is being used more frequently as a reliable source database for exploring the U.K.'s volume–outcome relationship in surgery. Indeed, there is now evidence to suggest that HES has similar discrimination to clinical databases when used in risk prediction models for death [1]. The availability of centrally collated administrative databases should not distract from the usefulness of data originating from a single centre or a close network of centres, which tend towards relatively smaller caseloads. The ability in these settings to more closely quality-control the data acquisition, and the subsequent risk adjustment and analysis, overcomes some of the very issues raised by administrative databases.
16.2.2 Data Presentation

Although studies have demonstrated a correlation between volume and outcome, it is unclear whether the relationship is continuous or step-wise, or has a single clear cut-off (Fig. 16.2). Until this has been determined, the optimal mode of data handling and display equally cannot be established. Handling volume as a continuous variable will improve the chance of detecting an outcome difference along the volume gradient, and equally may reveal a cut-off volume beyond which there is no further change in the outcomes achieved. Volume as a continuous variable is typically presented in a scatter graph. Although methodologically superior, visual difficulties can occur when apparent correlations are seen but are not statistically proven, because of a large number of low-volume providers with a zero outcome that are not visually obvious [5]. Volume is often, therefore, analysed and displayed as a categorical variable. Under these circumstances, it should be handled as a minimum of three volume groups, of approximately equal size, which should be determined prior to analysis. Display is then in the form of a histogram, importantly with confidence intervals displayed.
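As a sketch of the categorical approach, volume can be split into three approximately equal-sized groups defined prior to analysis, with a confidence interval attached to each group's outcome rate; the provider data below are simulated and the 4% event rate is arbitrary:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated provider-level data: annual caseload and observed deaths.
df = pd.DataFrame({"cases": rng.integers(5, 400, size=90)})
df["deaths"] = rng.binomial(df["cases"].to_numpy(), 0.04)

# Three approximately equal-sized volume groups, defined a priori by tertile.
df["volume_group"] = pd.qcut(df["cases"], q=3, labels=["low", "medium", "high"])

summary = df.groupby("volume_group", observed=True)[["deaths", "cases"]].sum()
summary["rate"] = summary["deaths"] / summary["cases"]
# Normal-approximation 95% confidence interval for each group's rate.
se = np.sqrt(summary["rate"] * (1 - summary["rate"]) / summary["cases"])
summary["ci_low"] = summary["rate"] - 1.96 * se
summary["ci_high"] = summary["rate"] + 1.96 * se
print(summary)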
Fig. 16.2 Possible relationships between volume and outcome (mortality): linear, non-linear, step-wise and cut-off relationships between procedural volume and mortality. Adapted from reference [5]
Fig. 16.3 Mortality rates following paediatric surgery in under-1s at 12 English specialist centres, using Hospital Episode Statistics 1991–1995, plotted against volume of cases. The target is the overall average rate of 12%; one trust demonstrates truly divergent performance, lying outside the 99.8% control limit. Adapted from reference [22]
Funnel plots have been used extensively to assess the quality of aggregated data in meta-analysis. This requires plotting a measure of study precision against treatment effects. They are therefore ideal for making an initial assessment of volume of cases (measure of precision) against outcome. The most public example of this in the U.K. surrounded the independent Bristol
Royal Infirmary Enquiry into paediatric cardiac deaths. By plotting the mortality rates in under-1s for paediatric cardiac surgery against the caseload undertaken at each of 12 English hospitals, it becomes evident that there is an appreciable decrease in percentage mortality as the total number of operations performed increases (Fig. 16.3). A formal statistical test of the
volume–outcome relationship can then be performed using regression analysis with an appropriate error structure; this is described in more detail by Spiegelhalter [22]. The basic structure of a funnel plot is no more than a scatter graph. What is unique is the construction of control limits as a function of the measure of precision; these do not depend on the data being plotted. The calculated control limits typically represent 2 and 3 standard deviations. As the volume of cases (the denominator) increases, the control limits narrow and a "funnel" shape forms. Any plotted data point that lies within the control limits is said to be acting under common-cause variation, and as such its performance is within acceptable normal variation. Any data point lying outside the control limits (see Fig. 16.3) is acting under special-cause variation and as such is a "true" outlier. The implication is that such outliers need to be carefully considered before inclusion in volume–outcome correlations, as they will inevitably have a significant impact, which may only be the result of extenuating circumstances (e.g. methodological errors).
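A minimal sketch of funnel-plot control limits for a proportion, using the normal approximation to the binomial and, for illustration, the 12% target of Fig. 16.3; the provider's counts are invented:

import numpy as np

target = 0.12   # overall average mortality rate used as the target

def funnel_limits(n, z):
    # Normal-approximation control limits for a proportion at denominator n.
    se = np.sqrt(target * (1 - target) / n)
    return np.clip(target - z * se, 0, 1), np.clip(target + z * se, 0, 1)

# Hypothetical provider: 35 deaths in 150 cases.
deaths, cases = 35, 150
rate = deaths / cases
lo998, hi998 = funnel_limits(cases, 3.09)   # ~3 SD, i.e. the 99.8% limits
verdict = "special-cause outlier" if not lo998 <= rate <= hi998 else "common-cause variation"
print(f"rate {rate:.1%}; 99.8% limits ({lo998:.1%}, {hi998:.1%}): {verdict}")

Because the standard error shrinks with the denominator, the limits computed this way narrow as caseload rises, producing the characteristic funnel.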
16.2.3 Methodological Limitations

The quality of the data, and of the methods used to analyse them, will necessarily affect the results and the substance of any conclusions. The quality of commonly used datasets has already been discussed and is generally well recognised in the literature. More recently, there has been growing awareness of the methodological limitations of data handling and presentation. The majority of studies exploring the volume–outcome relationship in surgery support an inverse relationship, such that a greater number of cases performed by a Healthcare provider is associated with better patient-related outcomes, typically reported as mortality. The magnitude and degree of statistical significance of the reported correlation, however, vary considerably between studies, and much of this variation can be attributed to heterogeneous study design. The Institute of Medicine (IOM) held a workshop in 2000 to review the current understanding of the relationship between the volume of health services and health-related outcomes [11]. As part of this, Halm et al. created a quantitative method of assessing the research design of volume–outcome studies, such that higher scores would reflect an increasing likelihood of the study's ability to discern generalisable conclusions
about the nature and magnitude of the relationship between volume and outcome. The scoring system assessed 10 integral methodological criteria, including representativeness of the dataset, risk adjustment, clinical processes of care assessment and methods of volume analysis [8] (Fig. 16.4). The authors then scored 88 studies extracted by systematic review of the literature. Possible total quality scores ranged from 0 to 18. The mean total quality score was 7.8 ± 1.9, with a median score of 8 (interquartile range 6–9). Only 18% of studies achieved a score greater than 10. A similar process was subsequently repeated for studies assessing volume–outcome relationships for oncological operations between 1984 and 2004 [14]. Again, no study scored greater than 11 out of a possible 18, with the vast majority scoring between 7 and 10. The implication of this finding is that, at best, existing study design is only modest, which limits the transferability of the studies' reported correlations to clinical practice. It also raises the question of whether much of the volume–outcome relationship would be voided by more methodologically robust research.

Fig. 16.4 Scoring system for rating the quality of research on the volume–outcome relationship. Ten methodological criteria are scored: representativeness of the sample; number of hospitals or surgeons; total sample size (cases); number of adverse events; unit of analysis; volume categorisation; appropriateness of patient selection; risk adjustment; clinical processes of care; and outcomes. Adapted from reference [8]

Volume, being easily measured, is used as a proxy for the expertise of a surgeon or institution. It is assumed that any volume-categorised differences in outcome are the result of better surgical practices arising from undertaking more cases and so improving expertise. Many studies use standard statistical methods that assume patients to be independent observations, whereas patients treated by the same provider (irrespective of the volume treated) are likely to experience similar outcomes. Indeed, the potential for this phenomenon, known as clustering, to occur is greater the smaller the provider caseload. Similarly, standard statistical methods do not consider that outcomes between providers with similar volumes can vary considerably. These factors will serve to falsely accentuate any derived volume–outcome relationship. An example of this was published by Panageas et al. [18], who performed a re-analysis of three previously published volume–outcome studies using statistical methods for analysing clustered data: a random-effects model and generalized estimating equations. They demonstrated that for colectomies, prostatectomies and rectal cancer surgery, attenuation of the volume–outcome relationship occurred, depending on the outcome measured, when adjustment was made for clustering in addition to case-mix and the volume of operations performed. The demonstration of clustering on its own is an important entity, and the authors provide an example
of a simple but plausible explanation: "Some colon cancer surgeons are more likely to perform colostomies (thereby potentially avoiding anastomotic leaks and postoperative infections), while others are substantially more likely to attempt primary re-anastomosis. This variation in surgical practice leads directly to observed variation in these outcomes when analyzed on a surgeon-by-surgeon basis". Consideration of the presence of clustering is, therefore, an important aspect of
volume–outcome relationship assessment and will go some way towards identifying unexplained variation in outcomes. It likely reflects important differences in processes of care that will provide opportunities for improving the quality of care. The variability in an observed outcome will naturally be greater the fewer times it occurs, i.e. for low-volume providers. For high-volume providers, the observed outcome, having occurred a sufficiently large
number of times (as determined by the underlying true rate of occurrence), is likely to be more representative of the underlying true rate, thereby displaying less variability. So, when analysing and presenting crude observed outcomes, there will be a bias against low-volume providers, as their observed outcome will vary substantially from the true underlying outcome rate, often as the result of a few outliers. Some of this bias can be removed by considering the observed outcome as a ratio of the expected outcome, a calculated estimate of the true outcome rate. As applied to mortality, this ratio of observed over expected mortality is known as the standardised mortality ratio. The calculated expected mortality rate can be adjusted for a number of confounding variables, such as age, gender, race, co-morbidities, deprivation scores and volume, and as such improve the robustness of any remaining volume–outcome relationship.
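A minimal sketch of the observed-over-expected construction, assuming each patient already carries a case-mix-based predicted risk of death (e.g. from a logistic model fitted across all providers); the providers, outcomes and risks below are invented:

import pandas as pd

# Hypothetical patient-level data: provider, death flag and predicted risk.
df = pd.DataFrame({
    "provider":  ["A"] * 4 + ["B"] * 4,
    "died":      [0, 1, 0, 0, 1, 1, 0, 1],
    "pred_risk": [0.05, 0.20, 0.10, 0.05, 0.30, 0.25, 0.10, 0.35],
})

by_provider = df.groupby("provider").agg(
    observed=("died", "sum"),
    expected=("pred_risk", "sum"),   # expected deaths = sum of predicted risks
)
by_provider["smr"] = by_provider["observed"] / by_provider["expected"]
print(by_provider)   # SMR > 1: more deaths than the case-mix model predicts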
16.3 Outcome Measures

16.3.1 Morbidity/Mortality/Length of Stay/Re-Admission Rates

One of the most important aspects of researching the volume–outcome relationship is identifying the outcome to be assessed. The availability of measures recorded consistently and reliably in administrative databases has meant that mortality, either inpatient or 30-day, has predominated. Mortality is, however, not always the best outcome measure, and its use may result in the absence of any volume–outcome relationship when one exists for more procedure-specific outcomes [2]. Mortality is unlikely to be useful as an outcome measure if the mortality rate for the operation of interest is too low to discern differences between high- and low-volume providers. This phenomenon likely explains why, in the existing literature, the volume–outcome (mortality) relationship appears strongest for complex operations such as oesophagectomy and pancreatectomy, and weakest for operations such as colectomy and carotid endarterectomy. Depending on the operation of interest and its associated mortality rate, procedure-specific morbidity may be a more appropriate outcome measure for detecting a volume–outcome relationship. Indeed, there is evidence to suggest that among high-volume surgeons, those who performed well on one morbidity endpoint performed well on others [3].
Mortality measured as long-term survival may overcome the problems described above for operations with low mortality rates, although it reflects very different quality components from 30-day mortality. Long-term survival will be affected by surgical technique (e.g. positive margin rates for oncological procedures), follow-up diagnostics, and the availability of, and thresholds for giving, adjuvant treatment, to name a few. As a result, operations which demonstrate no volume–outcome relationship because of their low 30-day mortality can display a volume–outcome relationship for long-term survival [19]. Hospital length of stay and 30-day re-admission rates are other important outcome measures that have historically been assessed infrequently [9], although in the last 3–4 years they have been included more often [7]. They are important outcome measures reflecting the efficiency of post-operative care pathways and the intensity of discharge planning. Besides the disadvantages to the patient, prolonged length of stay and high re-admission rates carry a significant financial burden and are, therefore, politically sensitive. Emergency re-admission rates can be a clinically useful indicator, especially when the risk of death following surgery is negligible. This is particularly true in the day surgery setting, or in the elective setting where the severity of a suitable condition of interest displays little variance. The studied populations consequently tend towards homogeneity, and assessment of the re-admission rate as an outcome measure allows like-with-like comparison of the units of interest [17].
16.3.2 Limitations of Case-Mix Adjustment

It is clearly advantageous to adjust outcome data for as many variables as possible, thereby eliminating confounding explanations for demonstrated correlations. Limitations in data availability will naturally limit the degree of adjustment. Administrative databases, as the main data source, usually record data that allow adjustment for age, sex, ethnicity, socio-economic status, method of admission and hospital size (teaching status). Adjustment for patient co-morbidities is less readily available, albeit arguably the most important. It has been argued, in terms of methodological quality, that risk adjustment drawn only from administrative databases is of relatively inferior quality, and that it should instead originate from medical records or prospectively designed clinical
registries [8]. The resources that would be required to fulfil this when dealing retrospectively with thousands of administrative patient records generally make it impractical. To circumvent this issue, much work has been done on the manipulation of administratively recorded co-morbidity variables. Using predictive modelling incorporating the Charlson co-morbidity score, it has been shown that the routinely collected U.K. Hospital Episode Statistics administrative database can be used to predict the risk of mortality following surgical intervention with discrimination similar to that of national specialty-specific clinical datasets. In this way, performance suitably adjusted for patient case-mix can be determined [1]. In addition to adjusting for patient-orientated variables, the processes of care that a patient experiences as they progress through their treatment episode will clearly impact on the end outcomes. Existing volume–outcome research rarely accounts for the inter-institutional variance in these processes [9]. The variability that exists has been clearly demonstrated using patient pathways for the treatment of colorectal and breast cancer as examples. That study analysed adherence to quality measures incorporating clinical domains representative of the entire patient episode: diagnostic evaluation, surgery, adjuvant therapy, management of treatment toxicity and post-treatment surveillance. Further examination was made of eight components of care integral to these clinical domains: testing, pathology, documentation, referral, timing, receipt of treatment, technical quality and respect for patient preferences. Although overall adherence to these quality measures was high, significant variability in adherence was identified (13–97% for breast cancer and 50–93% for colorectal cancer) [16]. As the quality of administrative databases improves, both in terms of data acquisition and handling, so will our ability to adjust for a number of confounding variables, and statements concerning the existence of volume–outcome relationships will consequently be more robust.
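As a hedged illustration of this kind of case-mix adjustment, the sketch below fits a logistic regression predicting 30-day mortality from age, sex and a Charlson co-morbidity score of the sort derivable from administrative data. The data frame, field names and values are entirely invented; a real model would be built on a full extract of routinely collected records.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical administrative extract: outcome, age, sex and a Charlson
# co-morbidity score per admission (toy values, replicated for stability).
df = pd.DataFrame({
    "died_30d": [0, 0, 1, 0, 1, 0, 0, 1, 0, 1] * 30,
    "age":      [54, 61, 78, 45, 82, 66, 59, 66, 50, 74] * 30,
    "male":     [1, 0, 1, 1, 0, 1, 0, 1, 0, 1] * 30,
    "charlson": [0, 1, 4, 0, 5, 2, 1, 2, 0, 3] * 30,
})

model = smf.logit("died_30d ~ age + male + charlson", data=df).fit(disp=False)
df["expected_risk"] = model.predict(df)  # feeds an SMR-type calculation
print(model.params)
```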
16.3.3 Quality of Life Measures

Outcome measures such as mortality, morbidity and length of stay are used frequently because of the ease with which they can be measured and the abundance of data held in administrative databases. Studies have, where possible, assessed endpoints that directly affect a patient’s quality of life, such as long-term incontinence following radical prostatectomy [3]. There is, however, a paucity of studies that use validated health surveys incorporating patient values through qualitative indicators (e.g. SF-36, EQ-5D). More recently, there is an awareness that outcome measures which better assess patients’ quality of life following surgical intervention can add an important dimension to any outcome assessment tool. The U.K.’s Department of Health has recently completed a project which aimed to determine the feasibility of collecting pre- and post-operative patient-reported outcome measures from patients undergoing elective surgery. The results of this exercise are awaited, but it signifies an important shift away from centrally collating and reporting “pure” clinical outcome measures towards ones that encompass patient-orientated experiences of treatment.
16.4 The Influence of the Surgeon and/or Institution

Surgeons will care for their patients in the ways that they feel appropriate, but will be influenced by the structure of the institution in which they work. The surgeon and the institution could therefore be seen as a single influencing factor because of their integrated nature, and yet they reflect very different influences on the patient’s final outcome. A surgeon’s volume may be indicative of his or her technical skill and case selection, whereas an institution’s volume will encompass the peri-operative processes and multi-disciplinary personnel that all impact on a patient’s quality of care, some of which the surgeon will not be able to influence directly. This issue is at the very heart of volume–outcome research, as it directly questions whether volume should be defined at the institutional level, the surgeon level, or both.
16.4.1 The Role of the Surgeon
A number of studies have shown a statistically significant relationship between surgeon volume and mortality across a variety of operations [9]. The lack of adjustment for hospital volume in many cases, however, has resulted in much questioning of the true role of the
surgeon’s caseload in surgical outcomes. Yet studies have shown that, even after adjusting for the influence of hospital volume on patients’ outcomes, the surgeon volume–outcome relationship persists. Indeed, in some instances the surgeon’s volume accounted for the entire apparent volume effect [4], and adjusting for surgeon volume can remove the hospital volume–outcome relationship [20]. This clearly demonstrates the potential importance of the surgeon’s volume. However, even among high volume surgeons, inter-surgeon differences in performance do result in significant variability in patient outcome [3]. It might therefore be that, independent of prior case volume, technically skilled surgeons achieving better patient outcomes attract greater numbers of referrals and so are rewarded with a higher volume of cases (the selective-referral hypothesis). Similarly, high volume hospitals might attract surgeons who are already achieving better outcomes, which will in part explain the hospital volume–outcome relationship. As the methodological quality of volume–outcome studies improves, we may find that the surgeon and his or her caseload have a greater influence on outcomes for the more technically complex operations. This is not to say that hospital volume will cease to play a role, as the quality of intensive care services, ward nursing and physiotherapy remains integral to achieving the desired patient outcomes.
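One common way to examine surgeon and hospital volume together is to enter both as covariates in the same risk model and cluster the standard errors by hospital, so that neither effect is mistaken for the other. Everything in the sketch below (variable names, simulated data, effect sizes) is invented for illustration; it is not the analysis used in the studies cited above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulated patients nested within 20 hospitals (all values invented).
df = pd.DataFrame({
    "age": rng.normal(65, 10, n).round(),
    "hospital_id": rng.integers(0, 20, n),
    "surgeon_volume": rng.integers(5, 60, n),
})
df["hospital_volume"] = 50 + 10 * df["hospital_id"]

# True data-generating process: age and surgeon volume matter;
# hospital volume, given surgeon volume, does not.
logit_p = -3.0 + 0.03 * (df["age"] - 65) - 0.02 * df["surgeon_volume"]
df["death"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = smf.logit("death ~ age + surgeon_volume + hospital_volume", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["hospital_id"]}, disp=False
)
print(fit.summary())
```

In a fully multilevel framing, the hospital would instead enter as a random effect; the fixed-effects-plus-clustered-errors version above is simply the most compact sketch.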
16.4.2 The Role of the Institution

The institution in which a surgeon works will impact on the patient outcomes achieved. High volume hospitals are likely to be larger facilities and will provide a broader range of specialist and sub-specialist services. Associated academic and training programmes may result in higher staff-to-patient ratios, the availability of innovative treatments and technologies, and on-site multi-disciplinary referral networks. The important question is whether this translates into improved outcomes for patients. The importance of an institution’s structure and processes, as distinct from the surgeon’s technical expertise, is suggested by studies assessing the outcomes of cancer patients managed non-surgically. Although confounded by differences in treatment regimes, mortality for lymphoma and testicular cancer patients appears to be lower when treatment is received at a comprehensive cancer centre [12]. For colorectal resections, Harmon
et al. [10] demonstrated that, in terms of inpatient mortality, medium volume surgeons could achieve results similar to those of high volume surgeons when operating in medium or high volume hospitals, but not in low-volume hospitals. Similarly, the outcomes achieved by low-volume surgeons showed some improvement with increasing hospital volume, although they never equalled those of high volume surgeons. The relationship between hospital volume and mortality following colorectal surgery appears to affect not only inpatient mortality, but also 30-day, 2-year and overall mortality [21]. This study corroborated the findings of Harmon et al. in showing that hospital volume was a greater predictor of each measure of mortality than surgeon volume. Even when the authors used stoma formation rates as the outcome, which might be presumed to be almost solely determined by surgeon technique and preference, hospital volume remained, after adjusting for case-mix, an independent predictor as important as surgeon-specific volume. The exact reasons for this association have not been proven, but the availability and quality of post-operative care may well influence the surgeon in his or her choice of operation [21]. It is now generally agreed that ongoing volume–outcome research should focus at the institutional level and try to identify factors, such as processes of care, for which volume is acting as a proxy measure. The impact of surgeon volume should not, however, be disregarded, as the surgeon and the institution in which he or she operates remain integrally linked.
16.5 Immeasurable Factors

The direction of causality in the volume–outcome debate has not been proven, but the “practice makes perfect” hypothesis alludes to factors which have been thought immeasurable, such as the surgeon’s skill. These factors have been implicated both at the surgeon level and at the institutional level. While the majority of volume–outcome studies have used large administrative databases to investigate the relationship, so that small differences in outcomes can be identified, access to clinical data extracted from medical records is likely to reveal differences in processes of care that were once considered immeasurable. This approach was used successfully by Thiemann et al. to investigate the association between hospital volume and
survival after acute myocardial infarction in elderly patients [23]. A third of the survival advantage associated with treatment at higher volume hospitals could be attributed to the use of aspirin, thrombolytic agents, β-blockers, angiotensin-converting-enzyme inhibitors and revascularisation. While the availability of technology for angioplasty and bypass surgery was greater in high volume centres, this was not independently associated with overall mortality. No predominant mechanism was identified to explain the outcome differences between high and low-volume institutions but, as the authors note, this is not surprising: the mechanisms and processes involved within a patient’s care pathway are multifactorial and interlinked. Variations in processes of care may go some way towards explaining why, even within a cohort of high volume institutions, there are good and bad performers, and why volume per se does not always adequately reflect quality of care in terms of its processes. If such relatively subtle differences in processes of care can be responsible for altering outcomes, then could low-volume hospitals which enforce clinical best practice throughout the patient treatment pathway achieve outcomes approaching those of high volume hospitals? There is evidence to suggest that when low-volume institutions have as many residents/interns and registered nurses per 100 beds as high volume institutions, overall mortality rates for paediatric heart surgery and heart transplants appear equivalent [6]. Further volume–outcome research should focus on combining clinical data with administrative databases. This will identify important differences in structure and processes between well and badly performing institutions that may or may not relate to caseload volume. Before we attribute outcome differences to immeasurable factors, we must first ensure that all measurable factors have been identified and considered.
16.6 Public Health Implications

16.6.1 Policy Change and Healthcare Restructuring

Volume–outcome research has enormous potential to impact current and future policy initiatives. It is for this very reason that its validity must be assured. The Leapfrog Group, founded in November 2000, is a coalition of America’s largest corporations and public agencies that buy healthcare on behalf of
their employees, dependants and retirees. One aspect of its mission statement is to advance the safety, quality and affordability of healthcare. On this premise, the group acted as an early proponent of selective referral to high-volume institutions for five surgical procedures. Although a better understanding of volume–outcome research has necessarily caused a redefinition of the “basket” of operations suitable for regionalisation, the message has not wavered and has indeed gained momentum, with a number of other organisations joining in. On this view, whatever the underlying causative factors, centralisation of healthcare can only be beneficial for the patient. Opponents, however, describe a number of valid disadvantages to centralisation: long travel times for patients and their families, particularly in rural areas; alterations to local referral patterns that could destabilise other, non-centralised healthcare services; the potential for a two-tier healthcare system generating yet further health inequality; inhibition of continuity of care because of the segmentation of pre-operative, peri-operative and post-operative care; and the possible overburdening of high-volume centres, which could negatively affect quality of care. Policy change is the “end effector” of health services research and acts as the initiator of healthcare restructuring. Restructuring and subsequent organisation of healthcare based on the centralisation model would implement a selective-referral strategy. While the evidence might indicate probable or potential improvements for patients in terms of treatment outcomes, it cannot incorporate the variability that exists in disease incidence, patient demographics, existing healthcare resources and patient choice. In this regard, flexibility and adaptation need to be exercised during any implementation process to reflect local requirements. Change can be accompanied by a period of instability and uncertainty; potential barriers to implementing a selective-referral process are summarised in Table 16.2. Restructuring in order to facilitate the selective-referral process is not the final step in ensuring improved outcomes for patients. As discussed, high volume per se is no guarantee of improved outcomes. The continued evaluation of processes and systems of care within existing or newly created high-volume institutions will act as a quality control mechanism, and will also continue to enhance our understanding of the determinants of improved outcome, thereby facilitating any future modifications.
Table 16.2 Potential barriers to implementing a selective-referral programme based on a volume standard. Adapted from reference [11]
• A potential for decline in quality of care at higher volume providers
• Patients’ preferences for care closer to home
• Difficulties for patients to travel to hospitals that are located far away
• Patients who need immediate treatment or are too unstable to transfer
• Loss of access in areas where low-volume services have been closed (e.g. cardiac surgery)
• Resistance from surgeons and hospitals to co-operate in quality monitoring efforts
• Effects on marketplace structure and competition:
  – Increased market power of high-volume hospitals (e.g. prices could rise)
  – Barriers to entry of new competitors (e.g. it is difficult to start off at high volume)
  – Potential for medically inappropriate admissions to boost volumes to meet cut-offs
16.6.2 Research and Ethical Implications

The last 5 years have seen a great increase in health services research exploring the volume–outcome relationship within surgery, yet many questions remain unanswered. At the turn of the century, the American Institute of Medicine outlined the feedback of participants in the workshop “Interpreting the Volume–Outcome Relationship in the Context of Healthcare Quality” and their proposals for warranted future research [11]. The proposals broadly fell into four categories: implementation of related research, new areas of research, methodological development, and health services research data infrastructure. Research since the publication of this consensus opinion has in part reflected the recommendations and has, as a result, improved in quality. Improvements have generally been seen in the field of methodological development, where authors have examined outcomes other than mortality, such as functional status, quality of life and longer-term outcomes, and have better adjusted for confounding variables such as processes and systems of care. Outcomes, and the risk adjustment tools developed, have also been tailored to be procedure-specific rather than generic within surgery. As with all translational research, the implementation of health services research into directing future
developments in service provision remains the end objective. With the potential for such important policy implications as described previously, we must be certain not only of our methodological rigour, but also that we have asked the correct question. Many groups have pursued the definition of a minimum annual caseload above which better outcomes can be assured. This relies solely on the premise that volume acts as a surrogate marker for the factors that produce better outcomes for patients. As our understanding of the volume–outcome relationship has developed, we have learned that this is not always true. Should we therefore not focus more on exploring factors that can be shown to correlate with improved outcomes, and then verify whether they are independently reliant on volume of delivery and economies of scale? This approach would better identify local variations in service provision and better inform policy change. Not surprisingly, volume–outcome research is not spared ethical considerations. Regional variation in service provision and demographics means that widespread, all-encompassing re-configuration of services based on volume–outcome research will have to overcome a number of challenges. It has been suggested that this may be particularly true in rural areas; as a result, it has been proposed that “initial implementation efforts be focused in urban areas with dense concentrations of hospitals, where success is more likely” [11]. During this implementation phase there is clearly the potential for a “two-tier” healthcare system separating rural and urban populations. Policy change based on this research can only occur once we are certain of its validity, and that validity must be confirmed by a number of stakeholders, particularly those who do not stand to gain commercially. This is not to say that we should discourage healthcare providers from carrying out volume–outcome research, but we need to be satisfied that the output is robust and demonstrates equity to all patient groups.
16.7 Conclusion

Research into the volume–outcome relationship in surgery will continue to thrive in the current healthcare environment and will help shape its future structure. As we have dissected the relationship, we have unveiled a complex interaction of numerous outcome, process and structural factors. Although the “practice makes perfect” and “selective-referral” hypotheses are
not the sole determinants of the causal relationship, they will continue to form a part of it. Continuing research in this area should be conducted under the guidance of a methodological framework that addresses the methodological limitations and ideals discussed in this chapter, and that similarly emphasises, and accounts for, the hierarchical nature of patients’ outcomes within modern healthcare. We have made great strides, and our understanding today far surpasses that of 20 years ago. This is, however, not the end, and continued research will form an important component of society’s wish, and the medical community’s desire, to improve the quality of care that patients receive.
References

1. Aylin P, Bottle A, Majeed A (2007) Use of administrative data or clinical databases as predictors of risk of death in hospital: comparison of models. BMJ 334:1044
2. Begg CB, Riedel ER, Bach PB et al (2002) Variations in morbidity after radical prostatectomy. N Engl J Med 346:1138–1144
3. Bianco FJ Jr, Riedel ER, Begg CB et al (2005) Variations among high volume surgeons in the rate of complications after radical prostatectomy: further evidence that technique matters. J Urol 173:2099–2103
4. Birkmeyer JD, Stukel TA, Siewers AE et al (2003) Surgeon volume and operative mortality in the United States. N Engl J Med 349:2117–2127
5. Christian CK, Gustafson ML, Betensky RA et al (2005) The volume–outcome relationship: don’t believe everything you see. World J Surg 29:1241–1244
6. Elixhauser A, Steiner C, Fraser I (2003) Volume thresholds and hospital characteristics in the United States. Health Aff (Millwood) 22:167–177
7. Goodney PP, Stukel TA, Lucas FL et al (2003) Hospital volume, length of stay, and readmission rates in high-risk surgery. Ann Surg 238:161–167
8. Halm E, Lee C, Chassin M (2000) How is volume related to quality in health care? A systematic review of the research literature. In: Hewitt M (ed) Interpreting the volume–outcome relationship in the context of health care quality: workshop summary. Institute of Medicine, National Academy Press, Washington, DC, Appendix C, pp 27–102
9. Halm EA, Lee C, Chassin MR (2002) Is volume related to outcome in health care? A systematic review and methodologic critique of the literature. Ann Intern Med 137:511–520
10. Harmon JW, Tang DG, Gordon TA et al (1999) Hospital volume can serve as a surrogate for surgeon volume for achieving excellent outcomes in colorectal resection. Ann Surg 230:404–411; discussion 411–413
11. Hewitt M (2000) Interpreting the volume–outcome relationship in the context of health care quality: workshop summary. Institute of Medicine, National Academy Press, Washington, DC
12. Hillner BE, Smith TJ, Desch CE (2000) Hospital and physician volume or specialization and outcomes in cancer treatment: importance in quality of cancer care. J Clin Oncol 18:2327–2340
13. Hong D, Tandan VR, Goldsmith CH et al (2002) Users’ guide to the surgical literature: how to use an article reporting population-based volume–outcome relationships in surgery. Can J Surg 45:109–115
14. Killeen SD, O’Sullivan MJ, Coffey JC et al (2005) Provider volume and outcomes for oncological procedures. Br J Surg 92:389–402
15. Luft HS, Bunker JP, Enthoven AC (1979) Should operations be regionalized? The empirical relation between surgical volume and mortality. N Engl J Med 301:1364–1369
16. Malin JL, Schneider EC, Epstein AM et al (2006) Results of the National Initiative for Cancer Care Quality: how can we improve the quality of cancer care in the United States? J Clin Oncol 24:626–634
17. Mason A, Goldacre MJ, Bettley G et al (2006) Using routine data to define clinical case-mix and compare hospital outcomes in urology. BJU Int 97:1145–1147
18. Panageas KS, Schrag D, Riedel E et al (2003) The effect of clustering of outcomes on the association of procedure volume and surgical outcomes. Ann Intern Med 139:658–665
19. Roohan PJ, Bickell NA, Baptiste MS et al (1998) Hospital volume differences and five-year survival from breast cancer. Am J Public Health 88:454–457
20. Schrag D, Panageas KS, Riedel E et al (2002) Hospital and surgeon procedure volume as predictors of outcome following rectal cancer resection. Ann Surg 236:583–592
21. Schrag D, Panageas KS, Riedel E et al (2003) Surgeon volume compared to hospital volume as a predictor of outcome following primary colon cancer resection. J Surg Oncol 83:68–78; discussion 78–79
22. Spiegelhalter DJ (2005) Funnel plots for comparing institutional performance. Stat Med 24:1185–1202
23. Thiemann DR, Coresh J, Oetgen WJ et al (1999) The association between hospital volume and survival after acute myocardial infarction in elderly patients. N Engl J Med 340:1640–1648
An Introduction to Animal Research
17
James Kinross and Lord Ara Darzi
Contents

Abbreviations .......................................................... 207
17.1 Introduction ..................................................... 208
17.2 History of Animal Research ............................ 208
17.3 Ethical Considerations .................................... 209
17.3.1 The Challenge of Hybrid Embryo Research ... 210
17.3.2 Animal Rights Extremism ............................ 210
17.4 Current Trends in Animal Research ............... 211
17.5 Legal Requirements ........................................ 212
17.5.1 The United States ......................................... 212
17.5.2 The European Union .................................... 213
17.5.3 The United Kingdom ................................... 213
17.6 The Animals (Scientific Procedures) Act 1986 ... 214
17.6.1 Definitions .................................................... 214
17.6.2 Licence to Practice ....................................... 214
17.6.3 Legal Requirements ..................................... 214
17.7 Projects to Generate Genetically Modified Animals ... 216
17.8 Personal Health Protection and Monitoring ... 216
17.9 Animal Husbandry .......................................... 217
17.9.1 Definitions of Lab Animals .......................... 217
17.9.2 Disease Recognition ..................................... 218
17.10 Humane Killing of Animals .......................... 219
17.11 Analgesia ....................................................... 220
17.12 Anaesthesia .................................................... 220
17.12.1 Selection of Methods of Anaesthesia ......... 221
17.12.2 Inhalational Anaesthetics ........................... 222
17.12.3 Inhalational Anaesthetic Agents ................ 224
17.12.4 Induction and Maintenance of the Airway ... 224
17.13 Surgical Technique ........................................ 225
17.14 Post-Operative Care ...................................... 225
17.15 Conclusion ..................................................... 226
17.16 Useful Web Sites ........................................... 226
References ............................................................... 227
J. Kinross, Department of Biosurgery and Surgical Technology, Imperial College, 10th floor, QEQM, St. Mary’s Hospital, Praed Street, London W2 1NY, UK. e-mail: [email protected]
Abstract Despite advances in computer modelling and bioinformatics, animal models remain a vital component of biomedical research. The growth in this area of work is partly due to the evolution of the next generation of biotechnologies, which more than ever necessitate in vivo experimentation. An understanding of the principles of animal research therefore remains a necessity for medical researchers, as it permits scientific analysis to be interpreted in a more critical and meaningful manner. Initiating and designing an animal experiment can be a daunting process, particularly as the law and legislation governing animal research are complex and new specialist skills must be acquired. This chapter reviews the principles of animal research and provides a practical resource for those researchers seeking to create robust animal experiments that ensure minimal suffering and maximal scientific validity.
Abbreviations

ACLAM The American College of Laboratory Animal Medicine
APHIS Animal and Plant Health Inspection Service
AWA Animal Welfare Act
COSHH Control of substances hazardous to health
ECBR European Coalition for Biomedical Research
FDA The Food and Drug Administration
H & SE Health and Safety Executive
HFEA The Human Fertilisation and Embryology Authority
IACUC Institutional Animal Care and Use Committee
IVC Individually ventilated cage
IVF In vitro fertilisation
LAS Laboratory animal allergy
NACWO Named animal care and welfare officer
NICE National Institute for Clinical Excellence
NVS Named veterinary surgeon
OLAW The Office of Laboratory Animal Welfare
PHS Public Health Service
PIL Personal licence under the Animals (Scientific Procedures) Act 1986
PPL Project licence under the Animals (Scientific Procedures) Act 1986
RSPCA Royal Society for the Prevention of Cruelty to Animals
SPF Specified pathogen free
USDA The United States Department of Agriculture
17.1 Introduction

Animal experimentation has contributed to 70% of the Nobel prizes for physiology and medicine and, despite advances in computer modelling and bioinformatics, it remains a vital component of biomedical research. However, the ethical challenges created by animal research continue to polarise both the scientific community and the public. This is in part caused by the increasingly complex philosophical obstacles posed by the use of genetically modified species and by the development of novel scientific approaches such as human–animal hybrid research. As a result, animal work is now subject to closer public scrutiny than at any previous moment in history, and all research involving animals must maintain the highest standards of clinical and veterinary practice in accordance with the law. This has not meant that less work is being performed: over three million scientific procedures were started in 2006, a rise of 4% from 2005 [1], which has led to recommendations to cut back on Britain’s annual statistics on animal experiments [2]. The growth in this area reflects the evolution of the next generation of
biotechnologies, which more than ever necessitate in vivo experimentation. An understanding of the principles of animal research therefore remains a necessity for medical researchers, as it permits scientific research to be interpreted in a more critical and meaningful manner. Initiating and designing animal experiments can be a daunting process, particularly as the law and legislation governing animal research are complex and new specialist skills must be acquired. This chapter reviews the principles of animal research and provides a practical resource for those researchers seeking to create robust animal experiments that ensure minimal suffering and maximal scientific validity.
17.2 History of Animal Research

The earliest references to animal testing are found in the writings of the Greeks in the fourth and third centuries BC; Aristotle and Erasistratus (304–258 BC) were amongst the first to perform experiments on living animals [3]. Galen, the father of modern vivisection, dissected pigs and goats, after which animals played a significant role in experimental science and a particularly important part in the development of surgical and anaesthetic science. Humphry Davy, a surgeon’s apprentice, demonstrated that nitrous oxide produced a state of reversible unconsciousness in animals 50 years before it was first trialled in humans. Carl Koller pioneered local anaesthesia in partnership with Sigmund Freud and investigated the analgesic effects of cocaine on animals in the 1880s. Spinal analgesia began when Leonard Corning accidentally pierced the dura of a dog in a cocaine experiment; he deliberately repeated the injection in a patient and called it spinal anaesthesia. Corning’s and Koller’s studies were the prelude to modern epidural anaesthesia and analgesia and to the introduction of highly effective local anaesthetics like lignocaine [4]. Louis Pasteur demonstrated germ theory by giving anthrax to sheep, and animal studies of surgical aseptic technique significantly improved morbidity and mortality after surgical intervention. John Heysham Gibbon developed the first heart–lung bypass machine, which he trialled on a cat in 1935. After further studies with dogs after World War II, he finally performed the world’s first open heart operation on a human patient in 1953 [4].
Marshall Hall developed the first code of conduct for animal work in response to public concern, stating that all animal experiments should be original, result in an endpoint and be as pain-free as possible. In 1822, the first Act of Parliament dealing with cruelty to animals came into effect: Martin’s Act prevented cruelty to horses, cattle and domestic animals. It was not until 1824 that the Society for the Prevention of Cruelty to Animals (later the RSPCA) was founded, conferring protection on cats and dogs. In 1873, Burdon Sanderson published the Handbook for the Physiological Laboratory, containing references to animal experiments. The Cruelty to Animals Act 1876 regulated the use of vertebrates in experiments calculated to cause pain, working through a licensing system of persons and authorised premises. This underwent numerous but insubstantial modifications until the Animals (Scientific Procedures) Act was introduced in 1986.
17.3 Ethical Considerations

The issue of animal research presents a classic Catch-22. We depend on the similarities between animals and humans to validate the results of animal research, yet we also depend on the differences between animals and humans to justify their use. Aristotle believed that there was a hierarchy of animals with humans at the top, as humans could reason and had “rational souls” [5]. In nineteenth-century Britain, the protest was about cruelty to animals in laboratories. In more recent times, the debate has also focused on whether animal research is necessary, what medical progress it has produced and whether alternatives could be used. A careful analysis of the public debate about animal experimentation shows that essentially all of it revolves around two basic arguments: is animal experimentation cruel, and is it necessary [6]? So, is animal experimentation cruel? To answer this question, it is necessary to determine whether animals feel pain and experience distress in the same manner as humans. It has been argued that what we do to animals matters to them, and that they feel not only pain but also the full range of emotions that feature in our moral deliberations about humans: fear, loneliness, boredom, frustration and anxiety [7]. However, the manner in which an animal may suffer pain or emotional trauma
defies easy definition, because animals are unable to communicate or describe their distress. The concept of cruelty is also complicated by the question of how many animals the alleviation of one person’s suffering is worth. Those against animal research may argue that this question makes the fundamental assumption that animals do not have rights. If that assumption is false, then one could argue that their sacrifice is immoral no matter what the benefit, and thus that there is no moral justification for regarding the pain or pleasure experienced by animals as being of less importance than that felt by humans [8]. However, scientists argue that granting rights to animals is not a valid response, because human rights are founded on social values. Therefore, a utilitarian approach has been adopted to ensure that animal work is of sufficient importance and that there is no viable alternative to the proposed research; i.e. animal studies should only be carried out if the benefit outweighs the potential pain and suffering of the animal. In essence, this signifies a “cost/benefit” balance, upon which The Animals (Scientific Procedures) Act 1986 is based. The principles of humane experimental technique are, therefore, founded on the notion of the three Rs:

1. Replacement, or the use of non-animal techniques wherever possible. For example, if a specific mechanism is being investigated, could alternatives such as cell lines be used?
2. Reduction of the total number of animals used to the minimum necessary, for example by minimising variability and using a robust statistical power calculation and good experimental design (a sample-size sketch is given at the end of this section).
3. Refinement of the techniques used, both to assist the reduction of animal usage and to reduce any potential stress to a minimum. For example, good housing, improved analgesia and better post-operative care.

The three Rs form the cornerstone of the ethical review required for each new project licence granted by the Home Office for animal research in the UK. This process aims to provide independent ethical advice to the certificate holder, and support or advice for those applying for personal and project licences. It is important to note that the Secretary of State may not grant a project licence unless the work cannot be achieved satisfactorily by any other reasonable and practicable method and the potential suffering is minimised. However, the balancing act performed by the government is not exact, as the assessment of scientific and medical benefit and that of animal
suffering are not expressed in the same terms. The degree of suffering, or the severity of an experiment, is expressed as low, medium or high. In an ideal world, research inflicts a low amount of suffering on an animal and yields a high scientific return. However, this is not always the case, and for many, research which involves medium suffering for only a medium chance of generating a beneficial outcome would be unacceptable. Clearly, the final judgement will depend on a consensus view. Some scientists believe that the results from animal experiments cannot be applied to humans because of the biological differences between the species, and because the results of animal experiments often depend on the type of animal model [9]. It has also been argued that few methods exist for evaluating the clinical relevance or importance of basic animal research [10]. Pound et al. reviewed six animal studies that had been conducted to find out how the animal research had informed the clinical research, and highlighted inconsistencies in species selection, experimental design and outcome measures [11]. Further discordance between animal and human studies, due to bias or the failure of animal models to mimic clinical disease adequately, was also found in a systematic review of animal studies of interventions with unambiguous evidence of a treatment effect in clinical trials, such as head injury and antifibrinolytics in haemorrhage [12]. However, both of these reviews were limited by the small number of studies available, and more modern studies are subject to sterner ethical and peer review. None the less, this work highlights the importance of selecting species that provide the closest approximation to the human system, and of selecting research protocols based on robust clinical evidence.
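As a concrete illustration of the Reduction principle described above, a power calculation can fix the minimum group size before any animals are committed. The sketch below is illustrative only: the effect size, significance level and power are invented, and a real calculation would draw on pilot data or published variability.

```python
from statsmodels.stats.power import TTestIndPower

# How many animals per group to detect a 1.2-standard-deviation difference
# between two groups with a two-sided t-test at alpha = 0.05 and 90% power?
n_per_group = TTestIndPower().solve_power(effect_size=1.2, alpha=0.05,
                                          power=0.9, alternative="two-sided")
print(round(n_per_group))  # roughly 16 animals per group
```

Using more animals than the calculation requires wastes lives; using fewer risks an inconclusive experiment that may have to be repeated, which is worse still.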
17.3.1 The Challenge of Hybrid Embryo Research

In 1998, James Thomson et al. derived and successfully cultured human embryonic stem cells (hES cells) from a human blastocyst [31]. Since then, a revolution in stem cell research has occurred, igniting debate on the complex issue of hybrid embryo research. This issue is too great to be discussed in full here and is addressed elsewhere [32]; however, a brief overview of the salient legal and ethical developments is offered in this text. The Human Fertilisation and Embryology Authority (HFEA), the independent regulator of in vitro
fertilisation (IVF) treatment and embryo research in the United Kingdom, has recommended further public discussion to debate the broad principles for handling any research proposals involving animal–human hybrids or chimeras (organisms or organs consisting of two or more tissues of different genetic composition) [13]. Human–animal fusion products have been widely used in biomedical research for many years (for example, in xenograft models of cancer, in which human cells are introduced into mice); however, there remains a certain amount of discomfort regarding their use. In 2005, the US National Academy of Sciences stated its opposition to research in which human embryonic stem cells are introduced into non-human primate blastocysts (pre-implantation embryos), or in which any embryonic stem cells are introduced into human blastocysts, as well as to the breeding of any animal into which human embryonic stem cells have been introduced [14]. Debate regarding the restrictions will greatly affect current work to combine factors from animal eggs with animal or human nuclei, and the engraftment of human stem cells into animal hosts. The key issues that require resolution are therefore as follows: (1) how robustly the transplanted cells are incorporated into the host, and at what stage and into what tissues and organs they are introduced; (2) whether there is a possibility that introducing such cells would alter the production of sperm or eggs in the host animal; and (3) for neuronal transplantation, whether there is a risk of transferring human functions or behaviours to the host animal [15]. In 2001, President George W. Bush restricted federal funding of research with hES cells to the use of specific federally approved cell lines already in existence before August 9, 2001. However, clearer ethical and regulatory guidelines based on informed scientific debate are now slowly being developed [ref 2]. This is reflected in recent alterations to both UK and American law, and a slow but significant change in attitudes towards this important work. In May 2008, the House of Commons voted to allow the use of animal–human embryos for research after a national debate and, in 2009, President Barack H. Obama issued Executive Order 13505: Removing Barriers to Responsible Scientific Research Involving Human Stem Cells. This stated that the Secretary of Health and Human Services, through the Director of the NIH, could support and conduct responsible, scientifically worthy human stem cell research, including human embryonic stem cell
(hESC) research, to the extent permitted by law, effectively reversing the previous position of the NIH. The debate over hES cells is, however, far from over, and public opinion on this emotive topic has yet to settle.
17.3.2 Animal Rights Extremism

Animal rights activists have been present in the United Kingdom for over 30 years. They have a history of harassing not only researchers, but also technicians, security guards and third parties who do business with animal researchers. In 2004, the University of Cambridge scrapped plans for a large primate research facility because of animal rights extremism, and a similar battle occurred over the construction of an animal research laboratory at the University of Oxford. Despite this, researchers in the United Kingdom have seen a swelling of public support and legislative action in response to the more extreme activities of animal campaigners. Many of the more grotesque actions of the most extreme anti-vivisectionists have simply served to undermine any public support. For example, activists dug up the remains of a member of a family that ran a guinea-pig farm, forcing it to close in August 2005. The public outrage was mostly directed against the activists, and in May 2007 the UK Prime Minister signed a petition in support of animal research that had attracted more than 21,000 signatures. In a public statement, the Prime Minister noted that new powers for the police and courts would be used to counter the threat from animal rights extremists. None the less, due care should be taken when working with animals: close attention should be paid to all security measures placed in and around the laboratory, and all suspicious behaviour should be reported to the appropriate authority.
Fig. 17.1 Purpose and proportion of animals used for the study of disease [17]
17.4 Current Trends in Animal Research

In the UK, universities perform the majority of animal research; however, commercial and non-profit organisations, government departments, National Health Service hospitals, public health laboratories and other public bodies also utilise animal models. There has been a significant reduction in the annual number of scientific procedures since 1976, although since 2000 the number of procedures has risen by 7%, with the rise in breeding procedures accounting for a significant part of this increase [16]. Advances in molecular biology have opened up new areas of research, resulting in an increase in the use of genetically modified animals. In addition, new regulatory proposals set out in the European Union Chemicals Strategy White Paper will, if agreed and implemented, lead to an increased use of animals for human health and safety purposes. The use of mice has increased for fundamental research, and that of fish for studies on the protection of man, animals and the environment. Mice, rats and other rodents are used in the majority of procedures. The genome of the mouse is similar to the human genome with regard to the size and number of genes. Mice also have a very high reproduction rate, a short life span and mature quickly, which makes it possible to follow the effects of changed genes easily over many generations. As a result, mice have become the most important and most common animal models for human diseases. Most of the remaining procedures use fish (9%) and birds (4%). Dogs, cats, horses and non-human primates, which are afforded special protection by the Animals (Scientific Procedures) Act 1986, are collectively used in less than 1% of all procedures. The proportion of human diseases for which animal experiments are used can be seen in Fig. 17.1, as can the purposes of the experiments. The number of procedures using non-human primates is falling, mainly due to a decrease in the use of old-world primates.
Fig. 17.2 Examples of animals commonly used in research:
Rabbits – vaccine production; vascular research
Dogs – cardiovascular research; urological research
Cats – neurophysiological studies
Pigs – cardiovascular research; surgical technology; orthopaedics/osteoporosis
Rats/mice – cancer research; metabolic disease; drug efficacy; genomics/proteomics/metabonomics
Primates – pharmacology/toxicology; AIDS/HIV
The use of genetically normal animals now represents 55% of all procedures, and the number of genetically modified animals, the vast majority of which are rodents, is increasing (common examples of animals used in research can be seen in Fig. 17.2). Nearly 40% of all procedures used some form of anaesthesia to alleviate the severity of the interventions [1]. Current UK law does not permit live animal tissue to be used to practise surgical techniques, unlike in Europe, the USA and other countries. Worldwide, a number of live animal models have been developed. Live animal models are expensive, and their anatomy can vary greatly from that of humans. Despite this, they are used widely and provide live simulations that mimic operative reality. An example is the canine coronary bypass model, which has not only been used to train surgeons; evidence suggests that its existence has led to a fall in technical complications in human operations and improved patency rates. Other models include laparoscopic cholecystectomy in pigs, even though the pig’s anatomy differs from the human’s at Calot’s triangle. The use of dead animal tissue is cheaper and has fewer ethical ramifications; moreover, it is permitted in the UK. Sheep and pig materials, for example, have been used for bowel anastomosis in surgical skills courses.
17.5 Legal Requirements

Numerous legal bodies are responsible for determining the legal status of animal research, and these vary between countries. However, it is important to have an overview of the most important statutes that govern work
internationally, as it is not uncommon to perform animal work at multiple sites. Indeed, because of the stringent conditions applied by the Animals (Scientific Procedures) Act 1986, many researchers find it faster and cheaper to perform animal research outside the UK.
17.5.1 The United States

In the USA, animal testing on vertebrates is primarily regulated by the 1966 Animal Welfare Act (AWA), which is enforced by the Animal Care division of the Animal and Plant Health Inspection Service (APHIS) of the United States Department of Agriculture (USDA). The AWA contains provisions to ensure that individuals of covered species used in research receive a certain standard of care and treatment, provided that this standard does not interfere with the design, outlines or guidelines of the actual research or experimentation. Currently, the AWA only protects mammals; in 2002, the Farm Security Act, the fifth amendment to the AWA, specifically excluded purpose-bred birds, rats and mice (as opposed to wild-captured mice, rats and birds) from its regulations. The AWA requires each institution using covered species to maintain an Institutional Animal Care and Use Committee (IACUC), which is responsible for local compliance with the Act. IACUCs are of central importance to the application of laws concerning animal care and use in research in the United States. In 2001, a study that evaluated the reliability of IACUCs found little consistency between decisions made by IACUCs at different institutions. Institutions are subject to unannounced annual inspections by USDA APHIS veterinary inspectors.
Another regulatory instrument is the Public Health Service (PHS) Policy on Humane Care and Use of Laboratory Animals, which became statutory with the Health Research Extension Act 1985 and is enforced by the Office of Laboratory Animal Welfare (OLAW). The policy applies to any individual scientist or institution in receipt of federal funds and requires each institution to have an IACUC. OLAW enforces the standards of the Guide for the Care and Use of Laboratory Animals published by the Institute for Laboratory Animal Research, which includes all vertebrate species, including rodents and birds, in its care protocols. The Food and Drug Administration (FDA) also maintains guidelines for laboratory practice, which are instituted to assure the quality and integrity of data submitted from non-clinical laboratory studies in support of applications to carry out clinical trials and to market new drugs in the USA. Foreign pharmaceutical companies that wish to export products to the USA must therefore follow these guidelines or risk having the testing facility disqualified and disclosed to the public. Furthermore, the FDA may inspect foreign facilities that seek a USA marketing permit [19].
17.5.2 The European Union

The European Union is subject to Directive 86/609/EEC on the protection of animals used for experimental and other scientific purposes, adopted in 1986. There is considerable variation in the manner in which member states implement the directive. For example, French legislation is determined by the decree of 19 October 1987, which requires an institutional and a project licence before testing on vertebrates may be carried out. An institution must submit details of its facilities and the reason for the use of the animals it houses, after which a 5-year licence may be granted following an inspection of the premises. However, personal licences (PLs) are not required for individuals working under the supervision of a project licence holder, and these regulations do not apply to research using invertebrates. There has been pressure for some time to update Directive 86/609/EEC, and in 2006 the European Commission gave the first indication of the likely amendments. In response, the European Coalition for Biomedical Research (ECBR) was formed in 2007 to address these changes. The coalition, which currently
represents some 48,000 academics, is seeking to ensure, amongst other things, that non-human primates are not excluded from medical research [20].
17.5.3 The United Kingdom

Unfortunately, the Home Office is not the only government body responsible for regulating animal research in the UK [19], and this is particularly pertinent in the field of toxicology. Other regulatory bodies that also determine testing regulations include:

1. National Institute for Clinical Excellence (NICE)
2. European Agency for the Evaluation of Medicinal Products
3. UK Data Protection Agency
4. UK Medicines and Healthcare products Regulatory Agency
5. UK research governance
6. UK health and social care
7. US FDA (see above)

These individual regulations should therefore also be accounted for when considering the study design. Despite this, all work in the UK is fundamentally regulated by the Animals (Scientific Procedures) Act 1986. This makes provision for the protection of animals used for experimental or other scientific purposes in the UK. It replaced the Cruelty to Animals Act 1876 and implements the requirements of European Directive 86/609/EEC. The Act is administered in England, Scotland and Wales by the Home Office, and in Northern Ireland by the Department of Health, Social Services and Public Safety of the Northern Ireland Office. The Animals (Scientific Procedures) Act 1986 requires experiments to be regulated by three licences: a project licence (PPL) for the scientist in charge of the project, which details the numbers and types of animals to be used, the experiments to be performed and their purpose; a certificate for the institution, to ensure it has adequate facilities and staff; and a personal licence (PL) for each scientist or technician who carries out any procedure. In deciding whether to grant a licence, the Home Office refers to the Act’s cost-benefit analysis, as described in the ethical review section, and to the three Rs. The experiments must, therefore, use “the minimum number of animals, involve animals with the lowest degree of neurophysiological sensitivity, cause the least
pain, suffering, distress or lasting harm, and be the most likely to produce satisfactory results” (Section 5(5)(b)). During a 2002 House of Lords select committee inquiry into animal testing in the UK, witnesses stated that the UK has the tightest regulatory system in the world and is the only country to require a cost-benefit assessment of every licence application. The inquiry nevertheless concluded that the basis on which research and testing are regulated in the UK should continue.
17.6 The Animals (Scientific Procedures) Act 1986

17.6.1 Definitions

1. Protected animal: any living vertebrate, other than man.
2. Regulated procedure: any experimental or other scientific procedure applied to a protected animal which may have the effect of causing that animal pain, suffering, distress or lasting harm.
17.6.2 Licence to Practice

Applicants for PLs are now required to have successfully completed an accredited training programme, consisting of four modules. There are very limited exemptions from these requirements, which may be considered by the Home Office. Personal licensees seeking extension of authority from minor to major surgical procedures will be expected to complete Module 4 of the programme before applying for such an amendment. New applicants for project licences are required to have successfully completed at least Modules 1, 2 and 5, and also Modules 3 and 4 when appropriate to the procedures to be carried out in the project. In most cases, project licence applicants will have held or will still hold PLs and will therefore only need to complete Module 5 prior to application for a project licence. Persons who have never been personal licensees, or who have very limited experience of animal science or animal welfare, will not normally be considered to have sufficient appropriate experience to hold a project licence.

The personal licensee is entrusted with the primary responsibility for the welfare of the animals on which he or she has performed regulated procedures. They must therefore ensure that animals are properly monitored and cared for, and must inform the project licence holder if the severity limit of any procedure has been exceeded. From a procedural perspective, no personal licensee shall carry out a regulated procedure for which authority has not been granted or which is not authorised by a project licence, and they are not permitted to use any neuromuscular blocking agent in place of anaesthesia. The personal licence holder is also responsible for ensuring that each cage or confinement area is properly labelled. In the case of cats, dogs, primates and farm animals, each animal should be individually identifiable. For rodents, each cage should be labelled with the name of the personal licence holder (PIL) and the strain, species, project licence (PPL) number, procedure and any special requirements. Finally, it is the responsibility of the PIL to arrange the humane killing of an animal which is suffering or likely to suffer.
17.6.3 Legal Requirements
Certificate holders and administrative staff involved in performing animal research need to be familiar with the legal framework within which they carry out their duties.

(a) Protected animals
The definition of a protected animal sets the boundaries for animal research. Protection extends to certain immature forms of mammals, birds and reptiles from halfway through the gestation or incubation period; in the case of fish and amphibians, it applies from the point at which they become capable of independent living. Protection is also provided when regulated procedures are applied at an earlier stage of development if the animal is allowed to live beyond that stage, or if the procedure results in pain, suffering, distress or lasting harm after the animal has reached that stage of development. Therefore, a licence would be required for virus propagation in an embryo if inoculation takes place before the midpoint of incubation and the embryo is allowed to survive into the second half of the incubation period.
[Fig. 17.3 Persons involved in animal research. The diagram links the Home Office, the certificate holder, the project licence holder (PPL), the ethical review process, the named veterinary surgeon(s), the named animal care and welfare officer(s), the personal licence holder (PIL) and the animal technicians to the animals undergoing procedures and the stock/breeding animals, through lines of accountability and advice. Adapted from Central Biomedical Services New Licensee Training Course, Imperial College London, 2005]
(b) Regulated procedures
Pain, suffering, distress and lasting harm encompass any material disturbance to normal health (defined as the physical, mental and social well-being of the animal). They may therefore be acts of commission (such as dosing or sampling) or of deliberate omission (such as withholding food or water). Thresholds have been devised by the Home Office for specific classes of procedure, and regulation starts at the "skilled insertion of a hypodermic needle".

(c) Certificate of designation
Most researchers will not need to apply for a certificate of designation, and this is beyond the scope of this chapter. However, applicants usually represent the governing authority of the establishment, and agree to assume overall responsibility to the Home Office for compliance with the terms and conditions of the certificate. The holder is, therefore, responsible for the ethical review process, the prevention of unauthorised procedures, animal care and accommodation, staffing and the identification of animals. They are also responsible for record keeping, animal sourcing and disposal. The Home Office has access to all areas of the establishment and regularly inspects premises to ensure that standards are maintained.

(d) Named persons
Several groups of individuals are required to function as a team during an animal project (Fig. 17.3). The following are persons named by the Home Office:
1. Named Animal Care and Welfare Officer (NACWO): responsible for the day-to-day husbandry, care and welfare of the animals. These persons have expert knowledge and suitable experience of animal technology.
2. Named Veterinary Surgeon (NVS): a member of the Royal College of Veterinary Surgeons, which maintains a register of those veterinary surgeons holding specialist and other higher qualifications. NVSs are able to advise on developments in the use of laboratory animals, including the selection of appropriate models, techniques and procedures.
3. Other suitably qualified person: the Secretary of State can permit this in exceptional circumstances where no suitable veterinary surgeon is available and the other suitably qualified person has proven considerable expertise.
17.7 Projects to Generate Genetically Modified Animals

The breeding and construction of transgenic animals require specific regulation, which comes under the control of the Genetically Modified Organisms (Contained Use) Regulations 2000 and their 2002 amendment. Facilities used for work on transgenic animals must be registered with the Health and Safety Executive (HSE), and the animals must be kept within specified levels of containment. Further information can be found at http://www.hse.gov.uk/biosafety/gmo/law.htm. Furthermore, the Home Office requires project licence approval for the breeding of transgenic animals with genetic defects that are likely to interfere with their capacity to live normally under conventional conditions. A general statement should be included in the project licence on the likely phenotypic consequences of the particular gene insert, e.g. a growth hormone gene, if it inserts as expected. Each new construct (or group of constructs) must also have a proper scientific justification for its use (including justification for unusual techniques or species), and the expected animal usage and adverse effects of the construct must be detailed. A register should be maintained for inspection by the
Home Office of all genotypes/phenotypes and their associated adverse effects, and details of expected adverse effects and their control should be given for any breeding protocol. Unusual protocols, e.g. ovarian transplant, need to be specifically justified to the Home Office, and cryopreserved embryos may be held but cannot be allowed to develop in culture beyond mid-gestation (or be implanted into a protected animal) without appropriate project licence authority. The following records need to be maintained:
1. Figures showing the efficiency of the generation programme
2. The number of animals used per construct, pregnancy rates and the percentage success in generating viable new modifications
3. A register of genotypes/phenotypes (see above)
4. The results of the health monitoring programme
5. Any morbidity or mortality experienced
6. The fate of the animals produced
7. Figures showing the subsequent breeding performance of each genetically modified line
8. Details of any cryopreserved material produced
Genetically modified animals not deliberately inoculated with micro-organisms must be "contained" under the Genetically Modified Organisms (Contained Use) Regulations 2000. The containment measures specified must take account of the perceived risk to humans and the environment should the animals escape. In some cases, standard containment practice may be adequate, for example, a transgenic rat maintained in standard housing in an animal unit or a transgenic sheep in a securely fenced enclosure in a field. If a significant risk is identified, it may be necessary to adopt additional containment measures to minimise the likelihood of escape. The appropriate level of animal containment, as assigned in the genetic modification risk assessment as either A or B (which in broad terms equate to standard containment and standard containment with additional measures, respectively), should be adopted.
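For day-to-day compliance, the records listed above map naturally onto a small structured register. The following is a minimal illustrative sketch in Python; the field names and example values are assumptions for illustration, not a prescribed Home Office format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConstructRecord:
    """One register entry for a genetically modified line (illustrative fields)."""
    construct: str
    genotype: str
    phenotype: str
    expected_adverse_effects: str
    animals_used: int = 0
    pregnancies: int = 0
    viable_modifications: int = 0
    morbidity_mortality: List[str] = field(default_factory=list)
    fate_of_animals: str = ""
    cryopreserved_material: str = ""

    def generation_efficiency(self) -> float:
        """Viable new modifications per animal used (0 if no animals used)."""
        return self.viable_modifications / self.animals_used if self.animals_used else 0.0

# Example entry with made-up numbers
rec = ConstructRecord(
    construct="growth hormone transgene",  # hypothetical construct
    genotype="Tg(GH)+/-", phenotype="increased growth",
    expected_adverse_effects="possible organomegaly",
    animals_used=120, pregnancies=18, viable_modifications=6,
)
print(f"Generation efficiency: {rec.generation_efficiency():.3f}")  # 0.050
```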
17.8 Personal Health Protection and Monitoring

A Control of Substances Hazardous to Health (COSHH) assessment will need to be carried out in all workplaces where animals are housed. Adherence to occupational health
guidelines is clearly an important part of animal research, and thus all issues relating to work in a laboratory or clinical space are pertinent (such as needle-stick injuries), and local guidelines should be strictly adhered to. A complete examination of this important topic is beyond the scope of this chapter, although two aspects are worth highlighting. Firstly, most researchers are anxious about the prospect of being bitten or scratched. Crush injuries should also be taken into account when working with larger animals; since these can be much more serious and in some cases fatal, it is particularly important that safe working practices are rigorously implemented. Measures to prevent injury by animals include training in the correct method of handling animals, the use of gloves or other protective clothing, and various restraining devices or cages. The training requirements for personal licence holders include procedures for handling and restraint and a section on personal health and safety. All bites should be thoroughly cleaned and irrigated, and reported immediately to the safety officer and the person responsible for the research. Secondly, one-third of people working with animals suffer from laboratory animal allergy (LAA). This is caused by respiratory sensitisation that occurs with repeated inhalation of airborne animal proteins. There is no established link with a previous history of atopy (although smokers are more likely to be affected). LAA may have a latent period, and presents with a classic hypersensitivity reaction ranging from rhinitis/conjunctivitis and occupational asthma to life-threatening anaphylaxis. Occupational asthma will improve in 40% of cases after removal of the exposure. Treatment of an acute episode necessitates urgent medical help (anaphylaxis is a medical emergency). However, the most effective treatment is prevention, and therefore all people working in the animal laboratory are subject to health surveillance. Avoidance of exposure can also be maintained through the wearing of suitable respiratory protective equipment.
17.9 Animal Husbandry

Animal husbandry was defined in 1965 as "the routine application of sensible methods of animal care based on experience – a natural gift refined by study and improved by experience". Broadly, it can be thought of by considering the five freedoms:
1. Freedom from thirst, hunger and malnutrition
2. Freedom from discomfort
3. Freedom from pain, injury and disease
4. Freedom to express normal behaviour
5. Freedom from fear and distress

A poorly controlled animal environment causes stress for the animal in question and introduces variability into the study outcomes; animal husbandry therefore protects against both of these deleterious outcomes. All animal facilities are thus designed to maintain the most suitable environment possible at both macro- and micro-environmental levels (Table 17.1). Macroenvironmental factors such as temperature, humidity, light, noise and ventilation must be continuously monitored, and their required levels vary between species. For example, a fully stocked rodent room will require 15–20 complete changes of fresh or air-conditioned air every hour, distributed evenly to each cage. Animals are to be obtained only from designated breeding or supplying establishments. Social relationships are important, and animals take time to acclimatise to their environment whilst formulating a hierarchical structure within the group. Therefore, to prevent the division of groups during an experiment, it is advisable to establish stocking densities within a cage at the commencement of a procedure or on arrival within the establishment. The manner in which this is achieved will depend on the infectious status of the animal (see below) and the type of experiment. The standard acclimatisation period for rodents is 1 week; for larger species (e.g. a cat or dog), it may be considerably longer (3–4 weeks). It is important to maintain records/notes on all animals. The Animals (Scientific Procedures) Act 1986 stipulates the minimum requirements for the labelling of all cages. The American College of Laboratory Animal Medicine (ACLAM) has now also provided guidance on the definition and content of medical records, and clearly identifies the Attending Veterinarian as the individual charged with authority and responsibility for oversight of the institution's medical records programme [21].
17.9.1 Definitions of Lab Animals

Broadly speaking, there are four main types of animal models in use today: naturally occurring, teratogen-induced, surgically created and transgenic animals.
Table 17.1 Macro- and microenvironmental conditions required for adequate animal husbandry

Macroenvironment:
- Temperature (°C): mouse 19–23; rat 19–23; guinea pig 19–23; rabbit 16–20
- Relative humidity: 55 ± 10%
- Ventilation: the ventilation rate is related to the stocking density and the heat generated; in a fully stocked rodent/rabbit room, there are 15–20 changes of air per hour, distributed evenly to each cage
- Lighting: intensity 350–400 lux at bench level and 60 lux at cage level; a light:dark cycle with dawn and dusk periods; wavelength matters little, as few animals have colour vision
- Noise: unfamiliar/loud sounds can be harmful; background noise <50 dB(A), frequency 43–47 kHz

Microenvironment:
- Animal accommodation: mice >30 g, 100 cm² (group housing); mice <30 g, 200 cm² (single housing); rats 350–450 g, 300 cm² (group housing) or 700 cm² (single housing); rats 250–350 g, 250 cm² (group housing) or 700 cm² (single housing); rats 150–250 g, 200 cm² (group housing) or 500 cm² (single housing)
- Bedding and nesting material: comfortable, dry, absorbent, dust free, non-toxic and free from infection
- Environmental enrichment (altering a captive animal's environment to enable it to express its natural behaviour): e.g. varied diet, other animals, sensory stimuli, cage/pen furniture
- Diet: of a formulation to meet the nutritional requirements of the animal, usually dry pellet form for rodents; care should be taken to ensure that subordinate animals have access to food and water
- Water: clean drinking water must be provided at all times
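As a rough worked example of the ventilation figure in Table 17.1 (the 15–20 air changes per hour come from the table; the room volume is a made-up value for illustration):

```python
def required_airflow_m3_per_h(room_volume_m3: float, air_changes_per_h: float) -> float:
    """Airflow needed to achieve a given number of complete air changes per hour."""
    return room_volume_m3 * air_changes_per_h

# Hypothetical 60 m^3 rodent room at 15-20 air changes per hour
low = required_airflow_m3_per_h(60.0, 15)    # 900 m^3/h
high = required_airflow_m3_per_h(60.0, 20)   # 1200 m^3/h
print(f"Required airflow: {low:.0f}-{high:.0f} m^3/h")
```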
Animals are also defined according to their sterile status:
1. Germ free (axenic): totally free of all aerobic and anaerobic organisms, with the exception of endogenous viruses.
2. Gnotobiotic: harbour only a specific number of organisms that are intentionally administered for a specific purpose.
3. Specified pathogen free (SPF): animals are monitored to detect the presence of specified bacteria.
4. Conventional: clinically healthy animals, but not screened (microbiological status is unknown).
17.9.2 Disease Recognition

It is essential that all animals are free from conditions potentially transferable to man, or which could adversely influence their clinical outcome and the experimental
work. As a result, health screening and monitoring of animals is an essential component of animal husbandry. Breeders and suppliers have a responsibility to provide clients with details of their health monitoring programme and their most recent screening report on request. However, units that are multipurpose, or that have frequent entry of animals or personnel, are at a higher risk of animal health problems. Over 150 zoonotic diseases (transmissible from animal to man) have been identified, ranging from common gastrointestinal infections such as Campylobacter and Salmonella to rarer conditions such as leptospirosis (carried in rat urine). The chance of human infection from a "normal" healthy animal is small, but these animals still carry a diverse flora and should be considered potentially opportunistic sources of infection. Despite the presence of a COSHH assessment, definitive control is practiced by disease prevention through the use of barriers to
fomites, vectors, biological materials, ventilation and staff. These can take the following forms:
1. Importation control of animals/biological materials: examination of the health profile, routine quarantining and regular screening.
2. Environmental control: temperature/humidity, air pressure differentials, ventilation, air filters, control of noise, light, etc.
3. Containment: filter-top cages, filter cabinets, isolation cubicles, individually ventilated cages (IVCs), laminar flow cabinets.
4. Barrier systems: rodent and insect barriers on doors, restricted entry, use of footbaths, change of footwear/outer clothing, face masks, gloves, etc.
5. Sterilisation of equipment, food, bedding and water.
6. Movement of staff: a one-way flow system.
7. Standard operating procedures: enforcement of staff routines.
8. Design of facilities.
9. Other: strain of animal, nutritional status, immunosuppression and disease.
The recognition of specific diseases in animals requires specialist knowledge; if in doubt, all findings should be reported to the NACWO or the named veterinary surgeon. Infections may be acute, chronic or subclinical, and their management may follow four possible routes. If the pathogen does not adversely affect the animal, it can be tolerated (e.g. Pasteurella in rabbits). In treatable conditions that adversely affect the host, the sick animals are culled and the remainder are treated (e.g. Clostridium piliforme). Most frequently, a period of depopulation, disinfection and repopulation is necessary in order to minimise the spread to animals in the rest of the unit (e.g. Streptobacillus moniliformis). In severe circumstances, a burn-out policy is followed: breeding is ceased for 12 weeks, with no introduction of new animals. Pups delivered by caesarean section to infected animals can sometimes be cross-fostered by clean mothers that have recently given birth. If in doubt about the health of any animal, the NVS should be informed immediately.
17.10 Humane Killing of Animals

The termination of an experimental animal represents the most challenging aspect of animal work for many investigators. However, it is an unavoidable component of
biomedical research, because in most cases it is used to prevent prolonged suffering for a study animal. Although the skills need to be learned by all persons performing animal research, no person should be expected to kill an animal unless they are willing and feel confident to do so. Killing an animal for a scientific purpose at a designated establishment does not require a licence, provided that a method listed in Schedule 1 of the Animals (Scientific Procedures) Act 1986, appropriate to the animal, is used. Under Section 15, when regulated procedures have been completed and the animal is suffering or likely to suffer adverse effects, the person who applied the regulated procedure must cause it to be killed by an appropriate method under Schedule 1, or by another method authorised in the personal licence. Animals to be killed should be removed from the presence of others, and conditions in which the animal may become stressed or frightened should be avoided. Where death is not caused instantaneously, the aim is to induce unconsciousness as quickly as possible. Physical methods of killing animals are quick and humane if carried out competently in the context of routine handling; it is therefore essential that a physical method is only employed by those who are trained and fully confident. Whatever method is used, death must be confirmed before the animal is disposed of. This is achieved by ensuring that one of the following has occurred:
1. Confirmation of permanent cessation of the circulation
2. Destruction of the brain, implying permanent loss of brain function
3. Dislocation of the neck
4. Exsanguination
5. Onset of rigor mortis
6. Mechanical disruption
The following methods are used for humane killing under Schedule 1, and refer to animals other than foetal, larval and embryonic forms:
1. Overdose of an anaesthetic. This is achieved by using a route and an anaesthetic agent appropriate to the size and species of the animal. This technique is appropriate for all animals, and the aim is to induce anaesthesia as quickly as possible. In larger species, an intravenous injection is preferred, although the intracardiac route should be avoided as it is painful and causes distress. Inhalation is acceptable for smaller laboratory animals, but should be avoided in those with a diving reflex. The usual
method for fish, amphibians or Octopus vulgaris is immersion in water containing an appropriate anaesthetic agent until all reflexes are lost and opercular or other respiratory movement has stopped.
2. Exposure to carbon dioxide gas in rising concentration. This is appropriate for rodents, rabbits and birds up to 1.5 kg. A controllable metered supply should be used through a simple chamber with a lid, and not the dry ice form. Smaller animals will become unconscious when the concentration reaches 30%, and they will die at 70%.
3. Dislocation of the neck. This is appropriate for rodents up to 500 g, rabbits up to 1 kg and birds up to 3 kg. The aim is to fracture the neck at the level of C1–3. Training is essential.
4. Concussion of the brain by striking the cranium. This is appropriate for rabbits up to 1 kg, birds up to 250 g, amphibians and reptiles up to 1 kg, and fishes. Only those who have been trained on recently killed animals should be allowed to perform this method. Striking the cranium means either hitting it against a solid object, such as the edge of a bench, or hitting it with a blunt instrument.
5. Ungulates (hoofed animals). Separate methods may be employed by registered veterinary surgeons. These include destruction of the brain by a free bullet or captive bolt, or percussion or electrical stunning followed by destruction of the brain or exsanguination before the return of consciousness.
6. The killing of foetal, larval and embryonic forms. This requires its own techniques, and the stage of development must be taken into consideration:
i. Overdose of an anaesthetic, using a route and agent appropriate to the size, stage of development and species of animal.
ii. Refrigeration, disruption of membranes, maceration in apparatus approved under appropriate slaughter legislation, or exposure to carbon dioxide in near 100% concentration: appropriate for birds and reptiles.
iii. Cooling of foetuses followed by immersion in cold tissue fixative: appropriate for mice, rats and rabbits.
iv. Decapitation: appropriate for mammals and birds up to 50 g.
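For the metered carbon dioxide method, the rise in chamber concentration can be approximated with a well-mixed single-compartment model, C(t) = 1 − e^(−Qt/V). The sketch below makes that idealised assumption, with made-up chamber and flow values; it is illustrative only and not a substitute for trained, supervised practice:

```python
import math

def minutes_to_reach(target_fraction: float, flow_l_per_min: float,
                     chamber_volume_l: float) -> float:
    """Time (min) for CO2 to reach target_fraction (0-1) in a well-mixed
    chamber supplied with 100% CO2 at a constant metered flow,
    using C(t) = 1 - exp(-Q*t/V)."""
    return -(chamber_volume_l / flow_l_per_min) * math.log(1.0 - target_fraction)

# Hypothetical 20 L chamber with a 4 L/min metered CO2 supply
for label, frac in [("unconsciousness (~30%)", 0.30), ("death (~70%)", 0.70)]:
    print(f"Time to {label}: {minutes_to_reach(frac, 4.0, 20.0):.1f} min")
```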
17.11 Analgesia

It is a requirement of all people involved in animal research to be able to recognise pain, suffering and lasting harm, because if these cannot be recognised, they cannot be minimised. Novel techniques are attempting to quantify the biological response to stress and pain [22], but at present these are not a viable alternative. All project licences must, therefore, contain defined severity limits for pain, which must not be exceeded; they will also describe measures to alleviate and avoid pain and distress. Pain is a subjective sensation that can be difficult enough to judge in humans, so detecting its presence in animals relies on the observational skills of the scientist and a basic understanding of an animal's response to pain. An animal that is well will move normally, exhibit curiosity and keep itself well groomed, as well as eating and drinking normally. It is, therefore, advantageous to observe the behaviour of the animal before the commencement of the experiment. As in humans, changes in behaviour should alert the PL holder to the presence of pain. Certain behavioural responses and signs of clinical pain have been described (Table 17.2), and some organs, such as the eyes, ears, teeth, nerves and genitals, are more sensitive to pain than others. Chronic pain is more likely to produce subtle behavioural cues, such as weight loss, reduced activity and irritability; such signs are not entirely specific, as illness or indisposition may also elicit a similar response (see Sect. 17.9.2). Because pain can be difficult to recognise, and researchers are reluctant to risk the side effects or complications of analgesics, analgesia is too often withheld. A liberal attitude should therefore be maintained, as the adverse effects of overdose are less significant than the distress of pain itself. The physiological response to pain should also be monitored, e.g. tachycardia, tachypnoea or laboured breathing, panting with nasal discharge, or constipation and diarrhoea. (For further detail, see Flecknell P (2009) Laboratory Animal Anaesthesia, 3rd edn. Elsevier.)
17.12 Anaesthesia

Most surgeons do not have to consider the intra-operative anaesthetic requirements of their human patients during routine clinical practice, as this is taken care of by the anaesthetist.
Table 17.2 Signs and symptoms of pain
| Sign | Rat | Rabbit | Guinea pig | Dog | Cat | Monkey |
| Posture | Persistent dormouse posture, guarding | Looks anxious, faces back of cage (hiding posture) | – | Anxious glances, seeks cold surfaces | Tucked-in limbs, hunched head and neck | Head forward, arms across body |
| Vocalising | Squeals on handling | Piercing squeal | Urgent repetitive squealing | Howls, distinctive bark | Distinctive cry, hissing and spitting | Screams |
| Temperament | May become docile or aggressive | Kicks and scratches | Rarely vicious; usually quiet, terrified or agitated | Aggression, or cringing and extreme submission | Ears flattened, fear of being handled, may cringe | Facial grimace |
| Locomotion | Reluctance to move, difficulty in rising | Drags back legs | Penile protrusion, frequent urination | – | – | – |
| Other | Abdominal writhing in mice; eats bedding, eats neonates, self-mutilation | No spillage of food and water; eats neonates | – | No spillage of food | – | – |
Adapted from Central Biomedical Services New Licensee Training Course, Imperial College London, 2005
Juggling anaesthetic demands with the responsibility for the experimental procedure can be challenging, and it requires proper planning and a robust understanding of anaesthetic technique. General anaesthesia refers to the loss of consciousness and loss of sensation throughout the body. Surgical anaesthesia is defined by a triad of general anaesthesia (unconsciousness), muscle relaxation and analgesia. Schedule 2A of the Animals (Scientific Procedures) Act 1986 states that all experiments must be performed under local or general anaesthesia, unless the administration of the anaesthetic would be more traumatic to the animal than the experiment itself. A recovery period should be considered after procedures such as blood sampling (especially if the volume taken is significant) before an anaesthetic is administered; the standard period is 1 week. Only healthy animals should be selected for anaesthesia, and this should be confirmed by an animal health check. As in humans, a full external examination should be performed pre-operatively, and the animal's identity and details (age, sex, species) confirmed. Consider (1) the cage and history: food and water intake, environmental conditions, health status; (2) appearance: coat, mucous membranes, discharge, wounds or tumours; and (3) activity: breathing, locomotion and curiosity.
Pre-operative starvation is recommended in some species. Pigs, cats, primates and ferrets should have food removed 12 h before surgery and water removed 2 h before theatre. Rodents and rabbits cannot vomit, and so are not fasted pre-operatively; furthermore, mice may become hypoglycaemic after only a few hours [23], and a rat's stomach is empty after only 6 h of food deprivation [24]. Pre-medication refers to the administration of any drug prior to anaesthesia. A large number of anticholinergics, tranquilisers and narcotics are now in routine use (Table 17.3), because they reduce apprehension and fear, reduce the dose of anaesthetic required (through their synergistic effects), promote a smoother induction and offer pre-emptive post-operative analgesia.
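The species-specific fasting guidance above can be kept as a simple lookup for protocol planning. A minimal sketch; the dictionary structure and species keys are illustrative assumptions:

```python
# Pre-operative fasting times from the text: (hours food removed, hours water
# removed) before theatre; None marks species that are not fasted because
# they cannot vomit.
FASTING_HOURS = {
    "pig": (12, 2), "cat": (12, 2), "primate": (12, 2), "ferret": (12, 2),
    "mouse": None, "rat": None, "rabbit": None,
}

def fasting_plan(species: str) -> str:
    key = species.lower()
    if key not in FASTING_HOURS:
        return f"{species}: no guidance recorded here"
    plan = FASTING_HOURS[key]
    if plan is None:
        return f"{species}: do not fast pre-operatively"
    food_h, water_h = plan
    return f"{species}: remove food {food_h} h and water {water_h} h before theatre"

print(fasting_plan("rat"))   # rat: do not fast pre-operatively
print(fasting_plan("pig"))   # pig: remove food 12 h and water 2 h before theatre
```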
17.12.1 Selection of Methods of Anaesthesia

The exact choice of anaesthetic agent must be informed by a wider appreciation of the animal's requirements. It must therefore:
(i) Provide an appropriate length and depth of anaesthesia and analgesia.
Table 17.3 Types of available pre-medication
| Pre-medication | Beneficial effects | Examples | Side effects/complications |
| Anticholinergics | Reduce oral secretions, prevent vasovagal reflex | Atropine, glycopyrrolate | 80% of rabbits are resistant, as they have atropinases |
| Tranquilisers | Reduce fear and apprehension, reduce anaesthetic requirements, reduce involuntary reflex responses | Acetylpromazine maleate (ACP) | Do not produce analgesia; ACP has hypotensive, antiemetic and hypothermic properties |
| Sedatives | CNS depression | Benzodiazepines (diazepam, midazolam) | Wide species variation; good skeletal muscle relaxation, metabolised by the liver, mild CVS depression |
(ii) Be strain, age and species specific: there are important variations in response to anaesthetics between strains, such as to pentobarbitone in mice.
(iii) Suit the type and duration of the procedure, and the type of pain that it could induce.
(iv) Suit the anatomical area of surgery: a face mask may impede adequate access.
(v) Match the available expertise/equipment.
(vi) Avoid pharmacological interference with the experiment.
Anaesthetic codes are used to describe the anaesthetic status of the animal on a project/personal licence, as follows:
AA – No anaesthetic.
AB – Anaesthesia (local or general) with recovery. The procedure is performed with the intent of recovering the animal to consciousness from the anaesthesia.
AC – Non-recovery anaesthesia. Euthanasia is performed before the animal recovers.
AD – Anaesthesia involving the use of neuromuscular blocking agents.
Consider both inhalational and injectable routes of anaesthesia. Inhalational agents offer more control and a more rapid recovery; however, they require specific equipment and are relatively expensive. Injectable agents are simple to administer and inexpensive, and some of the newer injectable agents possess antidotes to speed recovery; however, they are metabolised by the liver, and the depth and duration of anaesthesia are not readily adjustable. It is best to discuss all anaesthetic requirements with the NVS. Two injectable anaesthetic regimens are widely used in rodents; these tend to be fentanyl-based (a potent, short-acting opioid agonist analgesic) or ketamine-based (see Table 17.4).
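For record keeping, the four anaesthetic codes can be held in a simple lookup; the mapping below paraphrases the definitions above and is illustrative only:

```python
# Illustrative lookup of the licence anaesthetic codes defined above
ANAESTHETIC_CODES = {
    "AA": "No anaesthetic",
    "AB": "Anaesthesia (local or general) with recovery to consciousness",
    "AC": "Non-recovery anaesthesia; euthanasia before recovery",
    "AD": "Anaesthesia involving neuromuscular blocking agents",
}

def describe_code(code: str) -> str:
    """Return the plain-language meaning of an anaesthetic code."""
    return ANAESTHETIC_CODES.get(code.upper(), "unknown code")

print(describe_code("ac"))  # Non-recovery anaesthesia; euthanasia before recovery
```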
17.12.2 Inhalational Anaesthetics

To provide adequate inhalational anaesthesia, the following are required:
(a) A controlled, compressed carrier gas, e.g. oxygen
(b) A volatile anaesthetic liquid in a vaporiser
(c) A breathing circuit
(d) A waste gas scavenger
The set-up of these systems will vary according to the physiology and specific requirements of each species. However, the flow rate of oxygen (L/min) must always be sufficient to allow the vaporiser to function, and there should always be a spare oxygen cylinder; a low-oxygen-pressure alarm is preferable. Vaporisers are calibrated according to the anaesthetic agent, and should be serviced regularly. A charcoal scavenging canister should be weighed before use to ensure that it is not saturated (<1,400 g). A breathing circuit provides a method for delivering the anaesthetic, removing exhaled gases and controlling ventilation. There are two principal formats: (1) rebreathing and (2) non-rebreathing. Rebreathing circuits are often used in larger animals, and exhaled gases are absorbed by soda lime. These circuits may be closed (gas flow = minute volume) or semi-closed (gas flow > minute volume). Although they are economical and the gases may be humidified, the resistance within the circuit is high, making it difficult to ventilate smaller animals. Mechanical ventilation and neuromuscular blockade will not be discussed further here. Non-rebreathing circuits are more commonly used in small animals, from mice upwards, and are much easier to manage. They do not absorb CO2, and have a small dead space and a low resistance. Numerous eponymous variations exist, but two common set-ups, described below, are applied for non-rebreathing circuits.
Table 17.4 Non-inhalational anaesthetic agents
| Anaesthetic agent | Indications/uses | Side effects/complications | Anaesthetic properties |
| Fentanyl/fluanisone (Hypnorm™) | Neuroleptanalgesic (can be used as a sedative) | Moderate respiratory depression, poor muscle relaxation, CVS depression, prolonged recovery | Good analgesic effect, reversible |
| Fentanyl/fluanisone with midazolam or diazepam | Used in rabbits, rodents and guinea pigs | – | Better muscle relaxation |
| Ketamine | Causes "dissociative anaesthesia", characterised by muscle rigidity and no CNS depression | Can produce excessive salivation; contraindicated in renal/hepatic disease; raises ICP | Some analgesic properties; wide safety margin; maintains BP |
| Ketamine + alpha-2 adrenergic agonist | For rodents, rabbits, cats, dogs and larger animals; s.c., i.m. (painful due to low pH) or i.v. | Significantly hypotensive; toxic in rats when used with buprenorphine | – |
| Ketamine + benzodiazepine | For minor procedures/immobilisation; IP route in rodents; also hamsters, gerbils, guinea pigs, rabbits, cats, ferrets, pigs, sheep and primates | – | Less CVS depression than with alpha-2 agonists |
| Alphaxalone/alphadolone (Saffan, Althesin) | Progestational steroids; for use in cats and primates; contraindicated in dogs (causes massive histamine release and death) | Hypotension; associated with laryngospasm, pulmonary oedema, and paw and ear swelling in cats | – |
| Propofol | 1% solution in soya bean oil; primarily used as an induction agent, for short procedures or by continuous infusion | No analgesic properties; severe respiratory depression; apnoea | Rapidly induces unconsciousness; rapid clearance; can be given i.v. |
| Pentobarbitone | Barbiturate with medium length of action; main use as a terminal anaesthetic | Metabolised in the liver; severe respiratory depression; poor muscle relaxation; no analgesia; easy to overdose | – |
| Thiopentone | Short-acting barbiturate (induction over 30 s, lasts 10–15 min) | Repeated exposure leads to tolerance; poor analgesic effect | Fast metabolism, so can be given as an infusion; minimal cardiac depression |
The Ayre's T-piece is the circuit of choice for animals weighing between 3 and 10 kg. An open tube acts as a reservoir, and the circuit is valveless; during the expiratory phase, the exhaled gases are forced out of the open end of the reservoir. Bain's co-axial circuit, a modification of the T-piece in which the fresh gas inflow pipe runs inside the reservoir limb, is the circuit most commonly used with a face mask in rodents. Open face masks rely on the principle that exhaled gases will pass around the edge of the face mask and that, if the flow rate is high enough, rebreathing of exhaled gases will be small. The gas flow rate must therefore be three times the animal's minute volume. This mask is less commonly used, as it is not possible to assist ventilation artificially if required.
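A worked sketch of the "three times minute volume" rule for open face masks. The tidal volume (~7 mL/kg) and respiratory rate used here are rough illustrative assumptions, not prescribed values; actual figures vary widely by species and anaesthetic depth:

```python
def fresh_gas_flow_ml_min(weight_kg: float,
                          tidal_volume_ml_per_kg: float = 7.0,
                          resp_rate_per_min: float = 80.0,
                          safety_factor: float = 3.0) -> float:
    """Fresh gas flow for an open face mask: safety_factor x minute volume,
    where minute volume = tidal volume x respiratory rate.
    Default tidal volume and rate are rough rodent assumptions."""
    minute_volume = weight_kg * tidal_volume_ml_per_kg * resp_rate_per_min
    return safety_factor * minute_volume

# Hypothetical 250 g rat: 0.25 x 7 x 80 = 140 mL/min minute volume
print(f"Fresh gas flow: {fresh_gas_flow_ml_min(0.25):.0f} mL/min")  # ~420 mL/min
```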
17.12.3 Inhalational Anaesthetic Agents

These are volatile compounds that are vaporised and then inspired, exerting specific CNS effects. The most commonly used agents are isoflurane and halothane, and both can be rapidly fatal if overdosed. Halothane is potent, with a rapid induction and recovery time (1–3 min). It is non-flammable and non-irritant, but it is highly metabolised and hepatotoxic. It is also a respiratory and cardiovascular depressant and may lead to moderate hypotension; at high concentrations it also sensitises the heart to catecholamines. Consequently, it is no longer used as the first-line choice. Isoflurane has a more rapid induction and recovery, causes less cardiovascular depression and provides good muscle relaxation. It is not significantly metabolised, so it is much better suited to high-frequency or long-term anaesthesia, or to animals with poor liver function. However, it is more expensive than halothane and possesses a strong smell, which can be aversive; it also acts as more of a respiratory depressant than halothane. Enflurane, desflurane (I653) and sevoflurane are commonly used in human anaesthesia, but do not offer much advantage over isoflurane and are considerably more expensive. Methoxyflurane is the least volatile and the most metabolised of the inhalants; it is now rarely used in laboratory animals, except, for example, in neonatal rodents. Occasionally, nitrous oxide is used in conjunction with isoflurane, because it reduces the required dose of other agents, is inexpensive, speeds induction and adds additional muscle relaxation. However, it is not potent enough to be used on its own, and it lowers the percentage of oxygen that is inspired. It may also lead to diffusion hypoxia, in which, after prolonged anaesthesia, N2O diffusing back into the lungs displaces O2, risking hypoxia and suffocation. Therefore, 100% O2 should be given for 5–10 min after N2O is discontinued.
17.12.4 Induction and Maintenance of the Airway

General anaesthesia can be induced using injectable agents (single or in combination), injectable agents and gas, or gas alone. It is important to ensure that the anaesthetic workstation is complete, that the circuit is functioning and that the scavenger system is on. Small animals can then be placed in a clean induction chamber if using
inhalational anaesthetics. Animals will lose their righting reflexes when the anaesthesia has taken effect; they can then be removed and placed on the face mask. The level of anaesthesia should be checked by noting the hind limb muscle tone and the pedal withdrawal pain reflex, or by use of the corneal reflex in larger animals. One must ensure that the animal has a patent airway and that the depth of respiration is neither too heavy nor too light, and is appropriate for the species. End-tidal capnography should also be used to assess the efficacy of breathing (see Flecknell P (2009) Laboratory Animal Anaesthesia, 3rd edn. Elsevier). Endotracheal intubation is performed in animals that are already anaesthetised. Larger animals are relatively safe and easy to intubate, whereas there is a higher chance of causing laryngeal spasm and oedema in rabbits and rodents [25–27]; this technique must be learned, and it should not be performed unless the researcher is competent to do so. Most small animals will sit comfortably with their face in the mask for the duration of the procedure. They should be positioned in a normal configuration for the species where possible, with adequate support. Forelimbs should not be excessively abducted, to prevent neuronal injury, and good circulation to the periphery should be maintained. As in human surgery, animals should be placed on a heat mat and normothermia should be maintained, as hypothermia is a common cause of anaesthetic death in small rodents. The most important side effects of anaesthesia are cardiovascular and respiratory depression, and as a result, continuous monitoring of circulation and ventilation is essential. Non-invasive monitoring equipment for measuring the pulse, respiratory rate, oxygen saturation and blood pressure should be used wherever possible. Never leave the anaesthetised animal unattended. If the respiratory rate is too high, the animal may be "light": consider deepening the anaesthesia. Similarly, if the animal is apnoeic, lighten or reverse the anaesthesia. If breathing stops, administer high-flow oxygen and commence hand ventilation if the animal is intubated, or chest compressions if not. Diagnosing and correcting anaesthetic emergencies requires expertise, and in most circumstances the animal is best killed, as the experimental model will have been compromised. In this event, it is important to ascertain the cause of death to prevent a recurrence. On recovery, turn the vaporiser off and maintain the animal on 100% oxygen for about a minute. When the animal voluntarily begins to move and has regained its
righting reflex, it can be placed in a warmed recovery chamber (28–30°C). This cage should not contain bedding that is easily inhaled or that may interfere with the surgical site. It may take some time for the animal to recover, so place it on its right side or in sternal recumbency during recovery. Monitoring should continue until the animal has completely recovered; recovery care is an extension of anaesthetic care. Provide the animal with moist food if it is unable to take adequate fluids, and do not be afraid to administer further analgesia if the animal demonstrates signs of pain.
17.13 Surgical Technique

It is assumed that most clinicians entering medical research will have some basic grounding in the principles underlying good surgical practice. It is not possible in this chapter to describe the complete complement of required basic surgical skills, as this would require a completely separate textbook. However, it should not be forgotten that there is a tremendous amount of similarity between human and animal surgery, and all animals should be subject to the high clinical standards afforded to humans; too often this is forgotten. Medical researchers should therefore be competent to perform the experimental procedure laid out in the PPL and confident in their ability to undertake surgical procedures. Developing these skills takes time and training, which may mean practising on cadavers or spare tissue. This requires a degree of patience, but it is not ethically acceptable to practise on live animals, and such preparation will ultimately ensure the success of the experimental model. If possible, find out whether anyone is performing a similar technique in your institution or area and ask to observe them. When planning a surgical procedure, take into account the personnel that will be required (e.g. technical assistance and experience), the species, the operating facilities available, and the instruments and materials you will require. The recommended areas for performing a surgical technique include a pre-surgery area (for anaesthetic induction), a surgical area and a post-operative room for recovery and monitoring. Not all facilities will have these available, but spaces should be created to permit the different procedures to be performed during surgery. Finally, carefully consider the timing of the surgery to ensure that you will be available to carry out the
post-operative care. Specific animal models will, of course, be required for studying individual pathological conditions. Some organisations have begun to offer guidelines for standards of surgery in particular specialities. AOVET (the veterinary specialty group of the AO Foundation), in concert with the AO Research Institute (ARI) and the European Academy for the Study of Scientific and Technological Advance, convened a group of musculoskeletal researchers, veterinarians, legal experts and ethicists to determine guidelines for the use of animals in musculoskeletal research [17]. Specific technical procedures will also require individual tools and a specialised approach, such as endoscopic surgery [28, 29]. If you are performing intricate surgery on small animals, it is worth attending a microsurgical course and ensuring that you are confident operating either with loupes or under a microscope. If a relatively simple procedure needs to be carried out on a large number of animals, it may be feasible to perform batch surgery. This requires careful planning, an adequate amount of space and a minimum of three sets of sterilised instruments; it may also be necessary to use hot-bead sterilisation for all equipment. Animal surgery also requires the same level of asepsis as human surgery. It cannot be stressed enough that, as in humans, all animals should be scrubbed and prepped before surgery, and surgeons should wear sterile gloves and gowns, as it is unacceptable to lose an animal to preventable sepsis and wound infection. Maintaining these high levels of asepsis can be challenging in batch surgery.
17.14 Post-Operative Care

Animals should not be left immediately after surgery, as they will require regular and close observation until they have adequately recovered. The recovery process aims to re-establish homeostasis as soon as is reasonably safe by providing heat, fluids, oxygen and analgesia. Ideally, animals are placed in an incubator or hot box; failing this, it is acceptable to place a normal cage filled with Vetbed on top of a pre-heated mat, or to use infrared heating lamps. The recovery area should be quiet, and the lighting should be subdued when the animal is not being examined. Post-operative fluid requirements vary with species (Table 17.5). Oral intake of fluids may be reduced for 12–24 h, and there will be a requirement to replace insensible and surgical fluid losses and to maintain fluid balance during recovery.
Table 17.5 Fluid requirements of animals commonly used in animal experiments
| Species | Daily fluid intake (mL/kg/24 h) | Average intake for standard animal |
| Mouse | 150 | 20 g mouse = 3 mL/24 h |
| Rat | 100 | 250 g rat = 25 mL/24 h |
| Guinea pig | 100 | 750 g guinea pig = 75 mL/24 h |
| Rabbit | 100 | 4 kg rabbit = 400 mL/24 h |
| Cat | 40–60 | 4 kg cat = 160–240 mL/24 h |
| Dog | 40–60 | 10 kg dog = 400–600 mL/24 h |
Table 17.6 Degrees of severity of dehydration and guidelines for assessment [30]
| Body weight loss (%) | Sunken eyes, shrunken face, dry membranes | Skinfold test persists for (s) | PCV (%) | Fluid required to replace volume deficit (mL/kg BW) |
| 4–6 | Barely detectable | <2 | 40–45 | 20–25 |
| 6–8 | ++ | 2–4 | 50 | 30–50 |
| 8–10 | +++ | 6–10 | 55 | 50–80 |
| 10–12 | ++++ | 20–45 | 60 | 80–120 |
PCV = packed cell volume
By examining the animal, make a clinical estimate of the state of hydration by assessing the mucous membranes and the capillary refill time (Table 17.6) (see Slatter D (2003) Textbook of Small Animal Surgery, 3rd edn. Saunders). As a rule of thumb: volume of fluid to be replaced = fluid deficits + blood loss + maintenance fluids; or, more explicitly, volume of fluid to be given = [surgical time (h) × basic fluid rate × body weight (kg)] + [blood loss] + [maintenance for the time with no oral intake (h)]. If the animal is conscious, the most appropriate route is oral replacement from the bottle or by syringe into the mouth. If this is unfeasible, the subcutaneous route into the scruff is an acceptable alternative. However, if the animal is in shock or has suffered >5% fluid loss, intravenous fluid therapy is generally required at a rate of 10 mL/kg/h. If siting a cannula is not possible, intraperitoneal therapy should also be considered. Remember not to exceed the severity limits of your project licence; if the animal does not respond or continues to suffer, it should be killed under Schedule 1. The choice of fluids will depend on the route and the aetiology of the dehydration. For oral administration, the fluid should be palatable, which is challenging with some electrolyte solutions; a good example is "Lectade". For subcutaneous or intravenous routes, Hartmann's solution is preferable, but, as in humans, crystalloids such as 0.9% NaCl are also acceptable. Pain control also forms a crucial part of the post-operative protocol, and analgesia should never be withheld if an animal demonstrates signs of distress.
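Combining Table 17.5 (maintenance rates), Table 17.6 (deficit volumes) and the rule of thumb above gives a simple estimate. A minimal sketch, which treats the "no oral intake" term as maintenance fluid pro-rated over that period (an interpretation, since the rule's units are not fully specified in the text):

```python
def replacement_volume_ml(weight_kg: float,
                          deficit_ml_per_kg: float,
                          blood_loss_ml: float,
                          hours_no_oral_intake: float,
                          daily_rate_ml_per_kg: float) -> float:
    """Fluid to give (mL) = volume deficit + blood loss + maintenance
    for the period without oral intake.
    deficit_ml_per_kg comes from Table 17.6 (e.g. 20-25 for 4-6% loss);
    daily_rate_ml_per_kg from Table 17.5 (e.g. 100 for a rat)."""
    deficit = weight_kg * deficit_ml_per_kg
    maintenance = weight_kg * daily_rate_ml_per_kg * hours_no_oral_intake / 24.0
    return deficit + blood_loss_ml + maintenance

# Hypothetical 250 g rat: mild (4-6%) dehydration, 1 mL blood loss, 12 h nil by mouth
vol = replacement_volume_ml(0.25, deficit_ml_per_kg=22.5, blood_loss_ml=1.0,
                            hours_no_oral_intake=12, daily_rate_ml_per_kg=100)
print(f"Replacement volume: {vol:.1f} mL")  # ~19.1 mL
```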
17.15 Conclusion
Despite the controversy, animal experimentation remains a vital part of medical research, and it continues to provide novel therapeutic strategies for numerous pressing medical problems. In seeking to provide the most efficacious treatments for our patients, we are obliged to carry out all animal research in the most scientifically valid and humane manner possible. A successful animal experiment requires meticulous planning and careful consideration to ensure that these goals are attained.
17.16 Useful Web Sites

The websites listed in Table 17.7 are useful resources for all those planning an animal experiment.
Table 17.7 Useful web resources
| http://www.felasa.org | Federation of European Laboratory Animal Science Associations (FELASA) |
| http://www.apc.gov.uk | The Animal Procedures Committee |
| http://www.scienceandresearch.homeoffice.gov.uk/animal-research | The Home Office |
| http://www.animalaid.org.uk | Animal Aid |
| http://www.amrc.org.uk | Association of Medical Research Charities |
| http://www.bsf.ac.uk/default.htm | The Biosciences Federation |
| http://www.buav.org/index.php | British Union for the Abolition of Vivisection |
| http://www.frame.org.uk | Fund for the Replacement of Animals in Medical Experiments |
| http://www.rds-online.org.uk | The Research Defence Society |
| http://www.drhadwentrust.org.uk/ | Dr Hadwen Trust (replacing animals in research) |
| http://www.virtual-anaesthesia-textbook.com/vat/vet.html | Virtual Anaesthesia Textbook |
| http://www.informatics.jax.org/mgihome/nomen/strains.shtml | Mouse Genome Informatics |
| http://dels.nas.edu/ilar_n/ilarhome/ | Institute for Laboratory Animal Research |
| http://www.veterinary-instrumentation.co.uk/ | Veterinary Instrumentation |
| http://www.digires.co.uk/ | Digital resource for veterinary trainers |
| http://www.ivis.org/home.asp | International Veterinary Information Service |
| http://www.intmtc.com/ | International Microsurgery Training Centre |
References

1. Home Office (2006) Animals (Scientific Procedures) Inspectorate annual report. Home Office: Science, Research and Statistics, London
2. Davidson N (2006) Davidson review: implementation of EU legislation. http://www.hm-treasury.gov.uk/independent_reviews/davidson_review/davidson_index.cfm
3. Fox JG, Cohen BJ, Loew FM (1984) Laboratory animal medicine. Academic Press, New York
4. RDS: Understanding Animal Research in Medicine, Coalition for Medical Progress (2007) Medical advances and animal research: the contribution of animal science to the medical revolution: some case histories. http://www.pro-test.org.uk/MAAR.pdf
5. Foex BA (2007) The ethics of animal experimentation. Emerg Med J 24:750–751
6. Matfield M (2002) Animal experimentation: the continuing debate. Nat Rev Drug Discov 1:149–152
7. Rollin BE (2007) Animal research: a moral science. Talking point on the use of animals in scientific research. EMBO Rep 8:521–525
8. Singer P (2006) In defense of animals: the second wave. Blackwell, Malden, MA
9. Croce P (1999) Vivisection or science? An investigation into testing drugs and safeguarding health. Zed Books, London
10. Schechter AN, Rettig RA (2002) Funding priorities for medical research. JAMA 288:832; author reply 832
11. Pound P, Ebrahim S, Sandercock P et al (2004) Where is the evidence that animal research benefits humans? BMJ 328:514–517
12. Perel P, Roberts I, Sena E et al (2007) Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ 334:197
13. Mayor S (2007) UK regulatory body wants public consultation on human–animal hybrid research. BMJ 334:112
14. Anon (2007) Avoiding a chimaera quagmire. Nature 445:1
15. Karpowicz P, Cohen CB, van der Kooy D (2004) It is ethical to transplant human stem cells into nonhuman embryos. Nat Med 10:331–335
16. Thompson SG (1994) Why sources of heterogeneity in meta-analysis should be investigated. BMJ 309:1351–1355
17. Auer JA, Goodship A, Arnoczky S et al (2007) Refining animal models in fracture research: seeking consensus in optimising both animal welfare and scientific validity for appropriate biomedical use. BMC Musculoskelet Disord 8:72
18. Sarker SK, Patel B (2007) Simulation and surgical training. Int J Clin Pract 61:2120–2125
19. Dolan K (2007) Laboratory animal law: legal control of the use of animals in research. Blackwell, New York
20. Rice M (2007) Deadline approaches for animal experimentation directive. Eur J Cancer 43:1641
21. Field K, Bailey M, Foresman LL et al (2007) Medical records for animals used in research, teaching, and testing: public statement from the American College of Laboratory Animal Medicine. ILAR J 48:37–41
22. Hauser R, Marczak M, Karaszewski B et al (2008) A preliminary study for identifying olfactory markers of fear in the rat. Lab Anim (NY) 37:76–80
23. Vermeulen JK, de Vries A, Schlingmann F, Remie R (1997) Food deprivation: common sense or nonsense? Animal Tech 48:45–54
24. Levine S, Saltzman A (2000) Feeding sugar overnight maintains metabolic homeostasis in rats and is preferable to overnight starvation. Lab Anim 34:301–306
25. Brown C (2007) Endotracheal intubation in the dog. Lab Anim (NY) 36:23–24
26. Price H (2007) Intubating rabbits. Vet Rec 160:744
27. Spoelstra EN, Ince C, Koeman A et al (2007) A novel and simple method for endotracheal intubation of mice. Lab Anim 41:128–135
28. Buscaglia JM (2007) Animal laboratory endoscopic research: a fellow's perspective. Gastrointest Endosc 65:882–883
29. Fritscher-Ravens A, Patel K, Ghanbari A et al (2007) Natural orifice transluminal endoscopic surgery (NOTES) in the mediastinum: long-term survival animal experiments in transesophageal access, including minor surgical procedures. Endoscopy 39:870–875
30. Radostits OM, Mayhew IG, Houston DM (2000) Veterinary clinical examination and diagnosis, 2nd edn. W.B. Saunders, Philadelphia
31. Thomson JA et al (1998) Embryonic stem cell lines derived from human blastocysts. Science 282:1145–1147
32. The National Academies (2008) Guidelines for human embryonic stem cell research. http://books.nap.edu/catalog.php?record_id=12260
18 The Ethics of Animal Research

Hutan Ashrafian, Kamran Ahmed, and Thanos Athanasiou
Contents
18.1 Introduction
18.2 Development of Ethics in Animal Research
18.3 The Benefits of Animal Research
18.4 The Case Against Animal Experimentation
18.5 The Case for Animal Experimentation
18.6 Conclusions (The Middle Ground)
References
Abstract Many of our research processes require the use of animals to test hypotheses and advance treatments. A persisting ethical issue survives as to whether the use of animals in scientific and surgical research is justified. Integrating several foundation philosophies, society has now set up arguments for and against animal research that initiates strong debate. In this chapter, the role and extent of animal research in our society are considered. Furthermore, the ethical standpoints that influence animal research in surgical research are also addressed.
18.1 Introduction
H. Ashrafian () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
The use of animals for research and investigation has a long tradition in the history of mankind. Excluding their use for nutrition, sustenance, physical labour and social company, a variety of different species have been employed or sacrificed for a diverse array of medical applications, whether in the foretelling of the future by the reading of entrails or in determining the safety of water supplies by giving samples to stock animals to foretaste. This reflects rather poorly on humanity's habit of viewing its animal counterparts as "inferior" beings, but it does not give the whole picture, as contradictions to this view of beasts also exist [1]. Many ancient cultures describe dogs as "man's best friend" [2], and numerous examples exist of holy animals in Ancient Egypt [3] and India [4] that were regularly treated better than many of their human counterparts; indeed, Alexander the Great founded a whole city in honour of his beloved horse Bucephalus [5]. As education, erudition and moral philosophy have progressed over time in our interconnected world
culture, we have come to see the sanctity of life in our fellow animals, but at the same time we are aware of the scientific steps necessary to make advancements in our society. Many of our research processes have needed, and continue to need, the use of animals to test hypotheses and advance treatments, and thus a persistent ethical question remains as to whether the use of animals in scientific and surgical research is justified.
18.2 Development of Ethics in Animal Research

The specific history of the use of animals in medical research begins in the Greek era, where both Hippocrates (ca. 460–370 bc) and Aristotle (384–322 bc) began to define knowledge of the structure and function of human anatomy based on animal dissection, reported in their respective "Hippocratic Corpus" and "Investigation of Animals". During the same era, a number of Greek philosophers founded a school of philosophy known as Stoicism, based on the teaching of Zeno of Citium (333–264 bc) that as all humans are rational, whether king, foreigner or slave, all deserve justice [6]. Although this group thereby set an important milestone in the progression of humanity, the Stoics themselves considered animals nonrational, and thus not deserving of justice. Modern science, however, has since demonstrated rational attributes in animals, leading to a reconsideration of the moral rights and justice that animals deserve. It was approximately a millennium later that the Neoplatonic philosopher Porphyry of Tyre (ca. 233–309 ad) proposed that the difference between eating animals and plants was that the former would feel pain. Nevertheless, much research on animals continued, particularly in pursuit of ever-improved anatomical and surgical knowledge, carried forward by influential physicians such as Galen (129–ca. 200 ad) and his followers, who used monkeys, dogs and pigs for physiological observations. By the Middle Ages, the Arab world was also responsible for major medical advances, and although the Islamic faith prevented the "painful" or "disfiguring" uses of animals in science, physicians such as Avicenna (980–1037 ad), the famed author of the first "Canon of Medicine", and his students were able to improve on suture techniques and discern key physiological concepts in animals with the use of anaesthesia [7].
In medieval Europe, however, a prevailing concept developed citing man as the master of nature, and although theologians such as St Augustine (354–430 ad) had clearly ratified the concept of "Thou shalt not kill" without exception, later religious thinkers such as St. Thomas Aquinas (ca. 1225–1274 ad) amended this concept by explaining that "plants are made for the sake of animals, and animals are made for the sake of man" [8]. Following the Renaissance, Andreas Vesalius (1514–1564) championed his empirical approach to anatomy, in which knowledge is derived from and verified by direct observation, thereby rationalizing the numerous meticulous animal and human dissections published in his seven-volume magnum opus "De Humani Corporis Fabrica". As a consequence of this new wave of science and thought, the French philosopher René Descartes (1596–1650) proposed that animal experimentation could proceed without great moral difficulty: not only did he adhere to the widely accepted notion of the times that man is the "master and possessor of nature", but he also held that animals are "machines" that cannot think as men do, and thus cannot have awareness or the capacity to feel pain [9]. This proposal stood in complete contradiction to Porphyry's concept, advanced approximately 1,500 years earlier, that animals feel pain. In the eighteenth century, the German philosopher Immanuel Kant (1724–1804), one of the greatest thinkers of the Age of Enlightenment, proposed his notion of the "Categorical Imperative", whereby human beings were unique in creation, existing as the only beings able to make moral and ethical judgements. He explained that all such decisions or "imperatives" were derived from one ultimate commandment of reason. An imperative is any proposal that necessitates a certain action, with a "categorical imperative" signifying an unconditional action required in all circumstances. Its application to animal research can therefore be expressed by the argument that our need for biological tissue research is the imperative that necessitates experimentation on animals. This argument cannot, however, be extended beyond animals to humans, who are exempted by their unique creation and moral status [10]. Kant's view of the ethical acceptability of actions, and the arguments derived from it, are typically classified as the Kantian or deontological view, whereby humans are considered unique among animals based on
the premise that humans have the exclusive ability to make moral and ethical judgements. By the late eighteenth century, however, the British philosopher Jeremy Bentham (1748–1832) championed a significant shift in our consideration of animals by applying what is today called the Utilitarian view [11]. Here, the onus is on the decision-maker to evaluate objectively the acceptability of an action and its consequences for all the beings it affects (overall utility). He derived this view by building on the Greek philosopher Epicurus's concept of "pleasure and pain" and on the work of the Scotsman David Hume (1711–1776), who proposed that morality does not depend on reason, but on sympathy, which importantly we share with animals [12]. Bentham therefore opposed Descartes and concluded that animals such as horses and dogs are rational, but, importantly, he re-addressed the whole question of rationality by explaining that the ethical query is not whether animals can reason, but whether they can suffer. Since animals can indeed suffer, the Utilitarian view requires that before we perform any action such as an experiment on an animal, we consider that some of these actions will cause suffering as they would in us, and we must therefore weigh our reasons for performing such an action. Integrating these foundational philosophies, society has now set up arguments for and against animal research that are pressed strongly by both sides. These will be considered; however, before such a dichotomy is addressed, the role and extent of animal research in our society need defining.
18.3 The Benefits of Animal Research

Approximately 50–100 million animals worldwide are used annually for research, with species ranging from fruit flies to non-human primates. The United Kingdom's Home Office reports that just over 3.01 million scientific animal procedures were started in 2006, the majority of which (83%) were on rodents [13]. Dogs, cats, horses and non-human primates are afforded special protection, and their use in research has recently fallen in the UK. This is also reflected in the United States of America, where the Office of Technology Assessment estimates that 17–23 million animals are used for research each year [14]. Here, about 95% of animals used are rodents, and, as in the UK, there is a decline in the use of dogs,
cats and non-human primates [15]. For comparison, the United States alone reports that over 365 million animals per year are killed by cars on the road [16]. The overall medical and surgical benefit of animal testing is difficult to quantify directly, which has led to much debate about its uses. Its contributions range from the introduction and refinement of novel life-saving surgical procedures to the creation of international vaccines and innumerable drug therapies. Animal research contributes heavily to the work of most of the world's top biomedical institutions, universities and companies, and it has even been quipped that the only thing in society not tested on animals is jokes. The National Center for Health Statistics approximated that for every dollar spent on health care, 3.5 cents were spent on research. Such an investment in research ($45 billion) will no doubt reap benefits in the future, as economists estimate that the increase in life expectancy from the 1970s and 1980s was worth $57 trillion to the United States [17]. Furthermore, in 2005, over 500 leading academic UK scientists and doctors signed a Declaration on Animals in Medical Research that stated [18, 19]: "Throughout the world people enjoy a better quality of life because of advances made possible through medical research, and the development of new medicines and other treatments. A small but vital part of that work involves the use of animals." This reaffirmed the positive statements about the need for animal research by the Weatherall Committee in 2006 [20], The Nuffield Council on Bioethics in 2005 [21], The Royal Society in 2004 [22] and the House of Lords select committee on animals in scientific procedures in 2002 [23]. The general consensus is that researchers should gain the medical and scientific benefits of animal research with minimal suffering and distress, making every effort to safeguard animal welfare [24].
18.4 The Case Against Animal Experimentation

Currently, the main arguments against research on animals rest on the claim that animals have rights equal to those of humans: if it is unethical to experiment on our fellow humans, who are worthy of human rights, it is equally unethical to experiment on animals who hold "equal" rights (Fig. 18.1) [25, 26]. In this school of thought, the attributes that entitle humans to rights are deeply scrutinised.
[Fig. 18.1 is a three-panel diagram summarising animal research ethics. "Reasons Against": animals have "equal rights" to us, a "soul", cognition, sensation and emotion; animal research can cause harm, pain and death; animals have "inherent value". "Reasons For": animal research has saved lives; we are obliged to animals, though this does not necessarily afford them "equal rights"; the concept of "rights" is a human one and does not necessarily apply to animals; to some extent, humans have a priority over animals. "The middle road": although animal research is not "ideal", it is beneficial to mankind; it can be performed, but only when absolutely necessary; pain, death and suffering have to be absolutely minimised; the principle of Replacement, Reduction and Refinement (the 3Rs) needs to be strictly followed; and it needs to be done under the auspices of governmental, legal and ethical regulations.]

Fig. 18.1 Overview of animal research ethics
From a religious view, some theologians would argue that "humans" and "animals" alike have a soul given by God and are thus worthy of equal rights, although this view is sometimes excluded by moral philosophers on the grounds that it limits society's objectivity [27]. The question nevertheless continues, followed by the observation that humans and animals share autonomy and rationality, which inanimate objects and plants do not have. Thus, if it is our autonomy and rationality that give us our rights, then animals are afforded them too. It can be objected that not all animals have the same levels of autonomy and rationality; a mouse, a great whale and a non-human primate differ in their ability to express these characteristics. Would there then be a theoretical cut-off point, or perhaps a set of "cognitive criteria" [28], determining which creatures with a certain level of autonomy and rationality are worthy of moral rights while others are not? Setting such a cut-off point would have serious implications, as some human beings, including children, do not express true rationality or autonomy and would therefore have a lower standing than some animals in terms of rights. Hence, proponents of this argument advocate giving all animals rights equal to those of humans, and thus argue against animal research. The umbrella term of "cognitive criteria" has thus been used to characterise not only what makes humans worthy of rights, but also why animals deserve them as well, thereby negating our justification for experimenting on them:
1. Capacity for self-consciousness
2. Capacity for purposive action
3. Capacity to communicate in a language
4. Capacity to make moral judgements
5. Rationality
However, opponents of animal research further qualify their arguments: in addition to the factors that we share with animals, there are also non-cognitive elements to be considered, such as sensation and emotion. Both have been demonstrated in a variety of species, and on these grounds, they argue, animal experimentation would be unethical. All these factors add together to create a so-called "inherent value" in all animals that gives them "equivalent" rights and therefore exemption from research [27, 29]. In addition, those against animal research also apply the abolitionist view that it is wrong to cause suffering in an experiment when it is not for the benefit of the victim. They voice the opinion that if some things simply cannot be learnt because animals' rights prevent us from performing research on them, then that is a consequence they are prepared to accept.
18.5 The Case for Animal Experimentation

As already alluded to, the historical argument for research on animals rested on the concept that "humans deserve more" than their animal counterparts. Although
these theories still exist in some circles, the mainstream sees them as rather outdated. The current arguments advocating animal research largely rest on the philosophy that, while it is accepted that in an ideal world sacrificing the lives of others, whether human or animal, would not occur, in reality tissue research is essential, and animals cannot be afforded the same rights as humans in this setting. To account for this, a number of moral concepts are cited (Fig. 18.1). It is proposed that although as a society we are grateful and "obliged" to animals for their contribution to human research and welfare, we are not required to afford them the same rights as ourselves. Thus, obligations to animals do not automatically entail that animals have rights equal to ours [16, 22]. Consider the example of an ill first-class passenger on an aircraft whose ailment is cured by a fellow economy-class passenger, who by chance happens to be a doctor. The patient may be grateful and "obliged" to his fellow passenger for helping him, but the doctor does not thereby gain any right to expect subsequent first-class treatment (though many doctors would enjoy the prospect!). Adding to this view is the concept that animals do not have the same rights as humans because the whole concept of rights is an entirely human perception and idea. Consider an example in which we take both zebras and lions to have inherent rights: if we were to observe a lion killing a baby zebra to feed her own cubs, we would find the killing highly unpleasant, with some of us even considering intervening. Yet rights can be claimed for both species: one could argue that the zebra has a right not to be killed, just as the lion has a right to kill and feed her own babies. What matters here is not that nature takes its course, but that it is we who afford these animals such rights, to the point of debating whether the zebra should be killed, whether the lion has the right to kill it, or whether she should have eaten something equally nutritious but less animate for sustenance! This is not in the psyche of animals, which kill as a matter of fact for survival. They do not question their own rights or the rights of others, as they have no capacity to do so; thus, this school of philosophy argues that as the concept of "rights" is unique to humans, it has no place when applied to animals. Therefore, although it can be argued that animals have what we call "inherent value", it does not follow that they cannot be used for appropriately guided research [22, 26].
Proponents of animal research also argue that if experiments on animals were not done, mankind would come to harm: the lack of continued animal research would lead to a lack of progress in medical research overall, and thus to deficiencies in patient treatment. Harming humans through the absence of animal research is therefore not a viable option from a moral standpoint, and this leads us to other conclusions. Although researching on animals may not be morally "ideal", a lifeboat-scenario argument is used: an animal on a lifeboat may be sacrificed first to save the humans on board. In the same sense, animals are sacrificed in an appropriate setting to save mankind from disease. Furthermore, although not ideal, animal research is held to be morally acceptable on the grounds of practicality and empathy for our fellow man. This is exemplified by our belief that stealing from one another is a crime and morally unjust. If one were to observe a thief stealing from a starving individual, one would consider it utterly heinous and immoral; yet seeing a starving thief steal to feed his starving children would lead us to conclude that although stealing in this case is still unjust, it is rendered more acceptable on the grounds that we empathise with the starving thief and his children. The same acceptance is thus afforded to humans performing medical and surgical research on animals. Applying a similar thought experiment to the lifeboat scenario, a British working group looking at the use of non-human primates in research described a "Hospital Fire" scenario [20] in which we have to prioritise which living organisms to save from the flames. They concluded that:

1. Humans generally, and almost universally, accord a lower priority to all animals than they accord to any humans (which means, inter alia, that they believe it right to save humans before animals).
2. Humans think it is morally required to sacrifice the lives of animals to save human life (consistency then requires that they should do so – other things being equal – in medical research, as well as in hospital fires).
18.6 Conclusions (The Middle Ground)

The concepts of rights for animals and the consequences of performing research upon them have led to a persistent dilemma in today's society, which according
to some may be unanswerable. This quandary is further complicated by our society's varying degrees of philosophical and ethical consideration towards animals. Some, for example, accept and advocate the need for animal research, but in view of their emotional ties with companion animals and pets, feel that experimentation on their pet species is unacceptable. This perception falls between the two extremes of pro- and anti-animal research: those who are pro-experimentation believe that animal research is acceptable in all its forms, while those who are anti-experimentation believe that animal research is not acceptable under any circumstances. This inconsistency of ideas has led to the current ethical and legal turmoil for scientists and animal researchers [30–32]. In an attempt to resolve these discrepancies, efforts are being made to unite around a set of animal research principles that will be accepted by a majority of society. Such an alliance between the various groups considering animal research can be regarded as an agreement within a so-called "middle ground" (Fig. 18.1). High among these is the work of William Russell and Rex Burch, who adhered to the well-accepted notion that in an ideal situation no animal experimentation would be carried out, but that where it is necessary, it is important to limit the overall number of animals used by employing other methods of research, and particularly to aim to reduce the use of higher mammals for scientific research. In 1959, they published their seminal treatise "The Principles of Humane Experimental Technique" [33, 34], which first introduced the concept of the 3Rs in animal research, namely replacement, reduction and refinement. This has since become an international standard when governments and ethics committees consider animal research, and represents the principles that must be adhered to before animal research is carried out. Under replacement, animal research should only be considered if other experimental systems cannot be used in place of animals to attain the same answers. Furthermore, once animal research is envisaged, it is important to maximise the efficiency of the information gained from each animal so as to limit the number of individuals used, thereby complying with the reduction principle. Lastly, when animal research is to be carried out, all experimental methods and techniques should be performed or modified to ensure that minimal pain, distress and suffering are experienced by each animal, while also enhancing its well-being to the highest possible degree.
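In practice, the reduction principle is commonly implemented through a formal sample-size (power) calculation at the design stage, so that no more animals are requested than are needed to answer the experimental question. The short sketch below illustrates the idea in Python; it is not drawn from this chapter, and the effect size, significance level and power are hypothetical values chosen purely for demonstration.

# A minimal sketch of a "reduction" calculation: the smallest group
# size for a two-group comparison, given assumed design parameters.
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=1.2,         # assumed standardised difference (Cohen's d)
    alpha=0.05,              # two-sided significance level
    power=0.8,               # probability of detecting a real effect
    alternative='two-sided',
)
print("Minimum animals per group:", math.ceil(n_per_group))

A calculation of this kind gives an ethics committee an explicit justification for the number of animals requested, and exposes designs that would be either underpowered (and hence wasteful of animals) or needlessly large.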
These concepts are now widely accepted and pervade the scientific design of animal experiments, governmental research authorities and ethics committees around the world [29, 35–37]. They are, however, only ethical principles, and need practical application across a wide variety of research methodologies, taking into account each species used, the results needed and cost-benefit analyses. This is our global society's best application of the utilitarian and Kantian approaches, refined over millennia. It is by no means complete, but it acts as an ethical guide by which we can steer our necessary animal research. With it, we come ever closer to achieving our universal goal of living in a world that is as disease-free as possible, while ensuring that we reach such a utopia through the most ethically responsible path.
References

1. Kalof L (2007) Looking at animals in human history. Reaktion, London
2. Müller FM (2001) The Zend Avesta (sacred books of the East). RoutledgeCurzon, London
3. Callou C, Samzun A, Zivie A (2004) Archaeology: a lion found in the Egyptian tomb of Maia. Nature 427:211–212
4. Majupuria TC (2000) Sacred animals of Nepal and India: with reference to Gods and Goddesses of Hinduism and Buddhism. M. Devi, Gwalior
5. Heckel W, Yardley JC (2004) Alexander the Great: historical sources in translation (Blackwell sourcebooks in ancient history). Blackwell, Oxford
6. Wenley RM (2007) Stoicism and its influence. Kessinger, Whitefish, MT
7. Porter R (2001) The Cambridge illustrated history of medicine. Cambridge University Press, Cambridge
8. Barad JA (1995) Aquinas on the nature and treatment of animals. International Scholars, San Francisco
9. Descartes R (1985) The philosophical writings of Descartes: volume 1. Cambridge University Press, Cambridge
10. Kant I (1992) Theoretical philosophy, 1755–1770 (The Cambridge edition of the works of Immanuel Kant in translation). Cambridge University Press, Cambridge
11. Bentham J (1996) An introduction to the principles of morals and legislation (Bentham, Jeremy, Works). Oxford University Press, Oxford
12. Hume D (2006) An enquiry concerning human understanding (The Clarendon edition of the works of David Hume). Oxford University Press, Oxford
13. Home Office United Kingdom (2008) Publications and reference – statistics. Available at: http://scienceandresearch.homeoffice.gov.uk/animal-research/publications-and-reference/statistics/
14. US Congress Office of Technology Assessment (1986) Alternatives to animal use in research, testing, and education. Available at: http://govinfo.library.unt.edu/ota/Ota_3/DATA/1986/8601.PDF
15. United States Department of Agriculture (2000) USDA animal care report. Available at: http://www.aphis.usda.gov/ac/awrep2000.pdf
16. Sterba JP (2002) In the headlights: as man and beast clash on highways, both sides lose. Wall St J A1
17. Lasker Foundation (2000) Exceptional returns: the economic value of America's investment in biomedical research. Available at: http://www.laskerfoundation.org/reports/pdf/exceptional.pdf
18. Research Defence Society (2005) Declaration on animals in medical research. Available at: http://www.rds-online.org.uk/upload/docs/Declaration%202005.pdf
19. Research Defence Society (2005) 15 years on: top scientists and doctors back animal research. Available at: http://www.rdsnet.org.uk/pages/news.asp?i_ToolbarID=6&i_PageID=1964
20. Weatherall D, Goodfellow P, Harris J et al (2006) The use of non-human primates in research. Academy of Medical Sciences, London
21. Nuffield Council on Bioethics (2005) The ethics of research involving animals. Nuffield Council on Bioethics, London
22. The Royal Society (2004) The use of non-human animals in research: a guide for scientists. The Royal Society, London
23. House of Lords (2002) Animals in scientific procedures. Available at: http://www.publications.parliament.uk/pa/ld/ldanimal.htm
24. Gonder JC (2007) Introduction: recent studies, new approaches, and ethical challenges in animal research. ILAR J 48:1–2
25. Regan T (1997) The rights of humans and other animals. Ethics Behav 7:103–111
26. Thomas D (2005) The ethics of research involving animals: a review of the Nuffield Council on Bioethics report from an antivivisectionist perspective. Altern Lab Anim 33:663–667
27. Smith DH (1997) Religion and the use of animals in research: some first thoughts. Ethics Behav 7:137–147
28. Beauchamp TL (1997) Opposing views on animal experimentation: do animals have rights? Ethics Behav 7:113–121
29. Steneck NH (1997) Role of the institutional animal care and use committee in monitoring research. Ethics Behav 7:173–184
30. Frey RG (1997) Moral community and animal research in medicine. Ethics Behav 7:123–136
31. Mitchell G (1989) Guarding the middle ground: the ethics of experiments on animals. S Afr J Sci 85:285–288
32. Pluhar EB (2006) Experimentation on humans and nonhumans. Theor Med Bioeth 27:333–355
33. Russell WM (2005) A comment from a humane experimental technique perspective on the Nuffield Council on Bioethics report on the ethics of research involving animals. Altern Lab Anim 33:650–653
34. Russell WMS, Burch RL (1959) The principles of humane experimental technique. Methuen, London
35. Hendriksen CF (2005) The ethics of research involving animals: a review of the Nuffield Council on Bioethics report from a three Rs perspective. Altern Lab Anim 33:659–662
36. Perry P (2007) The ethics of animal research: a UK perspective. ILAR J 48:42–46
37. Schuppli CA, Fraser D, McDonald M (2004) Expanding the three Rs to meet new challenges in humane animal experimentation. Altern Lab Anim 32:525–532
Ethical Issues in Surgical Research
19
Amy G. Lehman and Peter Angelos
Contents

19.1 Introduction
19.2 Randomized Controlled Trials in Surgery
19.3 Innovation in Surgery
19.3.1 What Is a "Last Resort" Innovation?
19.3.2 What Is Institutionally Regulated Innovation?
19.3.3 What Is Peer Regulated Innovation?
19.4 Conclusion
References
Abstract All biomedical research, including clinical medical and surgical research, should conform to a set of ethical principles first delineated in the Nuremberg Code and subsequently expanded upon in the Belmont and Helsinki Reports. Despite this agreement on the principles of human subjects research in the medical field, there is a lack of clarity about the special aspects of surgical research ethics as compared with medical research ethics. Because of the unique relationship between the surgeon and his/her patient, which includes extraordinary proximity and a degree of physical violence, many regard surgical ethics as a unique entity. We define surgical research as research that involves actual procedures and/or operations. The specific ethical issues in surgical research that we discuss are randomized controlled trials, placebo surgery, and surgical innovation and its regulation. Strategies for adaptation to new cultural norms and expectations are also discussed.
19.1 Introduction
P. Angelos () Department of Surgery, The University of Chicago, University of Chicago Medical Center, 5841 South Maryland Avenue, MC 5031, Chicago, IL 60637, USA e-mail: [email protected]
All research, whether conducted by surgeons or by other clinicians or scientists, must conform to a set of principles that have been well delineated since the Nuremberg Code was first written in 1946, in the wake of the atrocities committed against human beings under the aegis of "human research." Since that time, there have been several iterations of ethical codes of conduct for clinicians and scientists who perform human research, including the Belmont and Helsinki Reports. All share the following common themes: research must be carried out with the appropriate design to answer the question at hand, hypotheses must be testable, risks to subjects must be minimized, and
subjects must have undergone a rigorous informed consent process and indicate their understanding and agreement before being allowed to participate. The research enterprise is mainly overseen by Institutional Review Boards (IRBs) (or the equivalent), which in turn answer ultimately to a governmental body. These principles, written about in great detail elsewhere, must be upheld by every participant in the research arena, irrespective of specialty or area of interest. To summarize these basic points briefly here, all research must satisfy the following criteria:

1. Appropriate design, or, according to Roy et al., "Scientific Adequacy", which is defined as "respect for accepted scientific principles, knowledge of the natural history of the disease or problem under study, adequate preliminary laboratory and animal experimentation, and proper scientific and medical qualification of investigators." Are the principles of "validity, generalizability, and efficiency" satisfied? [1]
2. Hypothesis generation. Is the question asked in a way that it can be answered according to the above-delineated principles?
3. Risk minimization. Are patients adequately protected from undue risks? Are vulnerable populations protected?
4. Informed consent. Fulfilling this obligation is at the root of ethical scientific practice, which holds patient/subject autonomy in the highest possible regard. Therefore, all investigators have the responsibility to ensure full comprehension of all procedures the patient/subject will undergo, as well as all the risks and benefits therein.

Pawlik and Colletti discuss examples of scientific malfeasance and misconduct in order to emphasize the important, indeed essential, role that ethics plays in scientific discovery and research [2]. Please refer to their chapter for a detailed discussion. An interesting question is whether there is anything particular about the ethics of surgical research per se. Many studies conducted by surgeons do not actually involve the performance of an operation, and many new surgical procedures and techniques are developed outside traditional research settings, which would ordinarily include protocol development and IRB approval and oversight. Some surgeons describe a unique relationship with their patients, which is often one reason given for why surgeons do not conduct many randomized controlled trials. Little has described characteristics of the surgical relationship which, while not completely
unique to surgery, acquire a particular propinquity above and beyond other medical relationships. Those characteristics are Rescue ("a traumatic remedy in the face of a threat"), Proximity ("the surgeon will know things about the patient's body that are hidden from the patient"), Ordeal (involving "pain, anesthesia, loss of autonomy…helplessness, dependence"), Aftermath ("procedures leave scars, both on and in the body, and in the psyche"), and Presence (surgeons must "demonstrate his or her commitment to a caring role by being there, aware of and responsive to the discomforts and fears attending illness and surgery") [3]. Pearl Katz describes the personality traits of surgeons as exemplified by "action, heroism, certainty, optimism" [4], which are decidedly clinical rather than research-oriented characteristics. At a recent ethics conference at the University of Chicago's MacLean Center, a remark was made during a presentation that surgery "is a highly moral activity because of the magnitude of its violence" [5]. What, then, is surgical research, and what particular ethical issues does it raise? We propose that for research to count as surgical research, a subject must undergo an operation or a procedure as part of the scientific and intellectual inquiry. Only in conjunction with this commission of "a very particular kind of physical trauma in order to achieve something good" [3] can we tease out its unique and ethically challenging areas.
19.2 Randomized Controlled Trials in Surgery

Criteria for conducting randomized controlled trials include: (1) uncertainty about treatments, or clinical equipoise; (2) a trial design that is able to capture differences among treatments; and (3) protection of patients [1]. Surgical research, historically, has not included many randomized controlled trials (RCTs), and among the RCTs that are in the surgical literature, very few actually involve an operation or an operative technique. RCTs are considered the gold standard among most biomedical researchers. Why, therefore, do surgical researchers avoid this study design? Several hypotheses have been advanced over many years, touching upon issues that are theoretical, methodologic, and practical in nature. Solomon and McLeod speculate that many surgeons are "lacking the necessary training, expertise and desire to perform RCTs" resulting from "inadequate
funding from granting agencies, difficulties in securing patient consent or a lack of sufficient patient numbers" [6]. Bonchek discusses the difficulties of performing RCTs of procedures as opposed to medical therapies. He identifies the following problems that make RCTs logistically difficult: procedures evolve continuously, complications decrease over time, results depend upon individual surgeon performance, adequate placebo control may be impossible, and crossover between patient groups is common [7]. It may be that standardization of even commonplace operations is impossible due to operator preference and experience [8]. There has been much discussion of the ethics of performing placebo, or "sham," operations. Some feel that sham operations are always unethical [9], some feel that sham operations are essential to the design of a valid RCT [10], and some feel that appropriate compromises may be made which allow placebo operations to be conducted so long as certain criteria are met and are analyzed on a case-by-case basis [11–13]. In the past, the use of placebo operations revealed the inefficacy of several operations that had previously been performed on many patients, such as internal mammary artery ligation and gastric freezing, although these studies would not conform to current standards of informed consent. Even so, this knowledge has saved thousands of patients from undergoing needless surgery and being subjected to risk, pain, and complications, as well as unnecessary medical expense. There is clearly value in determining the efficacy of certain operations; however, there are many logistical and ethical difficulties in designing a sham surgical arm. Namely, it is impossible to blind the surgeon to whether or not he or she has performed the operation, and participating surgeons would have to agree to have others care for the patient in the postoperative period – acceptance of which would require some cultural change on the part of surgeons. Additionally, the risks of a sham arm would have to fall within acceptable limits, which is more difficult to achieve as everyone undergoing surgery is, at the very least, subjected to the risks of bleeding, infection, and anesthesia. Emanuel and Miller defined "a middle ground" for determining the ethics of placebo-controlled studies. They call for a two-tiered method to evaluate the need for placebo control:

A placebo-controlled trial has a sound scientific rationale if the following criteria are met: there is a high placebo-response rate; the condition is typically characterized by a waxing and waning course, frequent spontaneous
remissions, or both; and existing therapies are only partly effective or have serious side effects; or the low frequency of the condition means that an equivalence trial would have to be so large that it would reasonably prevent adequate enrollment and completion of the study [14].
Only after these conditions have been met can an investigator go on to evaluate these further criteria: "Research participants in the placebo group should not be substantially more likely than those in the active group to die; to have irreversible morbidity or disability or to suffer other harm; to suffer reversible but serious harm; or to experience severe discomfort" [14]. Patient and surgeon preferences play an enormous role in the unpopularity of RCTs for operations. Many people do not want to submit themselves to a random process of selection between surgery and no surgery, for many complex reasons that may touch upon Little's framework of the surgical relationship described above. Patients select their surgeons based on a number of both objective and intangible reasons. Therefore, if patients must give up this autonomy of choice, accrual into trials may be inadequate. Moreover, patients often have a very strong preference for a type of operation well before there is evidence to support the approach's superiority. Frequently, patients will prefer a "minimally invasive" operation under the belief that "less invasive" means "safer" or "better." This is strongly demonstrated by the adoption of laparoscopic techniques by practicing surgeons without clinical trials to support their efficacy. Patients and surgeons alike felt that the benefits of laparoscopic over open cholecystectomy were obvious [15], and that to submit to an open operation would therefore present an undue and unnecessary burden. An immediate consequence of these attitudes was an increase in biliary injuries and other complications, which decreased only after the learning curve of the new procedure was surmounted. Unfortunately, not only patients but also surgeons themselves are frequently convinced that a newer operation is better even when there is little data to support such a conclusion. As mentioned above, equipoise is an essential component that must be present when determining whether or not a procedure should be evaluated by means of an RCT. "Clinical Equipoise" has been defined as "[a] conflict in the opinions of members of the expert clinical community over which treatment is preferred; each side recognizes that there is evidence to support the opposing view" [1]. When new operations are quickly adopted, or when the demand for new and innovative procedures is high and
exerts pressure on practicing surgeons, the balance of clinical equipoise may be disturbed inappropriately. Unless new procedures are studied in RCTs before widespread adoption, it may become practically difficult – even impossible – to achieve surgeon and patient buy-in: since surgeons are accustomed to accepting less than Level 1 data to support clinical decision making (in part due to the absence of RCTs in the surgical literature), it does not take long before equipoise is shifted and a surgeon will be unlikely to contemplate enrolling patients in an RCT.
19.3 Innovation in Surgery

Surgery, more than any other specialty, includes components of innovation and technical modification as part of its daily activity. This is in part due to the enormous differences that exist among patients, including body size, habitus and variant anatomy, as well as surgeon and patient preferences, product and/or equipment availability, and other common external forces that affect surgeons' decision-making on a regular basis. Innovation also appeals to the element of the surgical character which Katz describes as "courageous, risk-taking, decisive" [4]. These personality traits have produced quantum leaps forward in the surgical treatment of innumerable diseases, thereby helping and saving lives; unfortunately, these same traits have also caused innumerable patients to be subjected to unknown risks and unknown benefits, at worst without their proper informed consent and agreement [16]. How, then, should innovation be regulated, and what is the ethical responsibility of the surgeon-innovator when he or she invents a new way of performing surgery? Few paradigms exist in the literature to help guide this question. Nevertheless, we will attempt to create a framework in which to understand how a practicing surgeon may go about innovating in his or her practice. The categories can be understood in their simplest forms as: "Last Resort" procedures, Institutionally Regulated procedures, and Peer Regulated procedures.
19.3.1 What Is a "Last Resort" Innovation?

This area of innovation can be both the least controversial and, occasionally, the most controversial,
depending on the circumstances surrounding the procedure. One example of surgical innovation in this category is the development of sleeve resections in pulmonary surgery. This technique was first introduced in 1947 (and reported in 1954) as an alternative approach for patients with centrally located lung tumors and comorbidities that precluded the standard operation, i.e., pneumonectomy [17]. This new procedure was performed on a subset of patients with a likely lethal problem – lung cancer – who were unable to satisfy the accepted criteria required to undergo definitive resection of their malignant tumors, and who were therefore prevented from undergoing treatment at all. The controversy surrounding the new technique of sleeve resection did not revolve around whether the procedure was technically feasible (although the operation is more technically challenging, and 50 years later the data unequivocally show a significant difference in complications between operators who have experience and those who do not). The issue was whether or not the operation should be considered an appropriate en bloc cancer resection. The prevailing opinion was that the sleeve resection was an inferior cancer operation. But in a subset of patients who had no alternative, and who would likely die or be rendered pulmonary cripples if subjected to the standard operation, this new technique was a "last resort." Over several years, careful data collection revealed that, in the setting of appropriate staging, sleeve resections may also be appropriate for patients without comorbidities precluding pneumonectomy and, further, that patients undergoing sleeve resection instead of pneumonectomy had an improved quality of life. This innovation in technique occurred in a patient population that had no other recourse and, once proved safe and effective in that population, could be studied more rigorously in expanded populations. There are of course instances of "last resort" innovations that are quite controversial, and they usually occur in moribund patients where the likelihood of benefit to the patient is infinitesimally small. However, as in the example of the development of sleeve resection, when patients are faced with a likelihood of death or severe morbidity with either no treatment or the standard treatment, the risk of undergoing a new procedure that has the potential to prolong length and quality of life may be balanced more reasonably, as long as patients have been fully informed of the risks, benefits, and unknowns.
19.3.2 What Is Institutionally Regulated Innovation?

Institutionally regulated innovations are treated like any other research activity – innovations are publicly discussed and planned before they are performed, they are fully vetted by an IRB, and patients are consented under rigorous protocols. Many surgeons feel this is the only ethically appropriate way to engage in substantive surgical innovation. An example of this paradigm is the introduction of adult-to-child living liver donation at The University of Chicago [18], which represents one of the most rigorously regulated introductions of a surgical innovation. In this case, no new surgical technique was being developed – surgeons had experience with inserting split cadaveric livers into pediatric patients, as well as with performing anatomic liver resections in living patients with cancer, hemangiomas, or other problems requiring removal of a portion of the liver. The innovation involved uniting these techniques in a completely new way. More controversially, it also involved performing a major operation on a person who was healthy and who would undergo risks without gaining benefits (at least not physical, health-related ones). Not only was the procedure fully analyzed by the parent institution's IRB, but the protocol was fully scrutinized by its Ethics Committee as well. Before the innovative operation had ever been performed, clinical guidelines had been formulated and published as a guide for actual practice. This method of surgical innovation provides the highest level of patient protection as well as transparency of clinical and research goals.
19.3.3 What Is Peer Regulated Innovation?

Peer Regulated Innovation is a less formally regulated approach to innovation oversight. McKneally and Daar provide the most cogent framework for this paradigm, and capture a prevailing sentiment about surgical innovation: "Surgeons, accustomed to immediate adaptation to unexpected findings in their daily practice, seek a more nimble, flexible source of institutional and public oversight and approval" [19]. To answer this need, a "Task Force on Innovation" was created at the University
of Toronto, to provide an alternative avenue of oversight and approval for surgeons who were interested in change and innovation but were not yet certain that their ideas were ready to be examined under a formal IRB protocol. Please see Table 19.1 for a summary. This approach is an important middle ground, which may appeal to surgeon-innovators while still providing adequate protection and oversight of their patient-subjects. In the contemporary era of quality improvement that has taken hold within the institution of Medicine, this paradigm provides an appealing framework. However, it is generally agreed by those working in academic settings that any new innovation or technique that is to be published, and hence promulgated, must undergo more rigorous examination by a full institutional IRB. The Peer Regulated model can therefore be used as an intermediate step to test new techniques, approaches, and instruments not falling under device guidelines, which would help bolster equipoise and possibly provide evidence and rationale for an appropriate RCT comparing the new method with the standard one.
Table 19.1 Elements of the surgical innovation ethics paradigm (adapted from [19])

– The patient/subject must be informed of the innovative nature of the treatment
– The usual clinical informed consent is replaced with a more detailed and complete discussion of the potential risks of the innovative procedure
– The patient/subject must be given the option of choosing standard surgical care rather than the innovative procedure
– An Innovation Review Committee, made up of the surgeon-in-chief at the institution and two members of the relevant service, must endorse the initiation of the innovation and a study of its effects
– An Innovation Task Force of relevant stakeholders at the institution should be informed of outcomes, adverse events, and the impact of the innovation on institutional resources and programs
– The Institutional Review Board (IRB) or equivalent research ethics board should review the consent process for the innovation and the consent form, but a full research protocol need not be reviewed by the IRB
– When preliminary information about the innovation (such as identification of appropriate patients and stabilization of techniques) is available, a formal research proposal should be presented to the IRB to more fully study and validate the procedure
19.4 Conclusion

The culture of medicine has changed dramatically since Henry Beecher first published his alarming findings about the lack of ethical conduct in many published studies [16], and it is still in the process of changing. Patients are demanding, rightly we believe, clear communication and information from their physicians and surgeons in standard clinical settings. Articles and books published in the lay press guide patients to ask probing questions, to check facts, and to inquire about the surgeon's experience [20]. With the bar being raised in seemingly uncontroversial areas of accepted clinical care, an even higher standard of disclosure is required for engaging in non-standard and experimental practice. This is particularly true of the surgeon-scientist, who engages in the most extreme version of "laying on hands" for the purposes of healing. Some surgeons lament these changes, which inhibit their autonomy and challenge the image of the surgical maverick, someone who is able to invent and act in an unregulated manner. Societal values have shifted away from accepting that role. Instead, we have accepted a new trade-off: increased ethical standards and patient protections, but a deceleration in the speed of knowledge acquisition and technique development. This does not have to stifle the research and innovation process, but to be successful, we as surgeons must adapt.
References

1. Roy DJ, Black PM, McPeek B et al (1998) Ethical principles in research. In: Troidl H, McKneally MF, Mulder DS, Wechsler AS, McPeek B, Spitzer WO (eds) Surgical research: basic principles and clinical practice, 3rd edn. Springer, New York, pp 581–604
2. Pawlik TM, Colletti L (2001) Ethics and surgical research. In: Souba WW, Wilmore DW (eds) Surgical research. Academic Press, San Diego, pp 1349–1360
3. Little JM (2001) Ethics in surgical practice. Br J Surg 88:769–770
4. Katz P (1990) The scalpel's edge. Allyn and Bacon, Boston
5. McKneally MF (1999) Ethical problems in surgery: innovation leading to unforeseen complications. World J Surg 23:786–788
6. Solomon MJ, McLeod RS (1998) Surgery and the randomised controlled trial: past, present and future. Med J Aust 169:380–383
7. Bonchek LI (1997) Randomised trials of new procedures: problems and pitfalls. Heart 78:535–536
8. McLeod RS, Wright JG, Solomon MJ et al (1996) Randomized controlled trials in surgery: issues and problems. Surgery 119:483–486
9. Lefering R, Neugebauer E (1998) Problems of randomized controlled trials (RCT) in surgery. In: Abel U, Koch A (eds) Nonrandomized comparative clinical studies. Symposium Publishing, Düsseldorf, pp 67–75
10. Flum DR (2006) Interpreting surgical trials with subjective outcomes: avoiding UnSPORTsmanlike conduct. JAMA 296:2483–2485
11. Angelos P (2003) Sham surgery in research: a surgeon's view. Am J Bioeth 3:65–66
12. Clark PA (2003) Sham surgery: to cut or not to cut – that is the ethical dilemma. Am J Bioeth 3:66–68
13. Miller FG (2004) Sham surgery: an ethical analysis. Sci Eng Ethics 10:157–166
14. Emanuel EJ, Miller FG (2001) The ethics of placebo-controlled trials – a middle ground. N Engl J Med 345:915–919
15. Hunter JG (2001) Clinical trials and the development of laparoscopic surgery. Surg Endosc 15:1–3
16. Beecher HK (2001) Ethics and clinical research. 1966. Bull World Health Organ 79:367–372
17. Ferguson MK, Lehman AG (2003) Sleeve lobectomy or pneumonectomy: optimal management strategy using decision analysis techniques. Ann Thorac Surg 76:1782–1788
18. Singer PA, Siegler M, Lantos JD et al (1990) The ethical assessment of innovative therapies: liver transplantation using living donors. Theor Med 11:87–94
19. McKneally MF, Daar AS (2003) Introducing new technologies: protecting subjects of surgical innovation and research. World J Surg 27:930–934; discussion 934–935
20. Landro L (2008) The informed patient – learning to ask tough questions of your surgeon. Wall Street J. Available at: http://online.wsj.com/article/SB119983875885176351search.html
Principles and Methods in Qualitative Research
20
Roger Kneebone and Heather Fry
Contents

20.1 Introduction
20.1.1 What Is Qualitative Research?
20.2 Why Should Surgeons Know About Qualitative Research?
20.2.1 Engaging with Key Literature
20.2.2 Conducting Qualitative Studies
20.3 Common Misconceptions
20.4 When Should Qualitative Research Not Be Used?
20.5 Characteristics of the Qualitative Research Process
20.6 Qualitative Methods
20.6.1 Additional Types of Study
20.7 Issues in Conducting Qualitative Research
20.7.1 Objectivity, Subjectivity and Bias
20.7.2 Achieving Rigour
20.7.3 Ethical Permission, Risk and Ownership of Data
20.7.4 Presentation and Writing Up
20.8 Conclusion
References
Further Reading
Appendix
R. Kneebone () Department of Biosurgery and Surgical Technology, Chancellor’s Teaching Centre, 2nd Floor QEQM Wing, Imperial College London, St Mary’s Hospital, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract In this chapter we describe the characteristics of qualitative research, considering why and when it might be used within surgical practice. We highlight similarities and differences between qualitative and quantitative approaches, exploring some common misconceptions. The characteristics of qualitative research are described; the key qualitative methods used in different types of enquiry into surgical practice are then outlined. Finally, we consider some practical issues around conducting and interpreting qualitative work, and give guidance to those wishing to find out more.
20.1 Introduction

In this chapter we describe the characteristics of qualitative research, considering why and when it might be used within surgical practice. We highlight similarities and differences between qualitative and quantitative approaches, exploring some common misconceptions. We consider the characteristics of qualitative research, then outline the key qualitative methods used in three types of enquiry into surgical practice. Finally, we consider some practical issues around conducting and interpreting qualitative work, and give guidance to those wishing to find out more. Throughout the chapter, we highlight in bold those terms in qualitative research which may be unfamiliar. These are explained in the Appendix, which also contains some terms not used in the text.
20.1.1 What Is Qualitative Research?

Qualitative research focuses on individual people (exploring the reasons they behave as they do) and on
specific contexts, interactions and processes. It investigates the meanings of events, as perceived by those affected by them. It asks questions such as “why” and “how”, rather than “how many” and “in what proportion”. In doing so, it uses words rather than numbers. It does not make claims of generalisability, although its findings may be transferable. This kind of research takes place in the real world rather than the laboratory, relying heavily on observation. It looks at individuals rather than populations and is about trying to understand and find the meaning behind people’s actions, situations and beliefs. Crucially, it remains open to what might be out there, rather than confining itself to specific questions and deliberately excluding confounding factors. This kind of research is especially helpful when there is no right or wrong answer, such as when the impact on patients of a new treatment is being explored or when a new health care policy is being introduced. When investigating rectal cancer, for example, a quantitative approach might ask “what is the 5 year survival in a given surgical population after resection of rectal cancer?”. A qualitative approach, however, might focus on a small number of individual patients, asking “how did this operation affect your life at home and at work?”. The differences between these two approaches profoundly affect how data are gathered and analysed, as will be considered later. Qualitative research does not usually start with a hypothesis, but with an open question which makes no presupposition as to where the answer may lie. “Why do patients prefer treatment X to treatment Y?”. “Why does communication sometimes break down in the operating theatre?”. “Why do trainees dislike a particular form of training, even though it has been shown to be effective?”. Often the research outline may not even be framed as a question but as a broad aim, such as “To understand why patients …”. Qualitative research will often generate theories, concepts and models. For those familiar with the deductive reasoning of quantitative research (starting with hypotheses, then testing to confirm or refute them), the (usually) inductive approach of qualitative research (starting with observations, then using these to generate theories and transferable conclusions) can seem alien, unsystematic and even unscientific. Our aim in this chapter is to dispel that impression.
20.2 Why Should Surgeons Know About Qualitative Research?

20.2.1 Engaging with Key Literature

Traditionally, medicine has been dominated by the "harder" sciences, while qualitative enquiry has been the preserve of the social sciences and humanities. Now, however, qualitative methods are becoming established within surgery and the biomedical sciences, with many major journals publishing qualitative or "mixed methodology" studies. Surgical practice lies at the intersection of a widening group of disciplines which bear on patient care. Patient safety, education and health policy are having a profound effect on clinical practice and outcomes. Research in these fields is predominantly qualitative (exploring individual responses, collecting expert opinion, establishing consensus and making sense of widely differing viewpoints), so it is essential for surgeons to be able to engage critically with this literature and judge it for themselves. Vignette 1 briefly describes an important qualitative study into operating theatre practice.

Vignette 1
Tensions in the operating theatre can exert a profound effect upon surgical practice and patient safety. Two excellent qualitative studies use systematic observation and interviews to build up a picture of team communication from different perspectives – surgeons, operating theatre nurses, anaesthetists and trainees [1, 2]. These studies locate their findings within a framework of existing theory, using a rigorous analysis of observational data to generate further theory.

Of course, there is good and bad qualitative research, just as there is good and bad quantitative research. But the critical skills required to judge qualitative work are quite different from the statistically oriented methods of biomedical research. An understanding of the principles of qualitative enquiry is essential in order to judge this literature on its own terms. In particular, unfamiliarity with the methods of qualitative research can get in the way of understanding the methodology (the conceptual framework and rationale within which the methods are used).
20.2.2 Conducting Qualitative Studies

Some surgeons will wish to carry out qualitative research themselves, or be part of a team which does so. Others will be involved in conceiving and directing such research, but without being directly involved in performing it. Much will depend on the nature of the enquiry. Qualitative research can be especially useful in the preliminary phases of surgical research, generating theories from which hypotheses can be derived and then tested using a deductive approach. It can explore the meaning or reasoning behind quantitative data, e.g. finding out why patients dislike treatment X. Thus, surgeons may use a "mixed methodology" when formulating a research question, helping to identify which issues are key before narrowing the focus and designing quantitative studies addressing specific questions. Such an approach is especially useful when piloting ideas and testing preliminary insights. Qualitative research can also be used in its own right simply to generate better understanding of behaviour, a context or a process. For example, in the rapidly expanding domain of surgical education, a qualitative approach to an entire project may be appropriate. In all cases, an awareness of qualitative methods is essential for effective research design.
20.3 Common Misconceptions

1. Qualitative research is woolly, fuzzy and full of psychobabble
Whilst this can be true of bad qualitative research, quite the reverse is true of high quality work. Such work has clear aims and methods, easily assimilable results and self-critical appraisal of strengths and limitations.

2. Qualitative research is not scientific – it is making it up as you go along
Inductive reasoning is a well-established alternative to the more familiar deductive approach. At certain stages in research, it is the most logical and productive approach to adopt (e.g. when defining the research questions; when considering possible research designs). Indeed, frequently it is the only approach (e.g. when
trying to make sense of human responses to complex experiences or questions). Quality of life after a new pouch operation for rectal excision, for instance, cannot be expressed only in terms of numbers.

3. Anyone can have a chat and write it up
Well-conducted and lucidly presented, qualitative research can appear simple. Interviewing may seem a commonsense task, but in fact, good interviewing for research purposes is just as difficult as taking a skilled history. Amounts of data are often vast, requiring considerable skill and experience to manage effectively. Apparent simplicity conceals extensive analysis and thought.

4. Qualitative research is not rigorous
Again, this should not be a criticism of qualitative research per se but of how it is carried out. Performed conscientiously, qualitative research is just as rigorous as any other approach – see the section on achieving rigour later in this chapter.

5. What can you tell from small numbers?
It depends on what you are asking. Clearly, it is impossible to generalise to whole populations from ten to twelve interviews. But by selecting the right participants to ask or watch, it may be possible to gather key information and gain important insights. This selectivity greatly increases the research's investigative power, since it concentrates on the sources of information most likely to yield valuable results. By selecting those at the edges of the distribution curve as well as some in the middle, a purposive approach can deliberately include negative or extreme views. Indeed, through interviewing people opposed to the research's main propositions, it may be possible to identify important new areas of enquiry within a project.

6. You can make quotations say whatever you want, if you are selecting them
Of course, there is scope for dishonesty and slanted reporting. This applies equally to quantitative research and is to do with the research design and the integrity of the researcher. Trustworthiness lies at the heart of qualitative research. The qualities of the design (and the researchers within it) are crucially important. This must be made transparent to the reader, who will form
their own judgement about how much weight to attach to the researcher's conclusions. How quotations are selected must be made clear in the methods, and any possible biases should be highlighted and explored.

7. Qualitative research is better (or worse) than quantitative research
Over several decades, an unhelpful polarisation has emerged between qualitative and quantitative methodologies. Fortunately, this is now subsiding and researchers are recognising the need for methodological synergy. Obviously there is no single "right" methodology for carrying out research. Some questions or project phases will be better served by a qualitative approach, some by a quantitative and many will benefit from a combination of the two. Neither is inherently better or worse, and both can be conducted well or badly. But a clear understanding of the strengths and limitations of each is essential.
20.4 When Should Qualitative Research Not Be Used?

Qualitative research is inappropriate in the following cases:
1. Once there is a clear question which can be better answered in numerical terms and analysed using statistical methods.
2. When you do not know how to do qualitative work and have not got the time, money or interest to find out or get someone to do it for you.
20.5 Characteristics of the Qualitative Research Process

Qualitative research uses words to build up a detailed picture of how people behave or respond. Questions are formulated and refined as the research progresses, modified by the data as they accumulate. Findings are generated inductively, emerging from the data as analysis proceeds. Data gathering and data interpretation often progress in parallel, and separation between these elements is much less clear cut than with quantitative research. Figure 20.1 indicates the typical stages of qualitative research. The first three stages and the last are similar to other research. Stage four (piloting) is crucially important, allowing research tools such as interview schedules to be tested and refined. But the biggest single difference lies in stages five (data collection) and six (data analysis and theory development). These two stages are closely interwoven, with a repeated to-and-fro relationship quite unlike that found in quantitative studies. For example, if interviews were to be used, the research plan would specify in detail how the interviews would take place (including sampling strategy for participants, indicative number of sessions, location and duration of interviews, and consent). An interview schedule or topic guide of questions would be developed, trialled and modified. The research plan would set out in advance how the interviews would be recorded, transcribed and checked, and the strategy for data analysis (coding, identification of emergent themes and triangulation of findings). All these elements would be determined beforehand and piloted to identify unexpected problems and refine the process.
Fig. 20.1 Schematic representation of the process of qualitative research: 1. Define research questions or area; 2. Design study; 3. Seek permission and access to subjects; 4. Pilot/trial data collection and analysis; 5. Collect data; 6. Analyse data and develop models and theories; 7. Write-up and publish results
However, such research demands considerable flexibility. As data collection gathers pace and interviews are carried out, concurrent analysis may identify additional questions or areas for exploration. When the analysis yields no new themes, saturation is reached (provided a range of views has been sampled). At this point, the interviewing is likely to end, and the final number of interviews conducted may be greater or less than the original estimation. This part of the process must also be written up and explained in any published output. Although the results will depend on the outcomes of the interviews and their analysis, the process is clearly defined and available for external scrutiny. It is this transparency that underpins a piece of work’s rigour and defensibility. This process requires a distinct set of methods and techniques. The following section summarises the most important ones.
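Before turning to the methods themselves, the iterative logic just described can be made concrete in procedural form. The following is a minimal, self-contained Python sketch; the participants, themes and one-line "coding" step are invented for illustration, and a real study would judge saturation across batches of data and a deliberately sampled range of views rather than after a single uninformative interview.

```python
# Toy sketch (invented data) of iterative qualitative data collection and
# analysis: each transcript is coded as soon as it is collected, and
# interviewing stops once analysis of new data yields no new themes
# (saturation).

# "Fieldwork" reduced to the themes a researcher might code per transcript.
INTERVIEWS = {
    "P1": {"pain", "return to work"},
    "P2": {"pain", "body image"},
    "P3": {"return to work", "family support"},
    "P4": {"pain", "body image"},   # nothing new: saturation reached here
    "P5": {"pain"},                 # never interviewed
}

def run_study(sampling_order):
    themes_seen, interviewed = set(), []
    for participant in sampling_order:      # purposively ordered sample
        coded = INTERVIEWS[participant]     # collect and code one interview
        interviewed.append(participant)
        new_themes = coded - themes_seen
        if not new_themes:                  # analysis yields nothing new:
            break                           # saturation, stop collecting
        themes_seen |= new_themes           # analysis feeds back into the
                                            # next round of data collection
    return interviewed, themes_seen

done, themes = run_study(["P1", "P2", "P3", "P4", "P5"])
print(done)    # ['P1', 'P2', 'P3', 'P4']
print(themes)  # pain, return to work, body image, family support
```

Note how the final number of interviews (four, not five) is an outcome of the analysis rather than a figure fixed in advance, which is exactly the point made above about interviewing ending once no new themes emerge.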
20.6 Qualitative Methods

A short chapter can neither attempt to describe qualitative data collection and analysis in detail, nor deal with all the issues involved. Instead, we highlight common and important principles and methods. Recommended texts are listed in the Further Reading section at the end of this chapter, although none makes much reference to the surgical context. A difficulty for the newcomer is to know which methods of collection and analysis to use in given circumstances. Factors governing selection include the nature of questions being asked; resources that are available; likely access to study participants, contexts or documents; the underlying philosophy of the researchers and the focus or intention of the study. Rather than simply provide a list of available methods, we have selected three areas in which qualitative research may be used effectively in surgical practice, indicating for each how the research might proceed and what methods might be used. The three areas are:
(a) Exploring people's responses (their perceptions, reactions and opinions)
(b) Understanding a process
(c) Analysing documents
Each of these areas is addressed in Table 20.1, which works through a series of aspects of the research process. For each aspect, the table summarises
a number of key points about carrying out research, for example, outlining which methods may be used. The table can thus be read in two ways: following one area through every aspect shows how researching in that area would proceed, while comparing the three areas within a single aspect shows how that stage of the research process varies according to the area under investigation. Vignette 2 provides an example of a study in which the use and selection of appropriate qualitative methods and approaches were vital to address the research question.

Vignette 2
One of this chapter's authors (RLK) is investigating the stress experienced by surgeons during operations. Relatively little is known about this area, so an initial study was conducted with a small number (16) of surgeons, deliberately selected to sample a wide range of views and experience levels. Semi-structured interviews were conducted by a non-clinical researcher to identify and explore factors which surgeons perceived as stressful. Analysis of interview transcripts allowed these factors to be categorised, generating a framework for further discussion and research [3].
20.6.1 Additional Types of Study

The following types of research study are also worth noting:

1. Action research
Action research is a form of enquiry where the researcher investigates his or her own practice through a cyclical process of change and evaluation, aiming to bring about improvement. A range of investigative methods may be used. The process is inevitably subjective, focusing on the researcher themselves. Groups (e.g. a whole surgical team) can carry out action research collaboratively, often with the help of a facilitator/researcher.

2. Evaluation studies
Evaluation studies set out to ascertain the impact of something. Such studies typically use "mixed methods", combining qualitative and quantitative research methods. Unlike action research, an evaluation will be
Table 20.1 Using qualitative methods in three different contexts relevant to surgical practice. The three areas of intention are: collecting people's responses; understanding a process; analysing documents.

Example of context/research question
- Collecting people's responses: How did the operation affect your life at home and work?
- Understanding a process: When and how does communication in the operating theatre break down?
- Analysing documents: What are the underlying trends in government thinking about surgical training, as shown through Ministers' speeches and Department of Health documents over the past decade? What can patient "notes" tell us about the clinical reasoning process?

Typical permissions required
- Collecting people's responses: ethical permission for research; negotiating access to participants; obtaining informed consent from individuals to participate
- Understanding a process: ethical permission for research; negotiating access to participants; obtaining informed consent from individuals to participate
- Analysing documents: ethical permission for research; negotiating access to documents

Typical "data processing" requirements
- Collecting people's responses: recording (audio, film and/or written notes); verbatim transcription of recorded speech; rendering data anonymous, making any individual unidentifiable in records or publications
- Understanding a process: filming, audio recording, field notes; verbatim transcription of recorded speech; rendering data/records anonymous where possible
- Analysing documents: storage and copying of documents; ensuring anonymity may also be an issue

Typical data collection methods
- Collecting people's responses: individual interviews (structured, semi-structured or unstructured); questionnaires using Likert scales or free-form responses; accessing group norms through interviews, including focus groups; consensus development methods, e.g. nominal group technique, Delphi technique
- Understanding a process: continuous observation, which may include filming, audio recording and written field notes by the observer; participant observation; critical incident analysis; discontinuous observation, e.g. every 10 min; debriefing/observation of debriefing
- Analysing documents: collection of material from web, library or other paper or electronic repository

Typical data analysis methods
- Collecting people's responses: interpretation in parallel with data collection (an iterative process) until saturation is reached; identification of emergent categories/themes; grounded theory; use of qualitative analysis software (where relevant); development of conceptual models
- Understanding a process: interpretation in parallel with data collection until saturation point is reached; discourse and content analysis; analysis of interactions/movements/body language; analysis of relationships; use of qualitative analysis software (where relevant); development of conceptual models
- Analysing documents: content analysis; use of qualitative analysis software

Presentation of results
- Collecting people's responses: presentation of themes, supported by quotations from participants; presentation of models and theories developed during the study; links made to pre-existing explanatory theories, models and discussions; declaration of the relationship and possible bias of the author in relation to the study
- Understanding a process: presentation of themes, supported by quotations from participants; consideration of who is included and who is marginalised; consideration of power relationships; links made to pre-existing explanatory theories, models and discussions; presentation of models and theories developed during the study; declaration of the relationship and possible bias of the author in relation to the study
- Analysing documents: presentation of themes, supported by quotations from documents; consideration of the bias of authors; links made to pre-existing explanatory theories, models and discussions; presentation of models and theories developed during the study; declaration of the relationship and possible bias of the author in relation to the study

Achieving rigour in the study
- Collecting people's responses: care in selecting study participants; addressing the nature of the relationship between interviewer and interviewee(s); piloting; combination with other (qualitative) methods; triangulation of methods or during analysis; enabling the analysis to be audited/verified
- Understanding a process: care in selecting study participants/contexts; addressing the impact of observation and the observer on participants; piloting; combination with other (qualitative) methods; triangulation of methods or during analysis; enabling the analysis to be audited/verified
- Analysing documents: care in selection of documents; combination with other (qualitative) methods; triangulation of methods or during analysis; enabling the analysis to be audited/verified
conducted by a researcher who is not a subject of the study.

3. Qualitative case studies
These focus on a small area (e.g. a single operating team) in depth. Case studies typically collect data from more than one source and use some form of triangulation in their analysis. Although qualitative research does not aim for generalisability, such studies may shed light on important areas of practice, helping others in similar or different circumstances to better understand or investigate their own context.
20.7 Issues in Conducting Qualitative Research

20.7.1 Objectivity, Subjectivity and Bias

We have already mentioned that qualitative research does not claim objectivity. Indeed, for a researcher to do so in a qualitative study is almost certainly a sign that they have not understood the methodology and are
using it inappropriately. Qualitative materials capture perceptions and views of individuals; these are not objective data. Analysis depends upon an individual researcher or researchers generating categories for interpretation; this is not an objective process. “Insider” studies, for example, can be very valuable, but they cannot be “objective”; their strength lies in their insight and informed perspective. Spurious objectivity or reliance on “numbers of things” misunderstands what qualitative research has to offer and often weakens it. Of course, subjectivity has its dangers. As shown above, qualitative data usually involve large volumes of material (e.g. interview transcripts, descriptions of observations or existing documents). Qualitative researchers must exercise selectivity throughout a project. From establishing which data to collect and record (e.g. what behaviours to watch; which questions to ask and when) to deciding how to frame and present it, the process demands constant decision-making. To some extent, this also applies to quantitative research. But there the researcher as a person is much less prominent, as the conventions of the genre make him or her invisible. It is tempting, but dangerous, to infer from this that he or she is not there.
The concept of “reflexivity” acknowledges the role played by the researcher in qualitative work. Good qualitative work recognises that perspectives (and sometimes bias) are integral components of human-centred research. In the end, the reader has to make up their own mind about the trustworthiness and usefulness of the work, based on the evidence placed before them.
20.7.2 Achieving Rigour

To reinforce some points mentioned earlier, rigour in qualitative research is achieved by:
1. Careful study design, with the selection of appropriate methods of data collection and analysis (for example, failure to record interviews will reduce the opportunity to use emergent themes or data saturation and develop grounded theory).
2. Explicit declaration of the "position" of the researcher(s).
3. Triangulation of methods and findings during analysis.
4. Critical self-analysis of strengths and weaknesses of the study.
5. Clear exposition of the processes used for data collection and analysis (e.g. how categories of analysis were reached), with sufficient evidence to support assertions and conclusions. This allows theories and conceptual models generated to be tested by other researchers in the academic community, even though directly "reproducing" a qualitative study is not possible. Studies lacking sufficient detail to verify that an appropriate process was followed should be questioned.
20.7.3 Ethical Permission, Risk and Ownership of Data

Permission for research must always be obtained, both from institutions and individual participants. This will typically involve approval by an ethics procedure. For prospective participants, information and transparency are key issues. Subjects need to know the following: enough about the study to make an informed decision to participate, how the data will be used and how the data will be kept secure and anonymous. Invitations to participate should always include a reasonable amount of information about the study; ethical permission will usually stipulate that signed permissions should be obtained. Occasionally, situations may arise where this is not possible; in that case, it is imperative that all subjects know that research is being conducted and that they have the right to withdraw from participation. Consideration also needs to be given to any emotional disturbance the research may inadvertently generate in respondents (e.g. asking patients who have undergone failed surgical procedures to describe their experiences), and suitable measures should be in place to offer support. Potential risk to researchers must also be considered. Although Health and Safety issues in qualitative studies are less obvious than with laboratory or clinical work, they are no less important. A non-clinical researcher undertaking observational work in the operating theatre, for example, will need careful induction and training to ensure that risks are minimised. Important issues arise over the ownership and security of data. Interview material is generated by individuals and should be used with sensitivity. It is common practice to offer participants a copy of the transcript of their interview, allowing any possible misrepresentations to be corrected. Later in the process, interviewees can be asked to what extent they recognise their own views in the researcher's analysis (usually presented as concepts and categories from multiple interviews). This process also provides a form of triangulation. Data must be kept secure and be rendered anonymous, by using codes rather than real names. Data may be required to be destroyed after a specified interval; anonymised data may sometimes be placed in archives for future research use.

20.7.4 Presentation and Writing Up

Qualitative research has its own conventions of presentation. These do not follow the "scientific" model of writing a paper, with its condensed and stylised format of literature review, methods, results, discussion and conclusion. Descriptions of qualitative work are usually much longer, starting with an extended introduction which
sets the work in context and summarises existing theory and evidence. Detailed sections on methods and methodology will describe the researchers’ approach to analysis, declaring their position in relation to the study subjects. Results will often quote extensively from participants, using verbatim material to typify each emergent theme or category and show nuances of opinion. Data may be interwoven with a discussion of relevant literature and theories, and new theory or conceptual models generated by the study may be developed. The limitations of the research should always be considered, using a self-critical stance. Implications and conclusions from the results, possible future research and speculation about findings often form the last section. References are usually presented in Harvard format (author name and publication date in brackets within the text and an alphabetical reference list at the end), rather than the more familiar Vancouver style (superscript numerals and a reference list in citation order). Sometimes a bibliography (a list of publications relevant to the topic, but not attached to individual sentences as in a reference list) may be provided.
20.8 Conclusion

By exploring the responses of individual people, qualitative methods can give a deep understanding of important issues affecting surgical practice. In order to make full use of what qualitative research can offer, however, it is essential to understand its philosophy and methodologies. This chapter has summarised some key principles and methods, highlighting their strengths and limitations. Our aim has been to provide an introduction for those involved in surgical practice to use when reading, evaluating and conducting qualitative research, and to provide pointers for further exploration.
References
1. Espin S, Levinson W, Regehr G et al (2006) Error or "act of God"? A study of patients' and operating room team members' perceptions of error definition, reporting, and disclosure. Surgery 139:6–14
2. Lingard L, Garwood S, Poenaru D (2004) Tensions influencing operating room team function: does institutional context make a difference? Med Educ 38:691–699
3. Wetzel CM, Kneebone RL, Woloshynowych M et al (2006) The effects of stress on surgical performance. Am J Surg 191:5–10
4. Glaser B, Strauss A (1967) The discovery of grounded theory. Aldine, Chicago (a classic work that describes the use of grounded theory)
Further Reading
1. Bosk CL (2003) Forgive and remember: managing medical failure. University of Chicago Press, Chicago (a good book to read to obtain a sense of what qualitative research can offer)
2. Lewins A, Silver C (2007) Using software in qualitative research: a step-by-step guide. Sage, London (good guide to using a range of software packages that also includes a helpful consideration of types of analysis to use)
3. Miles MB, Huberman AM (1994) Qualitative data analysis: an expanded sourcebook. Sage, Thousand Oaks (a widely respected text on qualitative analysis)
4. Pope C, Mays N (2006) Qualitative research in health care. BMJ, London (an excellent brief compendium of methods that illustrates usage in health related research)
5. Robson C (2002) Real world research, 2nd edn. Blackwell, Oxford (wide ranging and detailed "how to do it" text)
6. Seale C, Gobo G, Gubrium JF et al (2004) Qualitative research practice. Sage, London (wide ranging and detailed "how to do it" text)
7. Somekh B, Lewin C (2005) Research methods in the social sciences. Sage, London (wide ranging and detailed "how to do it" text)
8. CAQDAS Networking Project (2007) Available at: http://caqdas.soc.surrey.ac.uk (a useful site for keeping up to date with computer assisted qualitative data analysis)
9. British Educational Research Association (2007) Available at: http://www.bera.ac.uk/publications/guides.php (ethical guidelines on educational research; a useful resource when planning educational research)
Appendix
Common Terms in Qualitative Research
Action research: enquiry which focuses on the researcher's own practice with the intention of improving it, using a repeating cycle of observation, change and evaluation.
Anonymity: no individual or organisation should be identified when research is published. Records should be coded without personal names appearing and with the identification "key" kept separately. Occasionally, anonymity cannot be achieved or may not be desirable; in such cases, permission must be obtained.
Audio recording of speech: this is highly recommended, as it provides source material for verbatim transcription and analysis and allows triangulation by independent researchers. Researchers' summary notes of interviews cannot satisfactorily capture detail and nuance, and are at risk of bias and selective reporting.
Coding: assigning categories or themes to elements of text (e.g. transcriptions). In studies using grounded theory, these codes will be emergent (arising from the data).
Content analysis: originally developed for documentary analysis, but now used more widely. Combines textual analysis with consideration of the purpose of the document, the intention of the writer and the wider social and political context in which it is situated. May count the frequency of certain words or phrases.
Critical incidents: occurrences which have meaning or value to an individual ("incidents" may be routine or abnormal). Researchers may ask about critical incidents during interview; study participants may record them in a log for later debriefing; participant observers may witness them and ask about them later.
Debriefing: an expert facilitator assists individuals or groups to review and reflect on an episode (e.g. an operation), with the aim of learning about participants' actions and whether any changes are possible or desirable. Emotional reactions may be addressed.
Delphi technique: method of achieving a broad consensus from experts (e.g. on the content of a core surgical curriculum). Individuals from an informed group are asked to rank statements and/or comment on a series of questions. Each individual does this independently (by post or email), then rankings and comments are collated by the researcher. The process is repeated in further rounds, leading to progressive refinement until consensus is reached.
Diaries and reflective logs: study participants (subjects) keep records relevant to the research, and the researcher uses these as data for analysis.
Discourse analysis: detailed analysis of language (written or spoken), especially from the perspective of its social context and meaning.
Emergent categories/themes: deriving themes from source material (e.g. a transcribed interview) and classifying or coding data according to what is found there. This contrasts with analysing against preconceived ideas (headings) into which responses might be grouped or counted.
Ethnography: a term derived from anthropology and relating to long-term "submersion" with the participants in a study, often involving both observation and interview. Ethnographers analyse organisations or small groups as "cultures", observing relationships between their members.
Field notes: a term derived from ethnography. Field notes are the written or audio records the researcher keeps of what they see whilst researching. Such notes are selective and reflect the researcher's thoughts, feelings and interpretations (unlike recordings and verbatim transcripts of interviews).
Focus groups: guided discussion in groups of 4–8 with a moderator (researcher) who introduces the topic of interest and keeps discussion running, without asking leading questions.
Grounded theory: a term invented by Glaser and Strauss [4] to describe the painstaking generation of theories from repeated examinations of the data itself, using actual phenomena (either directly observed or referred to during interviews), rather than pre-conceived concepts of the researcher. Typically used with iterative studies and triangulation by participants.
Insider research: used to convey a sense that the researcher has attempted to "get inside" the values and perspectives of those being researched. Used with participant observation and ethnographic studies.
Interview schedule: a list of questions to be asked during interview.
Iterative data collection and analysis: initial collection of some data, immediately followed by provisional analysis which informs the subsequent data collection (e.g. it might lead to the asking of additional questions in an interview study). This repetitive process is often continuous throughout the project's data collection phase.
Likert scales: invented in the 1930s by Likert as a way of eliciting opinion and perception through questionnaires. Relies upon statistical analysis and is therefore not always considered a qualitative research method. Participants rate numerous positively and negatively framed statements, typically on a five-point scale (e.g. from "strongly agree" to "strongly disagree"), producing ordinal data. Scales (a number of statements around a theme) undergo statistical testing with the aim of achieving internal consistency and differentiating between the views of individuals.
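To make the scoring arithmetic behind such scales concrete, here is a small self-contained Python sketch; the item wording, framing flags and ratings are invented for illustration and are not drawn from any published instrument.

```python
# Toy Likert scoring sketch (invented items and data): ratings run from
# 1 ("strongly disagree") to 5 ("strongly agree"), and negatively framed
# statements are reverse-coded before items are summed into a scale score.

ITEMS = {
    "The ward team communicated clearly": False,    # positively framed
    "I felt poorly informed before surgery": True,  # negatively framed
    "Staff answered my questions fully": False,     # positively framed
}

def scale_score(responses):
    """responses maps item text -> raw rating (1-5); returns the total."""
    total = 0
    for item, negatively_framed in ITEMS.items():
        raw = responses[item]
        total += (6 - raw) if negatively_framed else raw  # reverse-code
    return total  # ordinal data: ranks respondents, not a true measurement

patient = {
    "The ward team communicated clearly": 4,
    "I felt poorly informed before surgery": 2,  # reverse-codes to 4
    "Staff answered my questions fully": 5,
}
print(scale_score(patient))  # 13 out of a possible 15
```

Reverse-coding keeps "agreement" pointing the same way for every statement, which is what allows the items around a theme to be combined and then tested for internal consistency.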
Model building: creating conceptual models to explain, describe or show relationships within the data, e.g. a schematic representation of the communication in the operating theatre that attempts to capture the main initiators of oral exchanges. Descriptive models may help the researcher to generate insights about patterns of behaviour or ideas.
Nominal Group Technique (NGT): uses face-to-face discussion (unlike the Delphi technique) to generate ideas from groups of interested/involved individuals. Ideas are generated by participants independently, then collected without discussion and written up for the group to see. This avoids dominant members influencing others. There is then discussion to clarify ideas, strike out overlapping items and allow participants to make any revisions to their own contribution in the light of the contributions of others. The process may stop at this stage or proceed to voting or ranking to identify items with the most support.
Participant observation: an observer who does not seek to remain totally aloof from the subject of study. By becoming accepted by the group or individual being studied, the observer is able to "blend in". The intention is to avoid artificial alteration of what is happening because of the presence of someone perceived as an outsider. Relates to ethnography.
Piloting: trialling a process before its full use, to identify strengths and limitations. A "dry run" before conducting a series of interviews, for example, can test if the questions are easy to understand, if any are ambiguous, and how long the interview will take. A key stage in qualitative research.
Purposive sampling: seeking out data sources because of their capability to shed light on the research question, e.g. selecting people with views or experiences that might challenge or confirm an emerging theory. Unlike random sampling, this does not aim at generalisable conclusions, but at gaining insight into key aspects of the research question.
Qualitative analysis software: computer programs can simplify the manipulation of interview transcripts and other written data held electronically. Although the researcher still has to make all decisions about coding categories, software can help with tagging, organising and presenting data. This is especially useful with large datasets. Proprietary programs include NVivo, NUD*IST and Atlas.ti.
Questionnaires: useful for gathering data in a structured format, especially when conducting surveys. Questionnaires can collect several types of data; not all will be amenable to qualitative analysis. Freeform written responses may be analysed using content analysis. Questionnaire design is critical to success and is more difficult than it appears.
Reflexivity (reflexive): awareness by the researcher of the ways their own background, interests, beliefs and identity may influence the research. The central role of the researcher as a person is key to conducting and interpreting qualitative research.
Saturation: used in connection with iterative studies. Describes the point where analysis of new data is not yielding any new themes or insights. This is the point at which data collection stops.
Semi-structured interviews: although major themes are specified in advance (using an interview schedule or topic guide), the researcher is free to vary the order in which questions are asked, to follow up points with supplementary questions and to omit areas if appropriate. Probably the most commonly used form of interview.
Transcription: putting speech into writing, e.g. everything said by participants during interview (individuals or groups). Uses repeated playback of audiotapes and other oral material to create an accurate verbatim written record for analysis. Conventions exist for denoting pauses, hesitations, laughter etc. Transcription is extremely time-consuming, but is crucial in creating an accurate record for analysis and ensuring separation of source material (what participants actually said) from the researcher's interpretation of that material. Verbatim quotations from transcripts are adduced as evidence during presentation of qualitative work.
Triangulation: using a combination of methods or perspectives to ensure that a single source or interpretation does not unduly influence the analysis, thereby increasing rigour. Data obtained by one method (e.g. interview) may be triangulated against data obtained by a different method (e.g. observation) to see if both yield consistent themes. Emergent themes may also be triangulated with different researchers or study participants to test the level of agreement or disagreement.
Topic guide: a list of topics to be asked about during interview (often used interchangeably with interview schedule).
Safety in Surgery
21
Charles Vincent and Krishna Moorthy
Contents
21.1 Introduction 255
21.2 Methods of Studying Errors and Adverse Outcomes 256
21.3 Safety in Health Care: The Scale of the Problem 256
21.4 Understanding Errors and Adverse Outcomes 258
21.5 The Person and the System 259
21.6 Systems Factors and Patient Safety 260
21.7 Understanding How Things Go Wrong 260
21.7.1 Incident Analysis 260
21.7.2 Human Reliability Analysis Techniques 261
21.8 Understanding Surgical Error and Surgical Outcomes 262
21.8.1 Studies of Closed Claims in Surgery 262
21.8.2 Observational Studies of Success and Failure and Studies of Communication 263
21.9 The Next Steps: Improving Surgical Safety 264
21.9.1 Reporting and Analysis of Incidents 264
21.9.2 Standardization of Clinical Processes: Guidelines and Protocols 264
21.9.3 Information Technology 265
21.9.4 Improving Communication: Checklists and Briefing 265
21.9.5 Individual Attitudes and Behaviors 266
21.9.6 Simulation-Based Training 266
21.9.7 Improving the Safety Culture 267
21.9.8 The Patient's Role in Patient Safety 267
References 268

Abstract In this chapter we first provide an overview of studies of errors and adverse outcomes in surgery. We also provide a brief summary of approaches to human error and systems thinking, which we contend could enhance current ways of understanding surgical outcomes, both in the analysis of individual cases and the broader understanding of the determinants of good and poor surgical outcomes. Finally, we provide some illustrations of interventions to improve the safety of surgery and the directions we see for the future. We consider that surgery has much to gain from embracing patient safety and drawing on the understanding and techniques that have been developed in health care and a variety of other industries; equally, we believe that patient safety could benefit from wider exposure to methods used in surgical research, in particular the attention given to the monitoring and constant surveillance of outcome and morbidity data characteristic of the best units.

21.1 Introduction
K. Moorthy () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Patient safety can, at its simplest, be defined as: The avoidance, prevention, and amelioration of adverse outcomes or injuries stemming from the process of health care. Those involved with patient safety are often also concerned with other quality of care issues and it is not easy, perhaps not possible or even desirable, to draw a sharp line between patient safety and related activities such as risk management and quality assurance. Surgeons have of course always, from the very beginnings of the profession, wanted to achieve the highest standards of quality and safety for their patients. Until recently, however, this responsibility rested largely with the individual surgeon. The occurrence of complications
and adverse outcomes was seen as due to patient factors such as comorbidities and ASA status and the skill, or failures, of the individual surgeon. Patient safety, however, views adverse outcomes in a much broader context, as a product of a system of care in which the actions of individuals play a critical part, but in which they are strongly influenced by the team, their working conditions, and the wider organizational context. While these factors are addressed in surgery, for instance, in discussions of the validity and meaning of volume/outcome relationships, they have not yet been integrated into a more systematic approach to the understanding and improvement of surgical safety and quality. However, in recent years, partly through the efforts of a few surgical pioneers and partly as a result of pressure from hospital risk managers and the public, surgeons, through morbidity and mortality meetings, have come together to analyze and share their adverse events with an emphasis on learning from the experiences and errors of others. Patient safety has gradually emerged as a distinct initiative within surgery, health care organizations, and regulatory bodies. In this chapter we first provide an overview of studies of errors and adverse outcomes in surgery. We then provide a brief summary of approaches to human error and systems thinking, which we contend could enhance current ways of understanding outcomes, both in the analysis of individual cases and the broader understanding of the determinants of good and poor surgical outcomes. Finally, we provide some illustrations of interventions to improve the safety of surgery and the directions we see for the future. We consider that surgery has much to gain from embracing patient safety and drawing on the understanding and techniques that have been developed in health care and a variety of other industries; equally, we believe that patient safety could benefit from wider exposure to methods used in surgical research, in particular the attention given to the monitoring and constant surveillance of outcome and morbidity data characteristic of the best units.
21.2 Methods of Studying Errors and Adverse Outcomes

The number of studies assessing the incidence of error and harm has increased exponentially in the last few years and, while we cannot hope to cover them all, it is now possible to gain an understanding of the overall
scale of the problem. There are a number of methods of studying errors and adverse events, each of which has evolved over time and been adapted to different contexts. Each of the methods has particular strengths and advantages, and also weaknesses and limitations. Table 21.1 summarizes some of the methods and their relative advantages and limitations. Methods vary in several respects. For instance, the various methods rely on different sources of data, medical records, observations, claims data, voluntary reports, and so on. Some focus on single cases or small numbers of cases with particular characteristics, such as claims, while others attempt to randomly sample a defined population. Some methods are oriented toward detecting incidence (how many) of errors and adverse events, while others address their causes and contributory factors (why things go wrong). There is no perfect way of estimating the incidence of adverse events or of errors. For various reasons, all of them give a partial picture. Record review is comprehensive and systematic, but by definition, is restricted to matters noted in the medical record. Reporting systems are strongly dependent on the willingness of staff to report and are a very imperfect reflection of the underlying rate of errors or adverse events.
21.3 Safety in Health Care: The Scale of the Problem

An adverse event is defined as an unintended injury caused by medical management rather than the disease process. The definition, in surgical terms, seems equivalent to a complication, although a much wider class of events (in fact, any that can cause patient harm) would be included. Note that adverse events are not necessarily regarded as preventable: in all major studies, all complications are recorded as adverse events, and a separate judgment of preventability is then made. Retrospective reviews of medical records aim to assess the nature, incidence, and economic impact of adverse events, and to provide some information on their causes. The basic record review process is as follows. In phase I, nurses or experienced record clerks are trained to identify case records that satisfy one or more well-defined screening criteria – such as death, transfer to a special care unit, or re-admission to hospital within 12
Table 21.1 Methods of studying errors and adverse events (adapted from Thomas and Peterson [1])

Morbidity and mortality conferences and autopsy
- Advantages: can suggest contributory factors; familiar to health care providers
- Disadvantages: hindsight bias; reporting bias; focused on diagnostic errors; infrequently used

Case analysis/root cause analysis
- Advantages: can suggest contributory factors; structured systems approach; includes recent data from interviews
- Disadvantages: hindsight bias; tends to focus on severe events; insufficiently standardized in practice

Claims analysis
- Advantages: provides multiple perspectives (patients, providers, lawyers)
- Disadvantages: hindsight bias; reporting bias; nonstandardized source of data

Error reporting systems
- Advantages: provide multiple perspectives over time; can be a part of routine operations
- Disadvantages: reporting bias; hindsight bias

Administrative data analysis
- Advantages: uses readily available data; inexpensive
- Disadvantages: may rely upon incomplete and inaccurate data; the data are divorced from clinical context

Record review/chart review
- Advantages: uses readily available data; commonly used
- Disadvantages: judgments about adverse events not reliable; medical records are incomplete; hindsight bias

Review of electronic medical record
- Advantages: inexpensive after initial investment; monitors in real time; integrates multiple data sources
- Disadvantages: susceptible to programming and/or data entry errors; expensive to implement

Observation of patient care
- Advantages: potentially accurate and precise; provides data otherwise unavailable; detects more active errors than other methods
- Disadvantages: time-consuming and expensive; difficult to train reliable observers; potential concerns about confidentiality; possible to be overwhelmed with information

Active clinical surveillance
- Advantages: potentially accurate and precise for adverse events
- Disadvantages: time-consuming and expensive
months. These have been shown to be associated with an increased likelihood of an adverse event [2]. In phase II, trained doctors analyze positively-screened records in detail to determine whether or not they contain evidence of an adverse event using a standard set of questions. The basic method has been followed in all the major national studies, though modifications of the review form and data capture have been developed [3]. Studies in a number of different countries since the 1980s have suggested that between 4 and 16% of
patients admitted to hospital suffer an adverse event. More recent estimates consistently fall between 8 and 12%, with about half being judged preventable [4]. A significant percentage of adverse events are associated with a surgical procedure. For instance, in the Utah Colorado Medical Practice Study, the annual incidence rate of adverse events among hospitalized patients who received an operation was 3.0%, of which half were preventable. Some operations, such as extremity bypass graft, abdominal aortic aneurysm repair, and
colon resection, were at particularly high risk of preventable adverse events [5]. In the UK, complication rates for some of the major operations are 20–25%, with an acceptable mortality of 5–10% [6]. However, it is worth mentioning that at least 30–50% of major complications occurring in patients undergoing general surgical procedures are thought to be avoidable [7]. Many adverse events classified as operative are, on closer examination, found to be due to problems in ward management rather than intraoperative care. For instance, Neale et al. [8] identified preventable pressure sores, chest infections, falls, poor care of urethral catheters in their study of adverse events, together with a variety of problems with the administration of drugs and intravenous fluids. Retrospective review, like any other research method, has its limitations and the findings of the studies have to be interpreted with due regard to the methodological limitations; Neale and Woloshynowych [2] pointed out, for instance, that the review process relies heavily on the implicit judgments of doctors. Great efforts have been made to strengthen the accuracy and reproducibility of these judgments by training, by the use of structured data collection, by duplicate review with re-review, and by resolution of disagreements; however, the reliability of judgments of adverse events, particularly in relation to preventability, remains a difficult issue. Nevertheless, the consistency of the findings across many countries and many different review teams is impressive; future studies should move toward more precise specification of different types of adverse outcomes, which should improve reliability of detection. While the major record reviews have been enormously important in drawing attention to patient safety issues, future studies may move to the examination of defined adverse events in a prospective manner. For instance, in Canada, Wanzel et al. [9] prospectively monitored the presence and documentation of complications for all 192 patients admitted over a 2 month period to a general surgical ward. 75 (39%) patients suffered a total of 144 complications, 2 of which were fatal, 10 life threatening, and 90 of moderate severity. Almost all the complications were documented in the patient’s notes, but two-thirds of them were not documented on the front sheet of the patient’s final medical record, and only 20% were reviewed at the weekly morbidity and mortality rounds. Nearly one-fifth of the complications were due, in part, to error.
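The two-phase record review process described earlier in this section can be summarised procedurally. The sketch below is illustrative only: the record fields and toy data are hypothetical, and the fixed phase II answer stands in for what is, in the actual studies, an implicit judgment made by trained doctors using a standard set of questions rather than a computation.

```python
# Schematic sketch (hypothetical fields, toy data) of two-phase
# retrospective record review.

RECORDS = [
    {"id": 101, "died": False, "icu_transfer": True,  "readmitted": False},
    {"id": 102, "died": False, "icu_transfer": False, "readmitted": False},
    {"id": 103, "died": True,  "icu_transfer": False, "readmitted": False},
]

def phase_one_screen(record):
    """Phase I: flag any record meeting one or more screening criteria,
    e.g. death, transfer to a special care unit, or re-admission."""
    return record["died"] or record["icu_transfer"] or record["readmitted"]

def phase_two_review(record):
    """Phase II: detailed review of a flagged record. Returns a pair
    (adverse_event, preventable) - two separate judgments, echoing the
    studies' separate assessment of preventability. Fixed here purely
    for illustration."""
    return (True, False)

flagged = [r for r in RECORDS if phase_one_screen(r)]
results = {r["id"]: phase_two_review(r) for r in flagged}
print(results)  # {101: (True, False), 103: (True, False)}
```

The design point the sketch preserves is the separation of a cheap, sensitive screen (phase I) from expensive expert review (phase II), and of the adverse event judgment from the preventability judgment.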
21.4 Understanding Errors and Adverse Outcomes

Patient safety is sometimes equated with preventing error. This seems innocent enough, but is a potentially limiting assumption in that patient harm can stem from factors other than error. For instance, hospital-acquired infections, surgical site infections, and postoperative chest complications all cause harm to patients, but may or may not involve errors on the part of individuals. With this proviso, however, the study of error in medicine has brought important new perspectives and understanding to bear on the broader issue of improving the safety and quality of care. In a seminal paper, still widely cited, Lucian Leape [10] drew on the psychology of error and performance, particularly the work of Jens Rasmussen and James Reason, and applied this to the practice of medicine. Leape argued that errors are often beyond the individual's conscious control; that they are precipitated by a wide range of factors, which are often also beyond the individual's control; and that systems which rely on error-free performance are doomed to failure, as are reactive attempts at error prevention that rely on discipline and training. He went on to argue that if physicians, nurses, pharmacists, and administrators were to succeed in reducing errors in hospital care, they would need to fundamentally change the way they think about errors. Leape explicitly stated that the solutions to the problem of medical error did not primarily lie within medicine, but in the disciplines of psychology and human factors, and set out proposals for error reduction that acknowledged human limitations and fallibility and relied more on changing the conditions of work than on training. Errors can be defined and classified in a number of different ways and from a number of different perspectives, and there is no ideal classification which covers all eventualities [4]. From a psychological perspective, an error is the failure of a planned action to achieve its desired goal [11]. Reason classifies errors as slips and lapses (errors of action) and mistakes (errors of knowledge and planning). Reason also discusses violations which, as distinct from errors, are intentional acts that, for one reason or another, deviate from the usual or expected course of action. Slips and lapses occur when a person knows what they want to do, but the action does not turn out as they intended. They are failures of execution, rather than failures of knowledge or planning. Slips relate to
observable actions and are associated with attention failures, whereas lapses are internal events associated with failures of memory. Injury to a major vessel during a routine operation because of a distraction would be a slip, while forgetting to perform an anastomosis would be a lapse. Slips and lapses occur during the largely automatic performance of some routine task, usually in familiar surroundings. They are almost invariably associated with some form of distraction, either from the person's surroundings or from their own preoccupations. With mistakes, the actions may go entirely as planned, but the plan itself deviates from some adequate path toward its intended goal. Here, the failure lies at a higher level: with the mental processes involved in planning, formulating intentions, judging, and problem solving [12]. Mistakes may be rule-based, where the right rule is not followed, or knowledge-based, where the individual is not aware of the rules or procedures. For example, not prescribing deep venous thrombosis (DVT) prophylaxis to a high-risk patient would be a knowledge-based mistake, while prescribing the wrong dose or wrong drug would be a rule-based mistake. Violations are deliberate deviations from safe operating practices, procedures, standards, or rules. A junior surgeon undertaking a major operation without adequate supervision would be an example of a violation.
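Reason's taxonomy can be made concrete with a small sketch. The following is a minimal illustration, not drawn from the chapter, of how the slip/lapse/mistake/violation distinction might be encoded when tagging incident reports; the category definitions follow the text above, and the example incidents are the ones the text itself uses.

```python
from enum import Enum

class UnsafeAct(Enum):
    SLIP = "slip"                                   # failure of execution: observable action, attention failure
    LAPSE = "lapse"                                 # failure of execution: internal event, memory failure
    RULE_MISTAKE = "rule-based mistake"             # right rule not followed
    KNOWLEDGE_MISTAKE = "knowledge-based mistake"   # rules or procedures not known
    VIOLATION = "violation"                         # intentional deviation from safe practice

# Examples from the text, tagged with Reason's categories.
examples = {
    "Injury to a major vessel after a distraction during routine surgery": UnsafeAct.SLIP,
    "Forgetting to perform an anastomosis": UnsafeAct.LAPSE,
    "Prescribing the wrong dose of DVT prophylaxis": UnsafeAct.RULE_MISTAKE,
    "Not prescribing DVT prophylaxis to a high-risk patient": UnsafeAct.KNOWLEDGE_MISTAKE,
    "Undertaking a major operation without adequate supervision": UnsafeAct.VIOLATION,
}

for incident, act in examples.items():
    print(f"{act.value:>24}: {incident}")
```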
21.5 The Person and the System
Reason has encapsulated common assumptions about error in his identification of "person" and "system" approaches to health care. We should note that, in reality, these approaches are complementary rather than mutually exclusive, but contrasting the two perspectives serves to highlight the rather crude approach to error often seen in health care. A person-centered approach assumes that the person who makes an error has certain characteristics which produce the error, that these characteristics are under their control, and that they are, therefore, to blame for the errors they make. This view is strongly entrenched in health care. Efforts to reduce error are, from this perspective, targeted at individuals and involve exhortations to "do better," retraining, or adding new rules and procedures. For errors with more serious consequences, more severe sanctions come into play, such as blaming, disciplinary action, suspension, media condemnation, and so on.
In contrast, the "systems approach" assumes that errors and human behavior cannot be understood in isolation, but only in relation to the context in which people are working. Clinical staff are influenced by the nature of the task they are carrying out, the team they work in, their working environment, and the wider organizational context; these are the system factors. From this perspective, errors are seen not so much as the product of personal fallibility, but as consequences of more general problems in the working environment: factors such as long working hours, working under time pressure to complete the list, being distracted in the middle of an operation, not seeking help when faced with a critical decision, and equipment malfunction. In considering how people contribute to accidents, we therefore have to distinguish between "active failures" and "latent conditions" [12]. Within surgery, one cannot deny the relative importance of individual characteristics such as attitude, motivation, clinical accountability, and skill. A strong sense of personal responsibility is fundamental to being a good surgeon. People who deliberately behave recklessly and without regard to their patients' welfare deserve to be blamed, whether or not they make errors. However, blame, in addition to sometimes destroying careers, is also a major barrier to improving safety; hence the enormous importance given to creating an "open and fair culture" within health care. Learning from error, discussing error, and reporting safety issues and incidents in which patients are harmed require that the people involved are supported and even praised for coming forward. If there is no free flow of information about error and safety, the organization cannot learn, cannot address safety issues, and is destined to remain forever unsafe. Many of the accidents in both health care and other industries need to be viewed from a broad systems perspective if they are to be fully understood. The actions and failures of individual people usually play a central role, but their thinking and behavior are strongly influenced and constrained by their immediate working environment and wider organizational processes. James Reason has captured the essentials of this understanding in his model of an organizational accident [13]. Major incidents almost always evolve over time and involve a number of people and a considerable number of contributory factors; in these circumstances, the organizational model (Fig. 21.1) proves very illuminating. The accident sequence begins (from the left) with the negative consequences of organizational processes, such as planning, scheduling, forecasting, design,
maintenance, strategy, and policy. The latent failures so created are transmitted along various organizational and departmental pathways to the workplace (the operating theater, the ward, etc.), where they create the local conditions that promote the commission of errors and violations (for example, high workload or poor human-equipment interfaces).

Fig. 21.1 Organizational accident model (adapted from [13]). The model reads from left to right: organisation and culture (management decisions and organisational processes: the latent failures) create error- and violation-producing conditions (contributory factors: work/environment, team, individual staff, task, and patient factors), which give rise to care delivery problems (unsafe acts: errors and violations, the active failures); these may penetrate the defences and barriers and culminate in an incident.
21.6 Systems Factors and Patient Safety
Vincent et al. have extended Reason's model and adapted it for use in a health care setting, classifying the error-producing conditions and organizational factors in a single broad framework of factors affecting clinical practice [14] (see Table 21.2). At the top of the framework are patient factors. In any clinical situation, the patient's condition will have the most direct influence on practice and outcome. Other patient factors, such as personality, language, and psychological problems, may also be important, as they can influence communication with staff. The design of the task, the availability and utility of protocols, and the availability and accuracy of test results may influence the care process and affect the quality of care. Individual factors include the knowledge, skills, and experience of each member of staff, which will obviously affect their clinical practice. Each staff member is part of a clinical team and part of the
wider organization of the hospital. The way an individual practices, and their impact on the patient, is constrained and influenced by other members of the team and the way they communicate, support, and supervise each other. The team is influenced, in turn, by management actions and by decisions made at a higher level in the organization. These include policies for continuing education, training, and supervision, and the availability of equipment and supplies. The organization itself is affected by the institutional context, including financial constraints, external regulatory bodies, and the broader economic and political climate. The framework provides the conceptual basis for analyzing clinical incidents, in that it includes both the clinical factors and the higher-level organizational factors that may contribute to the final outcome. In doing so, it allows the whole range of possible influences to be considered and can, therefore, be used to guide the investigation and analysis of an incident.
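As a concrete illustration, the framework can be represented as a simple data structure against which an incident investigation records its findings. This is a hypothetical sketch, not part of the published framework or the London Protocol materials; the factor types are taken from Table 21.2, while the example incident and its entries are invented.

```python
# Factor types from Table 21.2 (Vincent et al. [14]); an investigator records
# contributory influences for an incident against each framework level.
FRAMEWORK = [
    "patient", "task and technology", "individual (staff)", "team",
    "work environment", "organization and management", "institutional context",
]

def record_incident(description, factors):
    """Attach contributory factors to an incident, keyed by framework level."""
    unknown = set(factors) - set(FRAMEWORK)
    if unknown:
        raise ValueError(f"Not in the framework: {unknown}")
    return {"description": description, "contributory_factors": factors}

incident = record_incident(
    "Anastomotic leak recognized late on the ward",       # invented example
    {
        "team": ["inadequate written handover", "no senior review sought"],
        "work environment": ["low weekend staffing levels"],
        "organization and management": ["no policy for escalation of care"],
    },
)
print(sorted(incident["contributory_factors"]))
```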
21.7 Understanding How Things Go Wrong
21.7.1 Incident Analysis
There are a number of methods of investigation and analysis available in health care, though these tend to be underdeveloped in comparison with
methods available in industry. For instance, Vincent et al. have developed the London Protocol for the investigation of health care incidents [14], a systems approach based on Reason's organizational accident model. During an investigation, information is gleaned from a variety of sources. Case records, statements, and any other relevant documentation are reviewed. Structured interviews with key members of staff are then undertaken to establish the chronology of events, the main care delivery problems, and their respective contributory factors, as perceived by each member of staff. While a considerable amount of information can be gleaned from written records, an interview with those involved is the most important method of identifying the contributory factors. This is especially so if the interview systematically explores these factors and so allows the member of staff to collaborate in the investigation. In the interview, the story and "the facts" are just the first stage. The staff member is also encouraged to identify both the care delivery problems and the contributory factors, which greatly enriches both the interview and the investigation. A clinical team may use the method to guide and structure reflection on an incident. The protocol may also be used for teaching as a vehicle for introducing systems thinking. While reading about systems thinking is helpful, actually analyzing an incident brings systems thinking alive. The contributory factors that reflect more general problems in a unit are the targets for change and systems improvement. The incident acts as a "window" on the system [4].

Table 21.2 Framework of contributory factors influencing clinical practice (factor type: contributory influencing factors)
Patient factors: condition (complexity and seriousness); language and communication; personality and social factors
Task and technology factors: task design and clarity of structure; availability and use of protocols; availability and accuracy of test results; decision-making aids
Individual (staff) factors: knowledge and skills; competence; physical and mental health
Team factors: verbal communication; written communication; supervision and seeking help; team leadership
Work environmental factors: staffing levels and skills mix; workload and shift patterns; design, availability, and maintenance of equipment; administrative and managerial support; physical environment
Organizational and management factors: financial resources and constraints; organizational structure; policy, standards, and goals; safety culture and priorities
Institutional context factors: economic and regulatory context; national health service executive; links with external organizations
21.7.2 Human Reliability Analysis Techniques
Rather than taking a case or an incident and analyzing it, an alternative approach is to begin with a process of care and systematically examine it for possible failure points. This is the province of human reliability analysis. Human Reliability Analysis or Assessment (HRA) has been defined as the application of relevant information about human characteristics and behavior to the design of objects, facilities, and environments that people use [15]. HRA techniques may be used in the analysis of incidents, but are more usually used to examine a process or system. There are a vast number of these analytic techniques, derived by different people in different industries for different purposes. A detailed description of the various techniques is outside the scope of this chapter, but we will briefly describe two techniques that are increasingly being used within health care. Some techniques are primarily aimed at providing a close description of a task or at mapping out the work sequence. For instance, in hierarchical task analysis, the task description is broken down into subtasks or operations; this approach has been applied with much success to error analysis in endoscopic surgery [16]. Mishra et al. [17] undertook
a technical assessment of 26 laparoscopic cholecystectomies using an HRA technique developed by Joice and colleagues, and a nontechnical assessment of the theater team using a behavioral marking system developed in aviation. They found 0–6 technical errors per procedure and, interestingly, also found that these correlated with the surgeons' situational awareness. They also found that the nontechnical assessment had good interrater reliability (level of agreement) between the surgeon and nonsurgeon observers. Human error identification and analysis techniques are built on a basic task analysis to provide a detailed description of the kinds of errors that can occur and the points in the sequence where they are likely to occur. Failure mode and effects analysis (FMEA) is a team-based, systematic, proactive, step-by-step assessment method that for decades has been widely used in engineering and high-risk industries to identify and reduce hazards. For each step of the process, the team carrying out the analysis considers what could go wrong (the failure mode), why the failure might occur (the cause), and what could happen if it did occur (the effects). Incident analysis is usually seen as retrospective, while techniques such as FMEA, which examine a process of care, are seen as prospective and, therefore, potentially superior. The idea is that by using prospective analysis, we can prevent the next incident, rather than using case analysis to look back at something that has already gone wrong. However, there is no sharp division between retrospective and prospective techniques; the true purpose of incident analysis is to use the incident as a window onto the system, in essence looking at current weaknesses and potential future problems. Conversely, so-called prospective analysis relies extensively on the past experience of those involved [4].
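To make the FMEA procedure concrete, here is a minimal sketch of an FMEA worksheet for one step of a care process. The severity/occurrence/detection scores and the risk priority number (their product) are standard FMEA conventions rather than details given in this chapter, and the process step and scores below are invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    step: str        # process step under analysis
    mode: str        # what could go wrong
    cause: str       # why the failure might occur
    effect: str      # what could happen if it did occur
    severity: int    # 1 (minor) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: the conventional FMEA ranking score.
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("Prescribe DVT prophylaxis", "dose omitted", "interrupted ward round",
                "postoperative DVT/PE", severity=9, occurrence=4, detection=5),
    FailureMode("Prescribe DVT prophylaxis", "wrong dose", "weight not recorded",
                "bleeding or ineffective prophylaxis", severity=7, occurrence=3, detection=4),
]

# Address the highest-ranking failure modes first.
for fm in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"RPN {fm.rpn:>3}: {fm.step} -> {fm.mode}")
```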
21.8 Understanding Surgical Error and Surgical Outcomes
21.8.1 Studies of Closed Claims in Surgery
Systematic record reviews are the most important outcome-based studies, but claims for malpractice and medical negligence are also a potentially important source of information on the causes of harm to patients [18]. We illustrate this approach by discussing three recent studies.
The American College of Surgeons closed claims study [19] reviewed 460 malpractice claims against general surgeons. Surgeons were recruited to analyze the claims, and data were collected according to predetermined standards based on previous seminal articles. Global quality of surgical care was assessed for each claim and defined as "care that would be expected from a prudent and caring general surgeon under the same circumstances as determined by similarly qualified providers." The most common procedures highlighting safety concerns were those involving the biliary tract, intestines, hernias, vascular system, esophagus, and stomach. The most frequent events leading to claims included delayed diagnosis, failure to diagnose, failure to order diagnostic tests, technical misadventure, delayed treatment, and failure to treat. The global standard of surgical care was met in 36% of claims and not met in 50%. The reviewers found that 31% of the events were due to deficiencies in postoperative care. In 22% of patients, preoperative diagnosis and treatment were considered the most deficient aspects of care. A technical misadventure was considered the most deficient component of care in 12% of patients. Rogers et al. [20] reviewed 444 closed malpractice claims from four malpractice liability insurers. Surgeons reviewed the litigation file and medical records to determine whether an injury was attributable to surgical error and, if so, to analyze the factors that contributed to it. Surgical errors resulted in patient injury in 58% of cases. Sixty-five percent of these cases involved significant or major injury, and 23% involved death. Of the errors resulting in patient injury, 75% occurred during intraoperative care, 25% during preoperative care, and 35% during postoperative care; a third of errors spanned more than one phase of care, which is why these figures sum to more than 100%. Lack of technical competence or knowledge was identified as a contributing factor in 41% of cases. Communication breakdowns contributed to error in 24% of cases. The leading types of breakdown were inadequate handover and failure to establish clear lines of responsibility. Gawande and colleagues [21] employed a case-control design to examine instances of retained instruments and sponges after an operative procedure. The main risk factors that predicted the occurrence of a retained foreign body were emergency surgery, an unplanned change in the operation, and body mass index. This design overcomes some of the limitations that occur in traditional methods of closed claims
analysis by setting the analyzed claims within a representative cohort.
21.8.2 Observational Studies of Success and Failure and Studies of Communication
Observational studies offer another means of understanding both success and failure, and can offer important insights into the way problems evolve in the real world and how a number of factors interact to produce an adverse outcome. With increasing appreciation of the problem of patient safety in surgery, a number of research teams across the world have undertaken observational studies of various aspects of care within the operating theater environment. Marc de Leval's research group at Great Ormond Street, London, conducted research that was pioneering in that it involved direct observations of events in the operating theater and was conducted in close collaboration with psychologists and safety researchers. During their observation of pediatric cardiac surgery, in addition to outcomes, they collected data on the number of errors or procedural failures during the operation. Major events were potentially life-threatening failures, and minor events were those that in isolation were not expected to result in serious consequences. They found that major events and the total number of minor events were closely associated with death and near misses. This study also highlighted the importance of the recognition and early compensation of errors during high-risk surgery [22]. Our own research began by delineating the various factors that impact outcomes, and by arguing for the necessity of developing measures of the various factors that influence performance and outcome in the operating theater [6]. Through observations of individual and team performance in the operating theater, our research has revealed a number of interesting findings. Our team has developed a tool for the measurement of teamwork in the operating theater, the Observational Teamwork Assessment for Surgery (OTAS), which consists of a task checklist as well as a behavior rating tool [23]. The observers were a surgical research fellow and a human factors researcher. Our observations have also revealed the extent of distractions and environmental problems in the operating theater [24]. A further group of primarily observational studies has focused on the specific issue of communication
within and between surgical teams. For instance, Williams et al. [25] found that of the 328 communication incidents identified by their study, 30% had a direct negative impact on patient care. There are many potential causes of communication failure, so the solution to communication failure in health care will not be simple. Williams et al. [25] found that patient data, including prior medical problems, status, and medications, were not effectively represented in a coherent and up-to-date form, and that information passed verbally among professionals was subject to distortion. Research has also shown that communication failures result from inadequate preoperative planning and evaluation within and between professional groups, missing or unclear case notes, and a failure to follow instructions [26, 27]. Observational research in theater confirms that basic communications deemed important for safe surgery, such as confirmation of surgical site laterality or team checks to proceed with incision, are not systematically carried out [23]. Communication has been found to be highly variable on several measures of effectiveness [23, 28] and generally lacking in protocol [29]. Questionnaire surveys of operating room personnel reveal that nearly 80% believe that enhanced communication is fundamental to patient safety [30], yet preoperative communication is practiced in fewer than 10% of observed procedures [30]. Evidence is, therefore, mounting that communication is a very serious problem for safety in health care. Observational studies of theater teams have demonstrated that observations of individual and team performance are feasible even within such a complex environment as the operating theater. These studies have also helped in understanding the challenges and the work that remains to be done. Trying to capture data on team communication is made difficult by face masks, the position of the team members in relation to each other and to research observers, and the extent of distractions and disturbances in the operating theater. In order to develop and test team-based interventions, we need to determine measures of human performance in the operating theater. Technical skills measures have been developed over the years [31], but determining measures of team performance is considerably more difficult. OTAS is a significant step in that direction, but may need to be adapted for
different procedures. Rhona Flin’s group in Aberdeen, Scotland, has made a significant contribution to the understanding of the importance of nontechnical skills and their measurement in the operating theater [32]. Data are, however, lacking on the relationship of observed events to teamwork and patient outcomes.
21.9 The Next Steps: Improving Surgical Safety
Over the past few years, we have come to appreciate the scale of preventable adverse events. There is now a growing focus on trying to address the problem and improve the safety of patients undergoing surgery. We have come to appreciate that errors are the result of a complex interplay of technical, clinical, cultural, psychological, and organizational factors. By the same token, there are multiple factors involved in interventions to produce improvements, ranging from the skill, attitude, and personality of the individual surgeon to the organization of surgical services across a whole health economy. Here we address some interventions that are particularly associated with the broader patient safety agenda and so may be useful to surgical teams to complement and enhance their existing programs. A critical question is who should lead safety initiatives in surgery. Should this be left to the organization and the managers? What role should clinicians play in addition to practicing their craft with the highest dedication, skill, and commitment? Do patients have a role to play? These are the questions that challenge a surgeon endeavoring to improve safety within his or her unit. Safety improvement programs in surgery will be impossible to implement successfully if due emphasis is not placed on the role of the wider organization and of every person involved in the care of a patient, including the patient.
21.9.1 Reporting and Analysis of Incidents
The strengths and limitations of reporting systems, and the lack of engagement of clinicians in formal reporting, are beyond the scope of this chapter; we will simply argue for a much stronger focus on the analysis of incidents rather than simply collecting them. Risk
managers produce clinical incident graphs on a monthly basis, which can be a starting point for more thoughtful investigation by the clinical team; just reporting an incident will achieve little more than alerting the organization to the occurrence of the event. In order to achieve learning, it is crucial that reporting systems are seen as only the first step in a process that aims to understand the factors behind the incident [4]. Through incident analysis methods such as those mentioned earlier, it is possible to tease out the human factors and organizational issues in order to develop strategies and interventions to prevent them from being repeated. The medical literature is, in parallel, rising to the challenge of patient safety, and case analysis reports are being increasingly published [33]. Incident reporting can be seen as reflective of the safety culture of an organization.
21.9.2 Standardization of Clinical Processes: Guidelines and Protocols
Surgery is a complex system. From an engineering perspective, one way of improving safety and reducing errors within a complex system is to standardize processes and reduce the extent of variability and thus unpredictability. Standardization of clinical processes can be achieved by guidelines, protocols, and clinical care pathways. The use of venous thromboembolism (VTE) prophylaxis guidelines is a good example of the use of evidence-based medicine and the potential impact of guidelines on patient safety. Patients undergoing major surgery are at significant risk of developing DVT and pulmonary embolism (PE). Without the use of prophylactic measures, the risk of VTE is around 20% after general surgery and 50% after orthopedic surgery. Postoperative VTEs are now recognized as common adverse events in the western world: DVT/PE accounted for 9% of adverse events in the Utah and Colorado study, of which 19% were believed to be due to negligence [5], and 30% of DVT/PEs are believed to be preventable [7]. So, when the evidence strongly suggests that DVT prophylaxis is critical in patients undergoing major surgery, why is there still some reluctance on the part of clinicians to prescribe prophylaxis?
There are a number of ways by which VTEs can be prevented, such as compression stockings, intermittent pneumatic compression, oral anticoagulation, aspirin, and heparin in its two forms. With potential drawbacks and contraindications to nearly all of the above methods, it is difficult to judge the best prophylactic measure for an individual patient. Guidelines act as decision aids in these circumstances, reducing reliance on human memory and cognition, though clinicians can see them as a threat to their clinical autonomy. Standardization will also fail without due attention being paid to systems factors [34]. The most likely systems errors leading to VTEs, even in patients treated by clinicians aware of VTE guidelines, are omission of prophylaxis and prescription of a wrong dose. One study found that compliance with DVT prophylaxis guidelines varies from 25 to 80% [35], but that compliance can be increased by active measures such as educational programs, quality assurance activities, and computerized decision support systems.
21.9.3 Information Technology
The use of information technology is closely associated with the principle of standardization of care. As explained earlier, a limitation of guidelines is that in a number of clinical settings it is difficult to tailor them to an individual patient. Computers, however, when provided with the appropriate information, can tailor their guidance to the individual patient. Technology thus potentially provides a marriage between the need for standardization and the clinician's necessary insistence that treatment is tailored to the individual patient. Electronic decision support systems exploit the computer's ability to process information that may be beyond the capacity of the human brain. The sheer quantity of medical information, even within a single specialty, is often beyond the power of one person to comprehend. For example, the diagnosis of appendicitis in a young woman of child-bearing age who is minimally tender in the right iliac fossa, has normal inflammatory markers, but has some blood on urine microscopy may seem simple, but it is one of the most challenging decisions for the general surgical resident on call, and this is only a small subset of the information that is required to make a decision to operate. However, while it may be difficult for a human brain to consider all the variables that determine the probability of appendicitis, computer-aided decision support systems may support the clinician in making a diagnosis [36]. Information technology can also enhance both communication between health professionals and routine checking, thereby lessening reliance on human memory. A computerized handover system, "UWCores," developed by the University of Washington halved the number of patients missed on rounds, improved the continuity of care, and improved workflow efficiency [37]. Radio-frequency identification (RFID) tagging technology can be used to identify patients prior to surgery, update an inventory of surgical instruments, and reduce the incidence of missing swabs [38]. The swab count is the traditional method of addressing this last issue, but even a swab count is not foolproof [21]: it is subject to human error, and counts are often wrong in stressful circumstances [21]. Radio-frequency tagging of swabs can prevent the occurrence of this error [39].
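As a toy illustration of the kind of rule-based decision support discussed here and in the guidelines section above, the sketch below encodes a handful of hypothetical VTE-prophylaxis rules. The thresholds and recommendations are invented for illustration and are emphatically not clinical guidance; a real system would encode a published guideline and far more contraindications.

```python
def vte_prophylaxis_advice(age, major_surgery, on_anticoagulant, high_bleeding_risk):
    """Illustrative rule-based decision aid (hypothetical rules, not clinical guidance)."""
    if on_anticoagulant:
        return "Already anticoagulated: review existing regimen"
    if high_bleeding_risk:
        return "Mechanical prophylaxis only (e.g. compression stockings)"
    if major_surgery or age >= 60:  # invented threshold for illustration
        return "Pharmacological plus mechanical prophylaxis per local guideline"
    return "Early mobilisation; reassess risk daily"

print(vte_prophylaxis_advice(age=72, major_surgery=True,
                             on_anticoagulant=False, high_bleeding_risk=False))
```

Encoding rules this way is what lets a computerized system both standardize care and tailor it: the same guideline logic is applied to every patient, but the inputs are the individual patient's own data.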
21.9.4 Improving Communication: Checklists and Briefing
Checklists serve a number of functions: they act as reminders, help in the standardization of processes, serve as safety checks, add redundancy to the system [40], improve information flow, and provide feedback [41]. Checklists are mandatory and routinely used in aviation, where there are checklists for every stage of the flight. Their use in health care has, however, been met with skepticism, as they are seen to undermine the professional autonomy of clinicians and are regarded as "busy work" [40]. In addition to the use of checklists, the Joint Commission recommended the use of a "time-out" or "pause for the cause" to confirm the patient, the procedure, and the site to be operated on prior to incision. This is now a mandatory requirement for all operating theaters in the United States. It has laid the foundations for the establishment of preoperative team briefings onto which other checks and communication interventions can be dovetailed. In addition, the Joint Commission stipulates that the time-out has to be a process in which all the team members are actively involved and any concerns or inconsistencies must be clarified at this stage. This has
resulted in the "time-out" serving as a tool for fostering communication between team members. Preliminary evidence suggests that checklists and preoperative briefings help to reduce the incidence of events such as wrong-site surgery [42] and lead to an increase in the use of prophylactic medication in the perioperative period [43, 44]. There is also some evidence that preoperative briefings contribute to an improvement in the safety culture and team environment within the operating theater [42, 45]. Awad et al. found that team training in the use of briefings led to an improvement in communication between team members [44]. In addition, the Kaiser Permanente organization found that preoperative briefings resulted in reduced reporting of equipment problems and an increase in staff morale [42], with minimal resource implications for the training and implementation of the briefing process. Lingard et al. found that, in addition to improving safety, the briefing session serves a number of functions [41]: it is a means of timely information transfer between team members; details that could potentially impact the flow of the procedure or patient safety can be clarified; team members can use it to bring up any concerns; and the process can facilitate decision making, foster team camaraderie, and serve an educational purpose.
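A minimal sketch of how a preoperative time-out might be captured as a structured checklist follows. The first three items paraphrase the Joint Commission confirmation points named in the text (patient, procedure, site); the remaining items are invented prompts reflecting the stipulation that all team members participate and concerns are resolved, so this is illustrative rather than an official checklist.

```python
TIME_OUT_ITEMS = [
    "Correct patient identity confirmed",
    "Procedure confirmed",
    "Surgical site and laterality confirmed and marked",
    "All team members actively participating",        # illustrative addition
    "Concerns or inconsistencies raised and resolved",  # illustrative addition
]

def run_time_out(responses):
    """Proceed to incision only if every item is affirmed by the whole team."""
    missing = [item for item, ok in zip(TIME_OUT_ITEMS, responses) if not ok]
    if missing:
        raise RuntimeError("Do not proceed; unresolved items: " + "; ".join(missing))
    return "Time-out complete: proceed to incision"

print(run_time_out([True, True, True, True, True]))
```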
21.9.5 Individual Attitudes and Behaviors
While we have placed considerable emphasis on organizational and cultural factors within the systems model, the role of the person at the "sharp end" cannot be disputed. People partly create safety by being conscientious and disciplined and by following rules where applicable. Hand washing is a good example of the diligence required on the part of health care staff to prevent infections. As mentioned earlier, studies have shown that over 50% of errors in surgical patients are due to technical misadventures [20]. A systems analysis of these errors may reveal factors such as communication problems, distractions, lack of supervision, the prevailing culture within the unit, and so on. But it is also important to appreciate the importance of individual skill and diligence. This is best explained by Gawande, who wrote that "Operations like the lap chole have taught me how easily error can occur, but they've also showed me
something else: effort does matter; diligence and attention to the minutest detail can save you.”
21.9.6 Simulation-Based Training
Teams, like individuals, may erode or create safety. Underlying a number of specific team skills, such as prioritizing tasks, monitoring each other's work, and communicating effectively, is the idea that the team has a common understanding of the task in question and of the nature of team work. This is sometimes referred to as a "shared mental model", analogous to the mental models of the world that each of us has as individuals. For example, it is essential that all the members of a surgical team are aware of the specifics of a procedure (laparoscopic or open; two-stage or three-stage esophagectomy) in order to anticipate any potential problems, be vigilant, and cross-check each other. However, this is not always the case among surgical teams: Guerlain et al. found that situational awareness was poor among most team members [46]. Traditional surgical training has its limitations. It is entirely based on learning on real patients, with drawbacks for the learner as well as the patient. Training is not adapted to the needs of the learner, is not structured, and task difficulty cannot be escalated to enhance skills. More importantly, patients are potentially put at risk. Error and crisis management is difficult to teach in real life because crisis situations occur only rarely. In order to address these limitations, the surgical community has adopted the principles of simulation-based training from other high-reliability organizations such as aviation and the military. Simulations allow core skills to be learnt prior to transfer to real environments and learning to be structured and tailored, and learners can make mistakes and appreciate the consequences of their mistakes. More importantly, they can learn to effectively manage and deal with their mistakes. A number of simulations have been developed and validated for the acquisition of technical skills in surgery [47], but the role of simulations in training not just surgeons but surgical teams to work with a greater degree of coordination is gradually being acknowledged. Simulations are beginning to play an important part in the training of personnel in the operating theater. For example, courses in anesthesia crisis resource management [48] address the technical skills and
knowledge required to effectively manage crises, but also place considerable emphasis on judgment, decision-making, vigilance, and communication with other team members. Our group at Imperial College has extended the use of surgical simulations beyond the acquisition and assessment of solely technical skills, to training teams in the operating theater and in crisis management skills [49].
21.9.7 Improving the Safety Culture
The UK Health and Safety Commission defines safety culture as "the product of the individual and group values, attitudes, competencies and patterns of behavior that determine the commitment to…safety." Thus, even though it is a reflection of an organization or unit's efforts to improve safety, every individual contributes to those efforts. The tendency for excessive, immediate, and unreasoning blame in the face of patient harm, both from within and outside health care organizations, has led some to call for a "no-blame" culture. This, if taken literally, would appear to remove personal accountability and also remove many social, disciplinary, and legal strictures on clinical practice. A culture entirely without blame would, therefore, seem both unworkable and liable to remove some of the restrictions and safeguards on safe behavior. A much better objective is to try to develop an open and fair culture, which certainly means a huge shift away from blame, but which preserves personal responsibility and accountability. Leadership, clinical and organizational, is crucial to the safety culture of a surgical unit and an organization. Leaders influence safety directly by setting up safety-related committees and initiatives and by allowing staff time to engage in fundamental safety issues, such as the redesign of systems. Leaders also influence safety indirectly by talking about safety, showing they value it, and being willing to discuss errors and safety issues in a constructive way. An organization with a safety culture will also have a reporting and learning system at its heart. The role of reporting and learning systems in understanding errors has been discussed earlier, but in the cultural context we are more concerned with the attitudes and values that underlie a willingness to report and, more importantly, to reflect and learn. Every error should be seen as a learning opportunity.
21.9.8 The Patient's Role in Patient Safety
Finally, we turn to the contribution of the most important members of the surgical team: the patient and their family. From this chapter it is probably obvious to the reader that safety is addressed and discussed in multiple ways, and that lessons are sought from all manner of other industries and experts, from the disciplines of psychology, ergonomics, engineering, and many others. Yet the one source of experience and expertise that remains largely ignored is that of the patient. One might argue that patients do not have much to contribute; after all, many people fly, but aviation safety does not rely on the passengers for safe operation. In health care, however, unlike aviation, the patient is a privileged witness of events, both in the sense that they are at the center of the treatment process and also that, unlike clinical staff who come and go, they observe almost the whole process of care. Patients are usually thought of as the passive victims of errors and safety failures, but there is considerable scope for them to play an active part in ensuring their care is effective, appropriate, and safe. Angela Coulter [50] has argued that instead of treating patients as passive recipients of medical care, it is much more appropriate to view them as partners or coproducers with an active role in their care. For instance, patients have a vital role to play in providing accurate and relevant information about their care. In surgery, patients should be encouraged to report postsurgical complications promptly so that swift action can be taken if necessary. Unfortunately, a lack of information about what to watch out for after discharge from hospital is a very common complaint. In a postal survey of patients discharged from hospital, 31% of respondents said they were not given a clear explanation of the results of their surgical procedures, 60% were not given sufficient information about danger signals to watch out for at home, and 61% were not told when they could resume their normal activities. If greater attention were paid to providing this type of information, it could lead to a reduction in the rate of complications and readmissions [51]. Organizations across the world are acknowledging the value of patient involvement in safety: the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) and the Australian Commission on Safety and Quality in Health Care produce patient information leaflets to encourage patients
to be active participants in the process of confirmation of surgical side/site and procedure.
References 1. Thomas EJ, Petersen LA (2003) Measuring errors and adverse events in health care. J Gen Intern Med. 18(1):61–67 2. Neale G, Woloshynowych M (2003) Retrospective case record review: a blunt instrument that needs sharpening. Qual Saf Health Care 12:2–3 3. Woloshynowych M, Neale G, Vincent C (2003) Case record review of adverse events: a new approach. Qual Saf Health Care 12:411–415 4. Vincent C (2006) Patient safety. Elsevier, London 5. Thomas EJ, Studdert DM, Burstin HR et al (2000) Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care 38:261–271 6. Vincent C, Moorthy K, Sarker SK et al (2004) Systems approaches to surgical quality and safety: from concept to measurement. Ann Surg 239:475–482 7. Gawande AA, Thomas EJ, Zinner MJ et al (1999) The incidence and nature of surgical adverse events in Colorado and Utah in 1992. Surgery 126:66–75 8. Neale G, Woloshynowych M, Vincent C (2001) Exploring the causes of adverse events in NHS hospital practice. J R Soc Med 94:322–330 9. Wanzel KR, Jamieson CG, Bohnen JM (2000) Complications on a general surgery service: incidence and reporting. Can J Surg 43:113–117 10. Leape LL (1994) Error in medicine. JAMA 272:1851–1857 11. Reason J (1995) Understanding adverse events: human factors. Qual Health Care 4:80–89 12. Reason JT (2001) Understanding adverse events: the human factor. In: Vincent C (ed) Clinical risk management: enhancing patient safety. BMJ, London 13. Reason JT (1997) Managing the risks of organisational accidents. Ashgate, Aldershot 14. Vincent C, Taylor-Adams S, Stanhope N (1998) Framework for analysing risk and safety in clinical medicine. BMJ 316: 1154–1157 15. Kirwan B (1994) A guide to practical human reliability assessment. Taylor and Francis, London 16. Joice P, Hanna GB, Cuschieri A (1998) Errors enacted during endoscopic surgery – a human reliability analysis. Appl Ergon 29:409–414 17. Mishra A, Catchpole K, Dale T et al (2008) The influence of non-technical performance on technical outcome in laparoscopic cholecystectomy. Surg Endosc 22:68–73 18. Vincent CA (1993) The study of errors and accidents in medicine. In: Vincent CA, Ennis M, Audley RJ (eds) Medical accidents. Oxford University Press, Oxford 19. Griffen FD, Stephens LS, Alexander JB et al (2007) The American College of Surgeons’ closed claims study: new insights for improving care. J Am Coll Surg 204:561–569 20. Rogers SO Jr, Gawande AA, Kwaan M et al (2006) Analysis of surgical errors in closed malpractice claims at 4 liability insurers. Surgery 140:25–33
21. Gawande AA, Studdert DM, Orav EJ et al (2003) Risk factors for retained instruments and sponges after surgery. N Engl J Med 348:229–235 22. de Leval MR, Carthey J, Wright DJ et al (2000) Human factors and cardiac surgery: a multicenter study. J Thorac Cardiovasc Surg 119:661–672 23. Undre S, Healey AN, Darzi A et al (2006) Observational assessment of surgical teamwork: a feasibility study. World J Surg 30:1774–1783 24. Healey AN, Sevdalis N, Vincent CA (2006) Measuring intraoperative interference from distraction and interruption observed in the operating theatre. Ergonomics 49:589–604 25. Williams RG, Silverman R, Schwind C et al (2007) Surgeon information transfer and communication: factors affecting quality and efficiency of inpatient care. Ann Surg 245:159–169 26. Kluger MT, Tham EJ, Coleman NA et al (2000) Inadequate pre-operative evaluation and preparation: a review of 197 reports from the Australian incident monitoring study. Anaesthesia 55:1173–1178 27. Ludbrook GL, Webb RK, Fox MA et al (1993) The Australian Incident Monitoring Study. Problems before induction of anaesthesia: an analysis of 2000 incident reports. Anaesth Intensive Care 21:593–595 28. Lingard L, Reznick R, Espin S et al (2002) Team communications in the operating room: talk patterns, sites of tension, and implications for novices. Acad Med 77:232–237 29. Grote G, Zala-Mezö E, Grommes P (2004) The effects of different forms of coordination in coping with work load. In: Dietrich R, Childress T (eds) Group interaction in high-risk environments. Ashgate, Aldershot, pp 39–55 30. Sexton JB, Thomas EJ, Helmreich RL (2000) Error, stress, and teamwork in medicine and aviation: cross sectional surveys. BMJ 320:745–749 31. Moorthy K, Munz Y, Adams S et al (2005) A human factors analysis of technical and team skills among surgical trainees during procedural simulations in a simulated operating theatre. Ann Surg 242:631–639 32. Yule S, Flin R, Paterson-Brown S et al (2006) Development of a rating system for surgeons' non-technical skills. Med Educ 40:1098–1104 33. Shojania KG, Fletcher KE, Saint S (2006) Graduate medical education and patient safety: a busy – and occasionally hazardous – intersection. Ann Intern Med 145:592–598 34. Leape LL, Berwick DM, Bates DW (2002) What practices will most improve safety? Evidence-based medicine meets patient safety. JAMA 288:501–507 35. Tooher R, Middleton P, Pham C et al (2005) A systematic review of strategies to improve prophylaxis for venous thromboembolism in hospitals. Ann Surg 241:397–415 36. de Dombal FT, Dallos V, McAdam WA (1991) Can computer aided teaching packages improve clinical care in patients with acute abdominal pain? BMJ 302:1495–1497 37. Van Eaton EG, Horvath KD, Pellegrini CA (2005) Professionalism and the shift mentality: how to reconcile patient ownership with limited work hours. Arch Surg 140:230–235 38. Schwaitzberg SD (2006) The emergence of radiofrequency identification tags: applications in surgery. Surg Endosc 20:1315–1319 39. Macario A, Morris D, Morris S (2006) Initial clinical evaluation of a handheld device for detecting retained surgical
gauze sponges using radiofrequency identification technology. Arch Surg 141:659–662 40. Hales BM, Pronovost PJ (2006) The checklist – a tool for error management and performance improvement. J Crit Care 21:231–235 41. Lingard L, Espin S, Rubin B et al (2005) Getting teams to talk: development and pilot implementation of a checklist to promote interprofessional communication in the OR. Qual Saf Health Care 14:340–346 42. DeFontes J, Surbida S (2004) Preoperative safety briefing project. Permanente J 8:21–27 43. Altpeter T, Luckhardt K, Lewis JN et al (2007) Expanded surgical time out: a key to real-time data collection and quality improvement. J Am Coll Surg 204:527–532 44. Awad SS, Fagan SP, Bellows C et al (2005) Bridging the communication gap in the operating room with medical team training. Am J Surg 190:770–774 45. Makary MA, Mukherjee A, Sexton JB et al (2007) Operating room briefings and wrong-site surgery. J Am Coll Surg 204: 236–243
46. Guerlain S, Adams RB, Turrentine FB et al (2005) Assessing team performance in the operating room: development and use of a "black-box" recorder and other tools for the intraoperative environment. J Am Coll Surg 200:29–37 47. Grantcharov TP, Kristiansen VB, Bendix J et al (2004) Randomized clinical trial of virtual reality simulation for laparoscopic skills training. Br J Surg 91:146–150 48. Gaba DM, Howard SK, Flanagan B et al (1998) Assessment of clinical performance during simulated crises using both technical and behavioral ratings. Anesthesiology 89:8–18 49. Moorthy K, Munz Y, Forrest D et al (2006) Surgical crisis management skills training and assessment: a simulation [corrected]-based approach to enhancing operating room performance. Ann Surg 244:139–147 50. Coulter A (1999) Paternalism or partnership? Patients have grown up – and there's no going back. BMJ 319:719–720 51. Coulter A (2001) Quality of hospital care: measuring patients' experiences. Proc R Coll Phys Edin 31:34–36
22
Safety and Hazards in Surgical Research
Shirish Prabhudesai and Gretta Roberts
Contents
Abbreviations 271
22.1 Introduction 271
22.2 Health and Safety Law 272
22.3 Common Hazards in the Research Laboratory 273
22.3.1 Biological Hazards 273
22.3.2 Animal Material 274
22.3.3 Human Material 274
22.4 Theatre Safety and Surgical Smoke 275
22.5 Transport of Material 275
22.6 Genetic Modification 275
22.7 Chemicals 275
22.8 Radiation 278
22.9 Other Hazards 278
22.10 Routes of Transmission for Chemicals, Biologicals and Radiation 279
22.11 Risk Assessment and Control Mechanisms 279
22.11.1 Five Steps to Risk Assessment 279
22.11.2 Principles for Control Measures 280
22.12 Waste Disposal 281
22.13 Health Surveillance 281
22.14 Reporting of Injuries, Diseases and Dangerous Occurrences Regulations 1995 281
References 282
S. Prabhudesai, Bart's and the London Hospital NHS Trust, The Royal London Hospital, Whitechapel, London E1 1BB, UK e-mail: [email protected]
Abbreviations
COSHH Control of Substances Hazardous to Health
GM Genetically modified
HSE Health and Safety Executive
LTEL Long-term exposure limit
MSDS Material safety data sheet
PPE Personal protection equipment
STEL Short-term exposure limits
Abstract
The importance of carrying out surgical research that is safe not only to those involved but also to colleagues, other members of the institute and to the general public cannot be over-emphasized. Researchers and those managing the research should be aware of the potential hazards of their studies and the relevant health and safety regulations. This chapter discusses a number of common hazards that may be encountered during surgical research and the various safety measures that need to be considered. Well-defined health and safety policies in every surgical research department will help produce studies of the highest quality and minimise the likelihood of disasters.
22.1 Introduction
Surgical research is a wide and varied field, ranging from clinical trials comparing surgical treatments and virtual reality simulators to assess and enhance surgical skills, to traditional laboratory-based biological research. Researchers have an important role in carrying out research that is safe not only for themselves, but also for others who may be affected. For this, it is vital that the researchers and those managing the research are aware
of the health and safety regulations and potential hazards that they could encounter during their research.
22.2 Health and Safety Law
The Health and Safety at Work Act was introduced in the United Kingdom in 1974. This act describes the general duties that employers have towards their employees and members of the public, and the duties that employees have to themselves and to others. Under the act, employers must ensure, so far as is reasonably practicable, the health, safety and welfare of all employees; this includes
a requirement for a written safety policy together with organizational and other arrangements. There are numerous regulations made under the act which cover specific areas of health and safety, a number of which are core to managing safety in surgical research laboratories. Table 22.1 identifies a number of these regulations which need to be considered when looking at safety in surgical research; this list is not exhaustive and those that apply will vary with the type of research done. Many of these regulations have Approved Codes of Practice or guidance documents issued by the Health and Safety Commission as well as condensed leaflets and information which are available on the Health and Safety Executive (HSE) website [1].
Table 22.1 Summary of Health and Safety Legislation applicable to Surgical Laboratory Research (regulation: summary)
Control of Substances Hazardous to Health Regulations 2002 as amended 2005: control of chemicals and biological hazards; requirement for risks to be assessed
Display Screen Equipment Regulations 1992: ergonomic set-up of workstations in the laboratory, e.g. mass spectrometers, NMR machines, etc.
Reporting of Injuries, Diseases and Dangerous Occurrences Regulations 1995: specifies certain accidents and injuries to be reported to the HSE
Health and Safety (Safety Signs and Signals) Regulations 1996: details the requirement for prohibition, warning, mandatory and emergency signage in the workplace
Workplace (Health, Safety and Welfare) Regulations 1992: defines minimum welfare requirements for a workplace
Pressure Systems Safety Regulations 2000: covers pressure systems such as pressure autoclaves, liquid nitrogen dewars and gas regulators
Ionising Radiation Regulations 1999: specifies the need to employ radiation protection supervisors and advisers, and the need for risk assessment
Management of Health and Safety at Work Regulations 1999: supplements the requirements of the Health and Safety at Work Act; risks to be assessed and controlled
Manual Handling Operations Regulations 1992: avoid manual handling where possible; assess and reduce the risk
Noise at Work Regulations 1989: protects people from exposure to harmful noise; three action levels; reduce noise according to the hierarchy of risk control, e.g. drilling, impactor mills, etc.
Personal Protective Equipment at Work Regulations 1992: stipulates provision, maintenance and use of PPE
Fire Safety Order: risk assess the fire risk of the laboratory and take reasonable precautions, e.g. store flammables in special metallic cupboards, reduce combustible material in the laboratory, etc.; emergency procedures
Provision and Use of Work Equipment Regulations 1998: equipment must be suitable for purpose, maintained in good repair, and any safety measures used
Electricity at Work Regulations 1989: requires precautions to be taken against the risk of injury from electricity
First Aid Regulations 1981: first aid kit suitable for the type of work being performed; eye wash/eye wash station, first aiders, signage
Safety and Hazards in Surgical Research
273
Head of Institute Responsible for provision of safety management systems
Advice available from Head of Faculty Institute safety Officer and Head of Occupational Health Specialist Safety Advisors e.g. Biological or Radiation
Faculty/Department Safety Oficer
Responsible to ensure systems are implemented Head of Department
Responsible for implementation of safety procedures and protcols and ensuring risk assessments are completed
Principal Investigator
Local Safety Advisors Research Staff and Students
Responsible for following safety procedures and protocols
Fig. 22.1 The chain of health and safety responsibility in an Academic or Medical Institution
The Health and Safety legislation puts the emphasis on the employer to ensure Health and Safety in their organisation. The practical organisation of this varies from institute to institute; typically most have highly trained specialist Health and Safety advisers who work closely with the occupational health team. However, legally the chain of responsibility lies with the principal investigators, heads of department and ultimately the principal or chief executive officer of the organisation. An example of the chain of responsibility is outlined in Fig. 22.1.
22.3 Common Hazards in the Research Laboratory
A hazard is "the potential of a substance, activity or process to cause harm", whilst a risk is "the likelihood of a substance, activity or process to cause harm". A traditional laboratory contains a wide variety of hazards, including chemicals, biological agents, radiation, hazardous equipment, sharps, fire and even workstations, all of which, if not controlled, can pose a risk to health.
22.3.1 Biological Hazards

Biological agents are included as hazardous substances under the Control of Substances Hazardous to Health Regulations (COSHH 2002). These include pathogens, cell cultures, human bio-fluids, human tissues and animal material. All biological organisms are classified into one of four hazard groups (Fig. 22.2), and the Advisory Committee on Dangerous Pathogens has issued an approved list of biological agents and their hazard categories. However, it is worth noting that an organism not on the list does not necessarily fall into hazard group 1 [2]. The classification of a biological agent usually determines the minimum containment level required, and the specification of the laboratories is outlined in Schedule 3 of the regulations. Table 22.2 shows the requirements for a containment level 2 laboratory, such as one handling clinical samples, e.g. blood and tissues.
Fig. 22.2 Classification of biological hazards, in order of increasing hazard to human health. Hazard Group 1: unlikely to cause human disease. Hazard Group 2: can cause human disease but is unlikely to spread to the community, and effective prophylaxis or treatment is usually available. Hazard Group 3: causes severe human disease and may spread to the community, but effective prophylaxis or treatment is usually available. Hazard Group 4: causes severe human disease and may spread to the community, and no effective prophylaxis or treatment is usually available.
Table 22.2 Category 2 laboratory requirements (taken from Schedule 3 of the COSHH Approved Codes of Practice)
Containment measure – Required at containment level 2?
Workplace is to be separated from any other activities in the same building – No
Input air and extract air to the workplace are to be filtered using HEPA or equivalent – No
Access to be restricted to authorised persons only – Yes
Workplace is to be sealable to permit disinfection – No
Specific disinfection procedure – Yes
Workplace to be maintained at an air pressure negative to atmosphere – No
Surfaces impervious to water and easy to clean – Yes (benches)
Surfaces resistant to acids, alkalis, solvents and disinfectants – Yes (benches)
Safe storage of biological agents – Yes
A laboratory is to contain its own equipment – No
Infected material, including any animal, is to be handled in a safety cabinet or isolator or other suitable containment – Yes, where aerosols are produced
22.3.2 Animal Material

Ex vivo research – Some surgical research involves the use of animal tissues or bones, often as a model for perfecting surgical manipulations or techniques. These samples should be treated in accordance with good laboratory practice and can be used in containment level 1 laboratories. Whilst these materials may pose no significant health risk on arrival, they can harbour harmful bacteria if not stored and prepared correctly. Important control measures include prohibiting eating, drinking or putting anything into the mouth; buying the material from an approved source; storing it correctly (short term in the fridge, longer term in the freezer); and ensuring that surfaces and equipment are effectively decontaminated after use.

In vivo research – Research involving the use of live animals presents a number of hazards. The ethics and governance of animal experimentation are described elsewhere in this book. On purely health and safety grounds, the potential for biting and scratching during handling must be evaluated; a number of accidents and needle-stick injuries occur when trying to inject or manipulate an unwilling animal. Animal experimentation is usually performed in dedicated animal units, and guidelines for the safe housing and set-up of animal facilities should be available through the institute safety office. Occupational health surveillance is an essential part of controlling the risks posed by animal experimentation, especially in relation to animal allergens.
22.3.3 Human Material

Ex vivo research and universal precautions – A high proportion of laboratory-based surgical research revolves around the use of human bio-fluids or tissues. The potential infection risks from these samples include blood-borne viruses such as HIV, hepatitis B and hepatitis C. Generally, it is advisable not to use samples from patients with known infections unless absolutely necessary for the study, in which case the samples should be handled as appropriate for the infectious agent; e.g. a lung biopsy from a patient diagnosed with TB should be handled in a containment level 3 facility. Even when samples are not known to be infected, they should still be handled as potentially infectious and treated with "universal precautions". Human samples should be handled in a containment level 2 laboratory, and procedures with the potential to produce aerosols should be performed in a class II or class I biological safety cabinet. These precautions should apply to all samples including, but not limited to, tissues, blood, semen, vaginal secretions, synovial fluid and pleural fluid. "Universal precautions" include:
• Transport samples in robust containers within secondary containment, appropriately labelled.
• Always wear gloves, a laboratory coat and safety glasses.
• Use a biological safety cabinet if there is the potential for aerosols or splashes.
• Minimise the use of sharps and glass; if they are required, use them cautiously.
• Practise good hygiene; wash hands and other skin surfaces thoroughly.
• Dispose of waste appropriately.
• Disinfect with a recognised agent.
• Have procedures in place in the event of exposure or an emergency.
• All users should be immunised against hepatitis B and tetanus, with the necessary arrangements made with the Occupational Health Department.
In vivo research – The hazards of this type of research should be well known to the majority of surgeons. It is always important to consider both the risks to the operator and to the patient. The introduction of materials or equipment from the laboratory to the theatre should be discussed in detail with the theatre manager and infection control.
22.4 Theatre Safety and Surgical Smoke

Whilst it is outside the remit of this chapter to discuss the general health and safety requirements of a surgical theatre, surgeons involved in research must be aware of the impact of their research on patients, fellow theatre staff and themselves. As with specialist theatre equipment, research equipment should be tested before use and meet infection control standards. Personal protective equipment should meet the minimum general requirements, although extra personal or engineering protection may be required depending on the procedure to be performed. Sterile gloves, theatre hats and surgical masks protect both the patient and the surgeon, and all standard hygiene measures should be observed.

Surgical smoke is produced when ultrasonic (harmonic) scalpels, electrosurgical devices and lasers are used. Not only does the smoke have an unpleasant odour, it may also contain chemical and biological hazards: the chemicals can cause headaches, nausea, vomiting and irritation to the eyes, nose and throat, and both viable and non-viable bacteria and viruses have been isolated from surgical smoke [3]. Control measures include the provision of robust ventilation systems, either locally at the site of manipulation or within the direct operating environment. There is no conclusive evidence that wearing a standard surgical mask reduces post-operative infection or protects the wearer from exposure to surgical smoke; however, it does protect the wearer from contact with body fluids.
22.5 Transport of Material

Transport of biological material, either within or between institutions, is an often overlooked part of safety and one which, if not controlled, may result in
exposure of members of the public to potentially infectious material. When transporting samples on site, it is important to place them in secondary containment, such as a plastic box containing absorbent material (e.g. paper tissues), so that if an accident occurs any spillage is contained. Transporting samples off site is more complicated; with the exception of small diagnostic samples, the normal postal system cannot be used and specialist couriers must be employed. Samples should be placed in rigid containers, securely sealed, placed inside a UN-approved biological specimen container and packed with absorbent materials in accordance with the relevant packaging instruction. The package must be adequately labelled and must include the details of the sender and the addressee. On receipt of such a package, it is important to open it in a controlled manner in the laboratory in case a leak or breakage of the sample container has occurred. Transportation of category 3 material, or of cold or dry-ice shipments, should be made via a specialist biological courier.
22.6 Genetic Modification

The Genetically Modified (GM) Organisms (Contained Use) Regulations 2000 cover the use of genetically modified pathogens, cell lines and animals in research. The intention to perform any GM work must be notified to the institution, and class 2 and 3 activities must be notified to, and permission granted by, the HSE. The management of GM health and safety lies outside the scope of this chapter, but further information is available on the HSE website.
22.7 Chemicals

A hazardous chemical is one which has the potential to cause ill health; this includes substances used directly in the workplace, those produced by work activities and those which occur naturally. Hazardous chemicals are classified according to the severity and type of hazard they present, either immediately or in the long term. Harmful chemicals fall within the remit of the COSHH regulations; the supply of such chemicals is also covered by the Chemicals (Hazard Information and Packaging for Supply) Regulations 2002, and the commonly used hazard symbols can be found in Fig. 22.3.
Fig. 22.3 Chemical hazard symbols: harmful (Xn), irritant (Xi), oxidizing, corrosive, hazardous to the environment, flammable/very flammable, toxic/very toxic, biohazard and ionising radiation
• Harmful – Substances that, if swallowed, inhaled or absorbed through the skin, may pose limited health risks.
• Irritant – Non-corrosive substances that can cause dermatitis, conjunctivitis or bronchitis after contact. Some irritants are also sensitizers, i.e. repeated contact induces an increasingly severe reaction.
• Corrosive – Substances (gas, liquid or solid) that attack living tissue, normally by burning; not only the skin and eyes, but also the respiratory tract by inhalation and the gastrointestinal tract by ingestion, e.g. hydrochloric acid, sulphuric acid.
• Flammable – Substances which can easily be ignited and are capable of burning rapidly. They may be solids, liquids, vapours or gases; examples include ethanol, hydrogen gas and ether.
• Oxidizing – Oxidizing agents are chemicals that bring about an oxidation reaction. Fire or explosion is possible when strong oxidizing agents come into contact with easily oxidizable compounds. An example of an oxidizing agent is hydrogen peroxide.
• Toxic – A number of substances fall under the toxic heading. Classically, toxic represents substances that affect the function of the body. Examples of toxic chemicals include acrylamide, phenol, sodium azide and methanol. Carcinogenic substances are also commonly labelled as toxic; they are defined as substances which are known or suspected to promote the abnormal development of cells to become cancerous. In addition, mutagenic and teratogenic compounds are included in this category, which may cause heritable genetic mutations or malformations in the developing foetus, respectively. Other toxic chemicals may also impair fertility or harm the unborn child.
• Explosive – Substances that can explode when subjected to adverse conditions such as heat, light, mechanical shock, detonation or certain chemical catalysts.
• Environmental – Chemicals that may harm environmental flora or fauna. These chemicals should not be released into the environment and require specialised disposal via chemical waste contractors, usually arranged through the institute's safety office.
• Liquid nitrogen and cryogenics – Liquid nitrogen is frequently used in research laboratories for snap freezing surgical biopsy samples, during some sample preparation techniques and for transportation. Its hazards are often underestimated: contact with liquid nitrogen can cause cold burns, and asphyxiation may occur owing to the rapid expansion of the liquid to vapour and the displacement of oxygen from the air. Other hazards include the potential for containers to explode on removal from liquid nitrogen because of internal leakage and gas expansion. The delivery, transportation and use of liquid nitrogen and other cryogens should be tightly controlled, with thorough procedures and training in place. In addition, pressurised liquid nitrogen dewars have specific maintenance, insurance, manual handling and transportation requirements. Because of the potential for serious incidents, it is usually deemed unacceptable to travel in a lift or vehicle with more than 500 ml of liquid nitrogen. General steps to control the risks of liquid nitrogen include using compatible storage and transport dewars, wearing cryogenic gloves and a face mask, and using oxygen depletion monitors in confined spaces.

Finding out more information on chemicals – The labels of chemicals often include details of their associated hazards along with, if applicable, one or more of the symbols detailed in Fig. 22.3. In addition, purchased chemicals should be accompanied by a material safety data sheet (MSDS); most companies also make these available as PDF files on their websites. The MSDS provides a wealth of information on the chemical(s), including the hazard definition, and additional safety instructions can be found in Section 15 – Regulatory Information – which lists the "Risk" (R) and "Safety" (S) phrases that apply. These phrases give a good indication of the hazards associated with the chemical and how it should be handled; there are presently 68 R and 64 S phrases, examples of which can be seen in Table 22.3.
Table 22.3 Frequently encountered risk and safety phrases. For the full list of phrases, please see http://www.hse.gov.uk/chip/phrases.htm
R10 – Flammable
R14 – Reacts violently with water
R20/21/22 – Harmful by inhalation, in contact with skin and if swallowed
R20/22 – Harmful by inhalation and if swallowed
R21 – Harmful in contact with skin
R23/24/25 – Toxic by inhalation, in contact with skin and if swallowed
R26/27/28 – Very toxic by inhalation, in contact with skin and if swallowed
R34 – Causes burns
R36/37/38 – Irritating to eyes, respiratory system and skin
R39 – Danger of very serious irreversible effects
R40 – Limited evidence of a carcinogenic effect
R41 – Risk of serious damage to eyes
R42/43 – May cause sensitisation by inhalation and skin contact
R45 – May cause cancer
R46 – May cause heritable genetic damage
R48 – Danger of serious damage to health by prolonged exposure
R50/53 – Very toxic to aquatic organisms, may cause long-term adverse effects in the aquatic environment
R60 – May impair fertility
R61 – May cause harm to the unborn child
R64 – May cause harm to breast-fed babies
R65 – Harmful: may cause lung damage if swallowed
R67 – Vapours may cause drowsiness and dizziness
R68 – Possible risk of irreversible effects
S3/7 – Keep container tightly closed in a cool place
S9 – Keep container in a well-ventilated place
S16 – Keep away from sources of ignition – no smoking
S17 – Keep away from combustible material
S22 – Do not breathe dust
S24/25 – Avoid contact with skin and eyes
S29 – Do not empty into drains
S36/37 – Wear suitable protective clothing and gloves
S36/37/39 – Wear suitable protective clothing, gloves and eye/face protection
S51 – Use only in well-ventilated areas
Workplace exposure limits – Some hazardous substances have workplace exposure limits set under the COSHH regulations. These limits are set over an 8-hour working day (long-term exposure limit, LTEL) or as a short-term exposure limit (STEL; e.g. 15 minutes) and must not be exceeded. Information on exposure limits is included in the MSDS, and further information is available via the HSE website.
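To make the 8-hour long-term limit concrete, the standard calculation is a time-weighted average (TWA) of concentration over the working day. The following minimal sketch illustrates only the arithmetic; the function name and the concentration figures are invented for illustration, and a real compliance assessment would follow the HSE's published guidance.

```python
# Minimal sketch of the 8-hour time-weighted average (TWA) that underlies
# long-term exposure limits. Figures below are invented; periods not
# listed are treated as zero exposure.

def eight_hour_twa(exposures):
    """exposures: list of (concentration_ppm, duration_hours) pairs."""
    return sum(conc * hours for conc, hours in exposures) / 8.0

# Example: 2 h at 150 ppm and 2 h at 50 ppm, remainder of the day unexposed.
day = [(150.0, 2.0), (50.0, 2.0)]
print(f"8-hour TWA: {eight_hour_twa(day):.1f} ppm")  # 50.0 ppm, compared with the LTEL
```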
22.8 Radiation

Radiation in surgical research can be encountered in many forms: as ionising radiation (X-rays or radioisotopes) or as non-ionising radiation (ultra-violet (UV), electromagnetic or laser radiation); all pose a significant hazard to human health. Ionising radiation occurs in two forms, either as ionising alpha and beta particles or as X-rays and gamma-rays. This form of radiation is covered by the Ionising Radiations Regulations 1999 as well as the Radioactive Substances Act 1993. Because of the potential security and health risks posed by ionising radiation, strict guidelines are in place: all work must be licensed and all workers registered. There is also a requirement for the appointment of an appropriately trained radiation protection adviser and radiation protection supervisor, and for risk assessment and contingency plans.

X-rays – X-rays are used in several fields of research, either with medical-grade equipment or high-powered analytical machines. Frequent exposure to X-rays may damage the DNA in cells, resulting in their destruction or mutation. Congenital malformations and an increased incidence of cancer may be observed in children exposed to radiation during the foetal and embryonic stages of development, so additional care must be taken to protect pregnant workers from exposure. Short-term, high-level exposure can also cause severe radiation-induced skin burns and damage to the eyes; the burns can range from reddening of the skin to desquamation, ulceration and necrosis, depending on the dose. Containing the radiation within a piece of equipment or a room, controlling access, providing adequate shielding and monitoring exposures are all important in minimising the health risks of such equipment.

Radioisotopes – Radioisotopes are used both in the clinical setting and in the research environment and can produce radiation in the form of alpha or beta particles as well as
gamma rays. As with X-rays, this type of radiation source can damage and destroy cells; however, unlike X-rays, which are emitted only when the equipment is switched on, radioisotopes are in a continuous state of decay and constantly emit radiation. This form of radiation is easily spread through contact, and great care must be taken whilst working with these substances to prevent contamination of the environment and of workers. Control measures include using them only in specified and registered laboratories or areas with restricted access; reducing exposure by limiting the quantities used and the exposure times; and using appropriate work practices, shielding, personal dose measurement (dosimetry), a high level of laboratory hygiene and contamination monitoring with specific radiation counters. Further information should be available through the institute's radiation protection adviser and on the HSE website.

Non-ionising radiation comes in the form of infra-red and UV radiation, lasers, and radiofrequency and microwave radiation; exposure can result in skin or eye burns and chronic effects such as cancers, neurological effects and sterility. Limiting exposure times and providing shielding and appropriate personal protective equipment are effective ways of controlling the hazards posed by this type of radiation.
22.9 Other Hazards

Hazards encountered during any form of surgical research are not limited to the factors discussed above; when designing experiments and performing risk assessments, hazards such as those outlined below should also be considered:
• Mechanical hazards include sharps injuries from the use of needles and scalpels, but can also include the ejection of materials during procedures and impact, crushing, friction and entanglement hazards. Correct use of equipment and its associated safety devices, together with equipment maintenance, are important steps in controlling this type of hazard.
• Environmental and individual hazards should always be factored in and include levels of noise and vibration as well as the impact of lighting, temperature, space, stress and working hours on the ability to perform the work safely.
• The suitability of the workplace and workstation should be studied, looking at the possible hazards
from slips, trips and falls, falling or moving objects, manual handling procedures, ease of access to equipment and ergonomic set-up. Accidents are much less likely to occur when the set-up of the facility has been optimised.
• Other hazards such as fire and electricity also need to be assessed; maintenance and testing of electrical equipment, provision (where required) of appropriate electrical safety cut-off devices, removing combustible materials, locking flammable chemicals in fire cupboards and minimising the use of naked flames are all important in controlling this type of hazard.
22.10 Routes of Transmission for Chemicals, Biologicals and Radiation

When considering the hazards associated with a chemical, biological or radiation source, it is important to consider how the substance can harm the body. The major routes of entry are ingestion, inhalation, breaches in the skin defences, absorption through intact skin and the eye.

Entry via the mouth can occur through eating, drinking or smoking in the laboratory, mouth pipetting or placing contaminated articles in the mouth; these practices should be prohibited in the research area and good levels of hygiene maintained. Accidental splashes or aerosols into the mouth should be prevented by using biological safety cabinets or shields during procedures likely to produce airborne droplets. Biological agents and chemicals can breach the skin through puncture wounds, so sharps such as needles or scalpels should be used only if absolutely required, and any cuts or scratches to the skin should be covered. A number of chemicals can damage the skin or be absorbed through it; care must be taken to ensure that the correct types of gloves are used when handling these chemicals and that they are changed regularly. Substances can also enter the eyes as splashes, vapours or dust, or by transfer from contaminated fingers. As with any process producing aerosols, dust or droplets, precautions include using the appropriate extraction cabinet, such as a fume cupboard or biological safety cabinet, and wearing appropriate safety glasses, goggles or face masks. With ionising radiation, extra care must be taken as
these materials are easily spread; monitoring of the environment and of the body with the correct radiation counter, together with rigorous personal and laboratory hygiene, are important measures to prevent transmission of this material.
22.11 Risk Assessment and Control Mechanisms

Risk assessments are the cornerstone of a number of health and safety regulations, including COSHH, the Management of Health and Safety at Work Regulations and the Ionising Radiation Regulations. These risk assessments must be "suitable and sufficient" and cover both employees and others who may be affected by the work. The assessments should identify the significant risks, prioritise the measures required to control those risks and be appropriate to the nature of the work.
22.11.1 Five Steps to Risk Assessment

The HSE has identified five key stages in the risk assessment process [4]; these are:
1. Identify the hazards – Only the significant hazards of the procedure should be identified and addressed in the assessment. Examples of possible hazards have been outlined in the preceding paragraphs, but the list is not exhaustive; care must be taken to look at the whole situation of the research in question and address all the significant hazards encountered.
2. Identify who might be harmed by these hazards and how – The researcher and co-workers are the most obvious candidates, but domestic staff, members of the public and maintenance teams should also be considered. Young workers, trainees and new or expectant mothers may be more at risk than others.
3. Evaluate the risks and decide on the control measures (existing and additional) – The purpose of a risk assessment is to reduce the risk as far as possible. Risk levels, i.e. the likelihood of somebody being injured and the severity of the injury, are often rated high, medium or low, but more sophisticated quantitative risk assessment matrices are also available (a minimal example is sketched after this list). Once the risk level has been decided, informed decisions can be made on the control measures that can be implemented to reduce it; this is discussed in more detail below.
4. Record the significant findings – There are many possible layouts for a risk assessment; many institutes provide their own pro formas, and an example is also given in the booklet "Five steps to risk assessment". Formats may vary with the complexity of the situation and the hazards identified.
5. Monitor and review – Risk assessments should be monitored for the effectiveness of the control measures and reviewed following incidents, when changes occur to the procedure and on a routine basis.
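As a concrete illustration of step 3, the sketch below scores likelihood and severity on 1–5 scales and bands the product into low/medium/high. The scales, thresholds and wording are invented for illustration; institutions define their own matrices, and the HSE does not prescribe a single scoring scheme.

```python
# Minimal sketch of a quantitative risk assessment matrix (illustrative
# only). Likelihood and severity are each scored 1 (low) to 5 (high);
# the risk score is their product, banded into low/medium/high.

def risk_rating(likelihood: int, severity: int) -> tuple[int, str]:
    """Return (score, band) for a hazard."""
    score = likelihood * severity
    if score >= 15:
        band = "high: stop work until additional controls are in place"
    elif score >= 8:
        band = "medium: additional controls required"
    else:
        band = "low: manage by routine procedures"
    return score, band

# Example: needle-stick injury while handling an unwilling animal;
# likelihood 3 (possible), severity 4 (major injury or infection).
score, band = risk_rating(3, 4)
print(f"Risk score {score}, {band}")  # Risk score 12, medium: ...
```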
Fig. 22.4 Common workplace health and safety signs, including prohibition signs (authorised personnel only; no smoking, no drinking, no eating), mandatory signs, warning signs (biohazard) and safe-condition signs (first aid, location of first aid for the department, fire exit)
Safe systems of work are a requirement of the Health and Safety at Work Act, and it is essential that they are in place in any form of surgical research. For hazardous procedures, a safe method of performing the activity must be written down, often in the form of a standard operating procedure, and should form part of research training. Training and information are both important tools for risk control. Training should include laboratory inductions, formal institution-led training courses and one-to-one, task-specific training on the job, and these sessions should be documented. Information comes in many forms, including signs such as those shown in Fig. 22.4, posters, laboratory rules, standard operating procedures and the risk assessment itself.

It is only once the above measures, aimed at controlling the risk for the collective, are in place that personal protective equipment (PPE) plays a role. PPE is often regarded as the last line of defence; the major limitations of this type of protection are that it protects only the individual and not others nearby, and that it relies on the equipment being used, and used correctly. PPE is covered by the Personal Protective Equipment at Work Regulations 1992, which state that if PPE is required, the employer
must provide it free of charge to the user, and it must be maintained and replaced if faulty. In turn, employees are required to use the equipment provided, to use it in the correct manner and to notify the employer if it becomes faulty. PPE comes in many forms, including gloves, safety glasses, ear protection, laboratory coats and shoes. It is very important to ensure that an item is suitable for its use; the institution's safety or occupational health department should be able to advise on these issues. Minimum levels of PPE for a category 2 laboratory will include the mandatory use of latex or nitrile disposable gloves, laboratory coats and safety glasses. Other items often available include cryogenic and heat-resistant gloves and full face masks; molecular biology laboratories may also have UV-resistant face masks, and special safety glasses may be provided for use with lasers.

Welfare provision in the laboratory includes a separate hand-washing facility, soap, paper towels and a hook on which to hang laboratory coats, as well as a separate area for rest, reasonable working temperatures and general welfare facilities. First aid provision must be suitable for the number of workers and the types of hazard involved; because of the potential for chemical or biological eye splashes, eye wash bottles or stations should be provided in addition to first aid boxes. Fire control measures such as fire extinguishers and blankets should be provided in consultation with the institution's fire officer.

All risk control measures, whether engineering or human, must be monitored and supervised. Safety equipment such as biological safety cabinets and items of PPE must be regularly checked, and the effectiveness of training, information and systems of work monitored. It is also important to monitor the attitude of researchers to health and safety in the laboratory; non-compliance with laboratory rules and horseplay are frequent causes of laboratory accidents and can have potentially serious consequences, both for the individuals concerned and for the institution. Finally, the risk control measures in place should be periodically reviewed, taking note of any incidents or changes in procedure.
22.12 Waste Disposal

Research can generate a wide variety of waste, such as clinical and other biological material, chemical,
radiation and other wastes. The routes of disposal vary slightly between institutions, so it is difficult to provide specific waste guidelines here. However, it must be stressed that waste disposal is an important aspect of research safety; carelessness in waste disposal results in a significant number of accidents each year and can lead to fines, litigation and prosecution by enforcement agencies. Care must be taken to dispose of all research material through the correct route and in the correct containers, and guidance should be sought from the institution's safety office, estates managers or hospital waste managers.
22.13 Health Surveillance

Health surveillance is defined as systematically watching out for early signs of work-related ill health in employees exposed to certain health risks. It is required for a number of activities including, but not limited to, the use of certain chemicals, work with human pathogens and work with animals. Health surveillance may take different forms, ranging from questionnaires to physical examinations and medical tests, usually delivered through the institute's occupational health service.
22.14 Reporting of Injuries, Diseases and Dangerous Occurrences Regulations 1995

These regulations require the reporting of specified accidents, ill health and dangerous occurrences to the enforcing authority, which for most research will be the HSE. Reportable events include death, major injury and accidents causing more than 3 days of lost time. Major incidents require immediate notification, in most institutions in collaboration with the institution's safety adviser; others must be reported within 10 days. Reportable diseases include certain poisonings, some skin and lung diseases, and infections such as leptospirosis, hepatitis, tuberculosis, legionellosis and tetanus [5]. It is also a requirement to keep records of incidents, in order to learn from them and prevent them in the future.
References

1. Health and Safety Executive website. Available at www.hse.gov.uk
2. Advisory Committee on Dangerous Pathogens. Approved list of biological agents. Available at http://hse.gov.uk/pubns/misc208.pdf
3. Alp E, Bijl D, Bleichrodt RP et al (2006) Surgical smoke and infection control. J Hosp Infect 62:1–5
4. Five steps to risk assessment. Available at http://www.hse.gov.uk/risk/fivesteps.htm
5. RIDDOR guidelines. Available at http://www.hse.gov.uk/riddor/index.htm
Fraud in Surgical Research – A Framework of Action Is Required
23
Conor J. Shields, Desmond C. Winter, and Patrick Broe
Contents
Abbreviations ........ 283
23.1 Introduction ........ 284
23.2 History of Fraud ........ 284
23.3 Prevalence of Fraud ........ 284
23.4 Reasons for Fraud ........ 284
23.5 Types of Fraud ........ 285
23.5.1 Fabrication ........ 285
23.5.2 Duplication ........ 285
23.5.3 Plagiarism ........ 286
23.5.4 Authorship ........ 286
23.5.5 Impact Factors and Misconduct ........ 287
23.5.6 Conflicts of Interest ........ 288
23.6 Managing Research Misconduct ........ 288
23.7 A Framework of Action ........ 289
23.8 Conclusion ........ 290
References ........ 292
C. J. Shields () Department of Surgery, Mater Misericordiae University Hospital, Eccles Street, Dublin, Ireland e-mail: [email protected]
Abbreviations
BMJ – British Medical Journal
COPE – Committee on Publication Ethics
FDA – Food and Drug Administration
IF – Impact factor
JAMA – Journal of the American Medical Association
ORI – Office of Research Integrity
WADA – World Anti-Doping Agency
Abstract

Fraud in science has a long history, with some noteworthy and seminal publications lately scrutinized because of discrepancies suspected of being fraudulent in nature. Scientific misconduct can take many forms; however, all imply a violation of the code of ethical scholarly conduct. It incorporates fabrication, falsification, plagiarism, redundant publication, misrepresentation of data, undisclosed conflicts of interest, unethical research, and misappropriation of research funds. Estimates of the prevalence of misconduct are alarming. The emergence of scientific fraud has huge implications for how researchers, clinicians, colleges, and journals conduct business. The system of peer review, employed by all reputable journals, attempts to certify the scientific validity of a submitted manuscript, but, perhaps controversially, may not be ideally placed to determine research fraud. To combat fraud in research effectively, there needs to be a harmonized international strategy that combines and coordinates the resources of journals, funding bodies, and national scientific bodies.
23.1 Introduction

Following the recent highly publicized fall from grace of two eminent biomedical researchers, the public's unquestioning acceptance of scientific probity is under threat [18]. Faith in both science and scientists has defined Western civilization since the Age of Enlightenment. "The rapid progress true science now makes occasions my regretting sometimes that I was born too soon," proclaimed Benjamin Franklin (letter to Joseph Priestley, 1780). The concept that truths about the natural world can be perceived through deduction and observation, and not necessarily through superstition and religion, elevated men of scientific method to the status of prophets. Now, recent events have cast doubt on science's claim to reveal the truth.
23.2 History of Fraud

Fraud in science has a long history, with some noteworthy and seminal publications lately scrutinized because of discrepancies suspected of being fraudulent in nature. Most prominent among these are the accusations (now accepted) that the 1912 finding of a skull and jaw by Charles Dawson represented an archaeological hoax. "Piltdown Man" was believed to be the evolutionary "missing link," until it became impossible to reconcile it with subsequent fossil finds, rendering it an aberration which was finally exposed as an elaborate fraud in 1953. The identity of the fraudster has never been conclusively proved; however, Dawson himself, as well as Pierre Teilhard de Chardin, has been implicated by many authorities.

The work of Gregor Mendel, the Augustinian priest and "father of modern genetics," has long been viewed with suspicion by statisticians, as many of his results were implausibly close to those expected. As the majority of his work is reproducible, this may perhaps be explained by confirmation bias rather than by deception.
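The statistical complaint against Mendel's data, first formalized by R.A. Fisher, is that the fit to expectation is too good: a chi-square statistic far closer to zero than chance normally allows is itself grounds for suspicion. A minimal sketch with invented counts (a hypothetical 3:1 cross, not Mendel's actual figures) illustrates the test:

```python
# Minimal sketch of the "too good to be true" test, applied to invented
# counts from a hypothetical 3:1 Mendelian cross (two phenotype classes,
# hence 1 degree of freedom). A chi-square statistic implausibly close to
# zero means the data fit expectation better than chance normally allows.
import math

def fit_too_good(observed, ratio=(3, 1)):
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square CDF for 1 df: fraction of honest experiments that would
    # fit at least this well purely by chance.
    p_this_close = math.erf(math.sqrt(chi_sq / 2))
    return chi_sq, p_this_close

chi_sq, p = fit_too_good([5496, 1828])  # invented: 7324 offspring, almost exactly 3:1
print(f"chi-square = {chi_sq:.4f}; only {p:.1%} of honest runs fit this well or better")
```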
23.3 Prevalence of Fraud

Scientific misconduct can take many forms; however, all imply a violation of the code of ethical scholarly conduct. It incorporates fabrication, falsification,
plagiarism, redundant publication, misrepresentation of data, undisclosed conflicts of interest, unethical research, and misappropriation of research funds. Estimates of the prevalence of misconduct are alarming; a survey of newly appointed British consultants revealed that 55.7% had observed research misconduct, while 5.7% admitted to engaging in unethical activities [16]. Thirty-three percent of scientists surveyed by the journal Nature admitted to at least one instance of misconduct [25]. Despite this, Claxton has estimated the incidence of fraudulent papers at only 0.018% per annum, based on US Food and Drug Administration (FDA) audits and notices of retraction [8]. A previous editor-in-chief of the American Journal of Roentgenology stated that he believed duplicate publication to be rare, but acknowledged that this may simply reflect how seldom it comes to attention [19]. In 1992, the US Committee on Science, Engineering, and Public Policy found that the number of confirmed scientific misconduct cases was low, but conceded that underreporting may be significant.

While these statistics are undoubtedly consoling (as Claxton states, even if his figures are out tenfold, only 0.2% of papers would contain fraudulent data), the revelation of fraud causes untold damage to the reputation of medical science. The emergence of scientific fraud has huge implications for how researchers, clinicians, colleges, and journals conduct business. The system of peer review, employed by all reputable journals, attempts to certify the scientific validity of a submitted manuscript but, perhaps controversially, may not be ideally placed to detect research fraud, except by serendipity. The success of the concept that one should submit one's work to the judgment of peers is predicated upon honesty and disclosure, both on the part of the investigators and of the reviewers.
23.4 Reasons for Fraud

What are the impulses that drive researchers to commit fraud? Research misconduct is not confined to the junior ranks of researchers, but can be manifest throughout the chain of responsibility, to include the senior author. Career progression is inextricably linked with the reporting of results. Those who publish regularly, and particularly in journals of repute, can expect to see
their careers assume an upward trajectory, and will gain the respect of their peers. In a world where science, medicine, and industry are closely associated, the award of a prestigious and lucrative research grant will not only advance the grant writer, but may also ensure the future employment of other laboratory personnel. However, the intrusion of the money of the marketplace into universities and laboratories is frequently accompanied by the ethics of the marketplace.

For short-term researchers, career progression often hinges on securing first-author publications. Given the time constraints, the junior researcher may be tempted to compromise methodology. In extreme cases, this may take the form of data fabrication; however, falsification and inappropriate data manipulation, duplicate publication, or "salami slicing" of results are more prevalent. The graduate student may also feel that advancement is contingent upon securing the blessing of the principal investigator, and, in a misguided attempt to please their seniors, may manipulate data [9]. For more senior researchers, a desire to earn the respect of their peers, to enhance scientific credibility, or simply a belief in one's theory despite an inability to provide reproducible evidence, may prove seductive. With the increasing reliance on the cumulative impact factor (IF) of publications to determine promotions [1], financial incentive has now been added to that of achieving professional preeminence.

In this chapter we examine the various forms of research misconduct, and suggest methods by which it may be detected and deterred.
23.5 Types of Fraud

23.5.1 Fabrication

Fabrication of data is the most serious form of research misconduct (see Box 1). It includes reporting experiments which were never conducted and the generation of data sets, but may also extend to the omission of inconvenient data sets, though the latter may be classified as "misrepresentation of data." Fabrication of data is the most serious charge that can be leveled at a researcher, and should it be proved, tends to result in ostracism from the profession, forfeiture of grant funding, and in some instances, criminal charges. It has particular relevance in the context of biomedical research,
as false data may lead to the adoption of erroneous clinical methods. Furthermore, this practice can insidiously subvert the elucidation of best practice through inclusion in future systematic reviews and meta-analyses.
23.5.2 Duplication

Duplicate, or redundant, publication occurs when work, or a substantial part of it, has already been published without this being made explicitly clear (see Box 2). Apart from distorting the scientific record by overemphasizing the importance of a single data set, in certain circumstances it may even infringe copyright. Duplicate publication has long been recognized as a problem [2]; however, estimates of its current incidence are alarming. A study by Schein and Paladugu in 2001 suggested that as many as one in six original articles in leading surgical journals displayed some form of redundancy [20].

Redundant publications almost invariably demonstrate attempts at deception: the authors rarely cite the preexisting publication, or cite it only in an unrelated context; the list of authors is altered; and the text may even contain references to "other groups" when referring to the authors themselves. The duplicate papers are frequently in close temporal proximity, implying that they were submitted to journals almost simultaneously, or at least that neither journal editor could have had knowledge of the impending publication of an almost identical study elsewhere. These characteristics of duplicate publications reveal their deceitful nature.

The International Committee of Medical Journal Editors has noted that it may be appropriate to re-publish an article that was originally published in a non-English-language, or local, journal. This argument may perhaps be extended to articles which span various disciplines and have a number of distinct audiences. However, this should be acknowledged explicitly within the text and the accompanying footnote, and should be made clear to the referencing and indexing bodies. The argument is, of course, weakened by the widespread dissemination of journals via the Internet, resulting in fewer barriers to journal access between specialties.

An indiscretion related to duplication is "salami slicing," the act of fragmenting a single study into numerous publications, each containing the smallest possible set of data. This may entail redundancy, as both data
and text tend to be recycled. While impacting upon the coherence of any scientific statement, it also has a measurable cost: that of significantly increasing the number of papers requiring peer review. The drive to "salami slice" stems from the researchers' desire to augment their bibliographies, as the quantity of output is weighted heavily in the "publish or perish" environment.

There is a distinction between redundant publication and "salami slicing". In the interests of coherence and brevity, a journal editor may suggest that a manuscript be split into several constituents; the authors may then feel entitled to submit the excluded data as a separate paper. A two-thirds overlap in the population studied is considered to constitute duplicate publication. "Salami slicing" is deemed to have occurred when papers answer the same or very similar questions using almost identical methods. Therefore, a study addressing two or more questions may be legitimately split; however, splitting on the basis of outcomes would represent misconduct.

As in most instances of research misconduct, journals tend to rely on authors and institutions to disclose possible duplication. Indeed, authors frequently have to sign an attestation that their paper has not previously been published, as well as a transfer-of-copyright form. Despite this, duplication is a significant issue in modern publishing. Schein and Paladugu propose an electronic screening process for each paper, which is certainly feasible in this era of electronic publishing and indexing [20] (one possible form of such screening is sketched at the end of this section). This still leaves unresolved the issue of simultaneous submission, which would continue to result in redundancy being detected only after publication. Given the demands that redundancy places on the editorial and reviewing communities, and its potential to distort the scientific record, it should be viewed as a particularly serious form of research misconduct.
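To illustrate what such electronic screening might look like, a submission can be compared against indexed texts or abstracts using word n-gram overlap. Schein and Paladugu do not prescribe an algorithm; the technique, helper names and the 0.5 threshold below are invented for illustration.

```python
# Hypothetical sketch of an electronic redundancy screen: submissions are
# compared with previously indexed texts via word 5-gram ("shingle")
# overlap, measured with Jaccard similarity.

def shingles(text, n=5):
    """Set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def screen(submission, indexed_papers, threshold=0.5):
    """Return IDs of indexed papers suspiciously similar to the submission."""
    return [paper_id for paper_id, text in indexed_papers.items()
            if jaccard(submission, text) >= threshold]
```

A flagged pair would still need human review: legitimate re-publication (e.g. of a non-English article) produces exactly the same textual signal as deceitful duplication.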
23.5.3 Plagiarism

Plagiarism can be defined as the false attribution of intellectual property, and is regarded as a serious offense (see Box 3). Despite the associated opprobrium, it occurs commonly, and is seen in many diverse environments, including schools, journalism, literature, and research. Individuals as different as Brahms, Dr. Martin Luther King, and Vladimir Putin have been accused of
plagiarism. In the world of literature, plagiarism may even occasionally be viewed as a virtue; Shakespeare, Wilde, and Eliot can claim to have improved upon their predecessors. In Eliot's words, "… good poets make it into something better, or at least something different."

Plagiarism in the realm of research, though, is a far more base activity. At its most unrefined, plagiarism in surgical publications consists of verbatim transcription of passages from a preexisting publication without due recognition or citation. However, it can occur at any stage of the research process, and in a far more insidious and subtle manner; misappropriation of others' research ideas, whether published or not, may also be classed as plagiarism.

Declaring plagiarism when the same author has scripted both manuscripts ("self-plagiarism") is more difficult. Indeed, if plagiarism is theft of intellectual property, is self-plagiarism not oxymoronic? This issue frequently arises when a researcher authors a short report, then a full manuscript, and subsequently a book chapter on an area of obvious expertise. At what level of text duplication does self-plagiarism become unethical? This is especially relevant in highly technical passages, as would typically be found in "Materials and Methods" sections. Many journals would regard self-plagiarism in this instance as a minor infraction. As in most issues of research misconduct, much guilt can be assuaged by full disclosure; permission of the copyright holder may also be required. It hardly needs stating that the detection of plagiarism is frequently serendipitous; there exist no formal mechanisms for detecting language or concept theft within the scientific realm.
23.5.4 Authorship

There are few issues as divisive in research laboratories as authorship of forthcoming manuscripts, a lesson which is frequently learnt bitterly by the junior researcher. There is no currently accepted definition of what constitutes eligibility to be described as an author; however, it is reasonable to state that having one's name appended to a paper denotes responsibility for some or all of hypothesis generation, performance of the study, manuscript writing, and approval of the final manuscript. As Hewitt elegantly stated, "Authorship cannot be conferred; it may be undertaken by one who will shoulder the responsibility that goes with it" [17].
It has been suggested that each coauthor should be able to assume responsibility for the entire paper. While this seems a laudable ideal, one encounters difficulty with this definition when a study transcends specialty boundaries. Alternatively, if simple contribution alone is sufficient, would this not lead to names appearing on manuscripts for legacy reasons, engendering an authorship system akin to patent protection?

The number of authors cited on papers has increased dramatically in recent years, raising the specter of honorary authors [15]. A narrow focus on citation quantity has increased the temptation for all to inflate bibliographies, leading to the inappropriate inclusion of individuals who at best had a tangential involvement in the preparation of the manuscript. As peer-reviewed publication is the currency of the academic world, this temptation is unlikely to dissipate in the foreseeable future. The problems with "gift" or honorary authorship are all too apparent. It may be defined as a process to maximize individual credit, while minimizing responsibility [5]. There have been a number of high-profile academic casualties, who have conferred their names upon manuscripts requiring a credibility boost, only to find themselves subsequently embroiled in misconduct investigations when doubts about the veracity of the data surfaced.

Authorship is a fraught topic, and one that can ill afford to be left as an afterthought. It is generally accepted that it is the responsibility of the primary author to determine authorship. As this individual is frequently the most junior member of the research team, they may find themselves in an invidious position, prey to department politics. In cases of dispute over credit, it may be appropriate for the senior author to undertake the final determination. It is noteworthy that in the physical sciences, an alphabetical listing of authors is frequently encountered.
23.5.5 Impact Factors and Misconduct

The IF was developed as an objective, quantitative assessment of the significance of published research. It has evolved since then to represent a measure of the worth of a scientific career, and an independent prognostic factor and determinant of academic promotion. It is even employed to determine researchers' annual
financial bonuses in South Korea and China, among others [1]. However, the concept of the individual and journal IF is not without its critics, and a compelling argument can be made that it simply reflects popularity, not prestige [6]. It is also accepted that IFs may be manipulated by unscrupulous authors and editors.

The journal IF is the ratio of the number of citations received in a given year by the journal's papers from the preceding 2 years to the number of articles the journal published over those 2 years (written out explicitly below). The obvious flaw in this calculation is that no account is taken of the origin of the citations. Is the paper being referenced by undergraduate students, or by acknowledged world experts? Did the citation originate in a prestigious journal, or in a less discerning publication? Have the authors simply referenced their own work? Furthermore, no consideration is given to whether the article being cited contains original research or is a review article. Hence, the top four ranking journals in 2003 were review journals (Annu Rev Immunol, Annu Rev Biochem, Physiol Rev, Nat Rev Mol Cell Biol). This simply reflects the citation habits of most researchers, who find review articles a more convenient source than the original material. If the prestige of the citing journal is factored into an equation of impact based on Google's PageRank algorithm, review journals fall out of the top ten, and the top four journals become Nature, J Biol Chem, Science, and Proc Natl Acad Sci USA, which correlates with what many in the scientific community would intuitively sense to be the journals with "real" impact [6]. Nevertheless, individual and journal IFs remain significant arbiters of career progression and measures of the success of editorial policy, respectively.

Attempts have been made by both individual authors and journals to artificially boost their IFs. Some journals suggest preferential citation of articles published within that journal, with the implication that acceptance for publication is contingent upon the authors' agreement. Alternatively, some editors may write "annual reviews," discussing issues of scientific relevance by reference to articles from their own journal. Some authors, frequently with merit, reference their own work in subsequent papers; in the case of an original multi-author paper, this can lead to multiple citations from the same author group. The point at which these practices become unethical is difficult to define. Both authors and editors can justifiably claim that they are simply playing by the rules of the game, and the rules are transparent.
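Written out explicitly (the notation is mine), the standard two-year journal impact factor for year y is:

```latex
\mathrm{IF}_{y} \;=\;
  \frac{\text{citations received in year } y \text{ to items published in years } y-1 \text{ and } y-2}
       {\text{number of citable items published in years } y-1 \text{ and } y-2}
```

A 2003 IF of 2, for instance, means that the journal's 2001–2002 papers were cited on average twice each during 2003, which makes plain why self-citation and editor-encouraged citation can move the figure.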
Despite the original intent of the IF, the reduction of scientific relevance to a simple figure has created a new tyranny no more equitable or reflective of true scientific value than that which preceded it – subjective assessment.
23.5.6 Conflicts of Interest

A conflict of interest exists when a reasonable reader may infer that an author's (or editor's) secondary interests may have unduly influenced their opinion on the interpretation or validity of a study. Conflicts of interest may be classified as personal, commercial, political, academic, or financial. The majority of journals have strict policies on disclosure of all potential conflicts by all authors, and should a decision be taken to proceed to publication, the editor has a duty to inform the readership of these conflicts. The modern evolution of commercial ties between medicine and business makes the issue of conflict disclosure even more pertinent. In a recent study, Bhargava et al. noted a potential conflict of interest in up to 13% of original articles, and up to 33% of editorials, in publications related to gastroenterology [4]. Disclosure of all funding sources and of the financial relationships and arrangements of all authors is imperative to maintain trust in the published literature.

The prevailing ideologies of publishers and governments can also influence decision-making; three prominent editors of leading medical journals have been dismissed in the past decade (Hoey, Lundberg, and Kassirer of the Canadian Medical Association Journal, JAMA, and the New England Journal of Medicine, respectively), following political conflict with the journals' publishers. The FDA declined to accept the recommendation of its own scientific advisory board when denying over-the-counter availability of postcoital contraceptives in 2006, a decision which was taken as evidence of the influence of the Bush administration on FDA policy [24].
23.6 Managing Research Misconduct
Aspersions cast upon the integrity of science harm all involved in academic pursuits. It behooves all to be alert to the possibility of fraud, in order to preserve science's
intrinsic claim to truth. Due to several high profile cases discussed elsewhere in the chapter, there is increased media scrutiny and skepticism of medical claims. Unless the research community is seen to address the issues of fraud and misconduct in a rigorous and transparent manner, the efforts of those who strive to advance medical knowledge will be undermined. Belatedly, this issue is receiving attention at the national level in a number of countries, with the establishment of regulatory bodies: in the US, the Office of Research Integrity (ORI); in the UK, the Panel for Research Integrity in Health and Biomedical Sciences; and in Denmark, the Danish Committees on Scientific Dishonesty, among others. These agencies are concerned with issuing guidelines to researchers, editors, and funding bodies, with those established on a statutory basis also empowered to investigate allegations of fraud. However, while the work that these bodies undertake is laudable, it may be viewed as a rather fragmented attempt to deal with a problem that is truly international. Furthermore, given the large number of stakeholders within research, it is difficult to decide who bears responsibility for detecting, investigating, and prosecuting fraud. The interested parties include the individual laboratory, the university, the funding body, the journal that receives the manuscript, and the national research advisory and regulatory bodies. Editors frequently find themselves in the firing line when fraud is uncovered. However, it is becoming increasingly clear that journal editors and reviewers cannot be charged with uncovering misconduct, though most exposures originate from the observations of academics. An astute reviewer may recall reading a similar article elsewhere, thereby uncovering duplication, or an eagle-eyed editor may suspect image manipulation in a manuscript, but it is almost certain that the majority of scientific fraud passes undetected. As Richard Horton, editor of The Lancet, said in response to the Sudbø scandal, "short of me flying to Oslo and checking out every entry on the computer, there really is no way for me to detect fraud" [18]. This significantly compromises attempts to estimate the degree of misconduct, and to examine whether research ethics implementation bodies have any impact. Even if an editor does suspect misconduct, there is little that can be done, other than refusing to publish or issuing a retraction of previously published work. The editor may draw the matter to the attention of the appropriate funding body or research institution, but
there is neither an agreed protocol of action, nor an obligation on the body receiving the complaint to act. As most journals represent the output of small academic societies, staffed by enthusiasts, they have neither the resources nor the expertise to deal adequately with the problem. Even the larger, prestigious journals admit that they struggle:

"The degree I hold is an MD, not an MDeity; I have no ability to know what is in the minds, hearts, or souls of authors. Furthermore, I do not have, nor desire to have, the resources of law enforcement agencies…" [10]
Some journals have recently begun to employ more robust and objective techniques to identify deception. These include computer analysis of submitted photomicrographs to detect image manipulation, and referral of certain manuscripts to forensic statisticians, in an attempt to detect manipulation of data (see Box 4). However, it need hardly be stated that these are extremely laborious and time-consuming techniques, and lie outside the capabilities and finances of many journals. What are a journal's duties if fraud is suspected? The Committee on Publication Ethics (COPE) was founded in 1997 "to provide a sounding board for editors who were struggling with how best to deal with possible breaches in research and publication ethics." Editors are encouraged to submit possible cases of misconduct, which are discussed at quarterly meetings, and advice is offered. Frequently, the authors are asked to provide clarification of issues the editor feels are contentious. If a satisfactory response is not forthcoming, the editor is often advised to contact the authors' employer. In the case of published work, the issue of amending the scientific record arises. If doubts about the manuscript cannot be dispelled, the editor may request the authors to retract the paper. If the authors decline, the editor is ethically obliged to issue a retraction notice. In all cases, the exact reason for the retraction must be quoted, preferably using the language of the investigating committee, if one existed. However, this is frequently an unsatisfactory response – papers continue to be cited after retraction without reference to the fraud that has been perpetrated [14]. It is significantly easier to disseminate news of a retraction for articles published since the rise of electronic publishing, which provides the facility to mark an article as retracted post factum and to link to the editorial notice of retraction.
Following an attempt at deception or actual misconduct, an editor may choose to place the authors, or the institution, on a publication "blacklist" for a prescribed period. While superficially appealing, a publication ban only extends to the specific journal or journal group, and simply encourages submission elsewhere. Undoubtedly, the most effective point at which to interrupt fraud is in the originating laboratory, through careful supervision of junior researchers, and in-house peer review of data. The conclusions of the inquiry into Jon Sudbø's deception criticized his supervisor, the Radium Hospital, and the University of Oslo for "a lack of preliminary control and organization with a view to the researcher's PhD project". The hospital was also criticized for "a lack of training and consciousness-raising in respect of the researcher and other employees with a view to the rules for handling patient material, preliminary assessments of research projects and authorship" [11]. Regrettably, cases of possible fraud or misconduct are not always investigated with enthusiasm by third-level institutions or research bodies, presumably due to a fear of stirring up a hornet's nest, and diminishing the standing of the institution in question [7, 14].
23.7 A Framework of Action
One may regard scientific fraud as being akin to drugs in sport. Drugs confer an unfair advantage on certain competitors, enabling them to outstrip those who toil honestly. The shock and revulsion that accompanies the revelation that a successful athlete has been cheating is comparable to the feelings aroused in the scientific community by the recent high profile cases of fraud. The world of athletics has enacted firm and resolute policies to combat the scourge of cheating, many of which biomedical research agencies could usefully employ (see Box 5). The establishment of an international body for the investigation of research misconduct, to which national research bodies and journals would subscribe, would facilitate a cohesive and effective response to fraud. The creation of this coalition interested in research propriety would solve many of the problems editors and individuals encounter when confronted by possible fraud, namely the lack of an international autonomous body, and an agreed protocol of action. The embryonic stages
of this are already apparent within groups such as COPE and the World Association of Medical Editors. An agreed code of ethics should be formulated and adopted. This should clearly define and proscribe fabrication, falsification, duplication, and plagiarism as the most serious forms of academic misconduct. A schedule of penalties for each should be defined, from a public reprimand to a life ban on involvement in research. An independent Court of Arbitration could ensure due process was followed. The existence of a unified editorial community would significantly increase the effective penalty of "blacklisting"; authors or institutions found guilty of misconduct would be limited to publishing in journals that choose not to subscribe to the agreed ethical code. It would also help to prevent editorial disagreements on article retraction; only five of ten articles identified in March 2005 by the ORI as being definitely fraudulent had been retracted as of November 2005 [21]. Furthermore, the formulation of common policies for all journals concerning matters such as the management of suspected misconduct, and independent verification of statistical methods and data interpretation for clinical trials prior to submission, would serve to increase statistical and scientific integrity without increasing workload or costs for the editorial team. To enhance the detection of fraud, forensic methods should be made available to journals deemed to carry work of significant scientific or medical import. These methods would include both the software and the expertise to detect the hallmarks of image manipulation, such as an area of "blending" where two images have been apposed seamlessly to resemble a single image, or where bands have been removed from a gel blot. Now that electronic publishing is ubiquitous, plagiarism should become significantly easier to detect. Automated checks for verbatim passages, and more complicated search algorithms based on data, phrases, and images, could be reliably employed (a minimal sketch follows below). To foster an environment that dissuades researchers from misconduct, national research integrity bodies should be charged with ensuring appropriate research governance. In a fashion similar to random drug tests, they should have the power to audit laboratories, ensuring that adequate supervision is provided for junior researchers and that research notebooks are properly maintained. Empowering journals to demand raw data from submissions deemed suspect, and a policy of verification of raw data from papers selected at random, should aid both in detecting fraud and in acting as a deterrent.
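The automated verbatim-passage checks mentioned above can be surprisingly simple. Below is a minimal sketch using shared word 8-grams as the similarity signal; the two text fragments, the n-gram length, and the flagging threshold implied by such a tool are invented for illustration, and a real screening system would compare submissions against an indexed corpus rather than a single source.

```python
import re

def word_ngrams(text: str, n: int = 8) -> set:
    """Lower-cased word n-grams; punctuation is stripped so that trivial
    edits do not disguise verbatim copying."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(candidate: str, source: str, n: int = 8) -> float:
    """Fraction of the candidate's n-grams that reappear verbatim in source."""
    cand, src = word_ngrams(candidate, n), word_ngrams(source, n)
    return len(cand & src) / len(cand) if cand else 0.0

# Invented fragments: the submission reuses a 12-word run from the source.
original = ("large parts of the text have been used verbatim with little "
            "modification and with no acknowledgment of the original paper")
submission = ("indeed large parts of the text have been used verbatim with "
              "little modification by the second author")
print(f"verbatim 8-gram overlap: {verbatim_overlap(submission, original):.2f}")
```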
Finally, the issue of career progression, and the application of the “publish or perish” aphorism to surgical researchers, should be addressed. Given the limited time that surgical trainees can spend conducting research, compared with their science counterparts, the temptation to engage in duplicitous practice is significant. For those who do not view biomedical research as a calling, another metric to determine advancement may be appropriate.
23.8 Conclusion
Issues of research fraud are currently dealt with on an ad hoc, journal-specific level. As a scientific community, we must respond to the urgent need to protect the integrity of biomedical science. To combat fraud in research effectively, there needs to be a harmonized international strategy that combines and coordinates the resources of journals, funding bodies, and national scientific bodies.

Box 1. Fabrication
One of the most striking recent examples of research misconduct is the Sudbø case. In January 2006, Richard Horton, editor of The Lancet, received a disturbing communication from officials of The Radium Hospital, Norway, indicating that they had reason to believe that data published in The Lancet from their institution might be fraudulent. This set in train a series of events culminating in the retraction of 15 articles by Jon Sudbø, including the Lancet paper and papers published in The New England Journal of Medicine [22, 23], the rescinding of his doctorate by the University of Oslo, and the revocation of his license to practice medicine and dentistry. The Lancet paper, by Sudbø et al. from October 2005, purported to be a nested case–control study of 908 subjects showing that the use of non-steroidal anti-inflammatory drugs reduced the risk of oral cancer, based upon data from the Cohort of Norway database [23]. However, when the paper came to the attention of the director of epidemiology at the public health database, it emerged that the patient data were entirely fabricated. An independent Commission of Inquiry was established, and concluded that much of Sudbø's work was invalid because of manipulation and fabrication of data [11].
This case reveals many characteristics of proven research misconduct: a long history of deception and attempted deceptions, a single author with exclusive access to the raw data, a lack of supervision from coauthors, the unknowing involvement in fraud of senior researchers, whose names were employed to lend weight to the manuscript, and the failure of peer review to detect a meticulously designed fraud.
Box 2. Redundant Publications
Duplicate publication leads to the exaggeration of the importance of a study, and increases editorial and reviewing workload. Sometimes redundancy is easy to spot:

Osti E (2006) Cutaneous burns treated with hydrogel (Burnshield) and a semipermeable adhesive film. Arch Surg 141:39–42
Osti E, Osti F (2004) Treatment of cutaneous burns with Burnshield (hydrogel) and a semi-permeable adhesive film. Ann Burns Fire Disasters 7:137–141

While in other cases, attempts are made to mask it:

Kim JH, Lee SH, Cho SW et al (2004) The quantitative analysis of mitochondrial DNA copy number in premature ovarian failure patients using the real-time polymerase chain reaction. Korean J Obstet Gynaecol 47:16–24
Cha KY, Lee SH, Chung HM (2005) Quantification of mitochondrial DNA using real-time polymerase chain reaction in patients with premature ovarian failure. Fertil Steril 84:1712–8

And:

Routsi C, Giamarellos-Bourboulis EJ, Antonopoulou A et al (2005) Does soluble triggering receptor expressed on myeloid cells-1 play any role in the pathogenesis of septic shock? Clin Exp Immunol 142:62–67
Giamarellos-Bourboulis EJ, Zakynthinos S, Baziaka F et al (2006) Soluble triggering receptor expressed on myeloid cells 1 as an anti-inflammatory mediator in sepsis. Intensive Care Med 32:237–243
Box 3. Plagiarism
The ORI considers plagiarism "to include both the theft or misappropriation of intellectual property and the substantial unattributed textual copying of another's work". In The British Medical Journal (BMJ) in 2006, Chalmers describes uncovering that a paper published
in Acta Medica Iugoslavica in 1974 was an amalgam of the work of others contained in two papers from the Journal of Obstetrics and Gynaecology of the British Commonwealth, published in 1971 by Noble et al. and in 1973 by Pearson et al. [7]. As the BMJ reviewer Jim Neilson observed, "large parts of the text have been used verbatim, with little modification, and with no acknowledgment of the Pearson and Davies paper … The figures in the tables have been modified slightly from both original papers – so this is not only plagiarism, it is also scientific fraud." Farndon and Buchler, in the British Journal of Surgery in 1999, describe a serendipitous discovery of plagiarism: a manuscript was sent for review to a referee who noted his own text within the paper. The author of the offending manuscript claimed his "insufficiency of English" as an explanation [13]. These cases highlight the fact that the detection of plagiarism is frequently delayed and almost always fortuitous (Chalmers' discovery was made while preparing a manuscript published in 1992, 18 years after the deception).
Box 4. Detecting Fraudulent Data
Careful statistical analysis can reveal telling signs of data fabrication and manipulation. However, this requires the statistician to have access to the raw data, and an appreciation of the study design. In fabricating data, the researcher will frequently avoid values at the extremes of the data set, tending instead to include data which approximate the mean. While this may make detection more difficult, the human mind exhibits "digit preference," making it difficult to achieve appropriate variability in fabricated data [18]. Further sophisticated methods, including examining relationships between apparently independent pairs of variables, may demonstrate anomalies: a correlation matrix may show where relationships are too weak or too strong for genuine data [3, 12]. Finally, forensic examination of graphs supplied with the manuscript may be more useful in highlighting incongruities than mathematical analysis of the data [12].
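As a minimal illustration of the digit-preference idea in Box 4, the sketch below applies a chi-square test to the terminal digits of an invented series of measurements; genuine continuous measurements tend to have roughly uniform final digits, whereas hand-invented values cluster on round numbers. With only a dozen values the test is underpowered; in practice such screens are run on full raw datasets and treated as one signal among many.

```python
from collections import Counter
from scipy.stats import chisquare

def terminal_digit_test(values):
    """Chi-square test of terminal digits against a uniform distribution.
    A very small p-value suggests digit preference worth a closer look."""
    counts = Counter(int(str(v)[-1]) for v in values)
    observed = [counts.get(d, 0) for d in range(10)]
    return chisquare(observed)  # expected frequencies default to uniform

# Invented series whose terminal digits cluster on 0 and 5, a classic
# hallmark of values made up by hand rather than measured.
suspect = [120, 125, 130, 115, 140, 135, 120, 125, 110, 145, 130, 120]
print(terminal_digit_test(suspect))
```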
Box 5. The World Anti-Doping Agency
The International Association of Athletics Federations has been at the forefront of combating the use of performance-enhancing drugs in sport, through the establishment of, and cooperation with, various national and international agencies, including the World Anti-Doping Agency (WADA). This has resulted in some remarkable successes, and has restored the public's faith in athletics as an ideal. While acknowledging that cheats will always exist, the world of athletics is striving to minimize their influence. The first aim of the WADA was to harmonize anti-doping regulations. This led to an unprecedented international process of consultation to draft a consensus document: "On January 1, 2004, the Code came into force, and by the time of the Opening Ceremony of the Olympic Games in Athens, all Olympic international federations … and all 202 National Olympic Committees had accepted the code and undertaken to incorporate its provisions within their own rules." (Richard Pound, Chairman of WADA, 2007) There are obvious lessons for the scientific community in how athletics has dealt with the unpleasant issue of cheating, namely by harmonizing anticheating regulations through international consensus followed by local enforcement.
Acknowledgments
We thank Drs. A. Tarrant and V. Tarrant for their invaluable suggestions during the preparation of this chapter.
References
1. Al-Awqati Q (2007) Impact factors and prestige. Kidney Int 71:83–85
2. Anonymous (1969) Definition of "sole contribution". N Engl J Med 281:676–677
3. Bailey KR (1991) Detecting fabrication of data in a multicenter collaborative animal study. Control Clin Trials 12:741–752
4. Bhargava N, Qureshi J, Vakil N (2007) Funding source and conflict of interest disclosures by authors and editors in gastroenterology specialty journals. Am J Gastroenterol 102:1146–1150
5. Biagioli M (1998) The instability of authorship: credit and responsibility in contemporary biomedicine. FASEB J 12:3–16
6. Bollen J, Rodriguez M, Van de Sompel H (2006) Journal status. Scientometrics 69:669–687
7. Chalmers I (2006) Role of systematic reviews in detecting plagiarism: case of Asim Kurjak. BMJ 333:594–596
8. Claxton LD (2005) Scientific authorship. Part 1. A window into scientific fraud? Mutat Res 589:17–30
9. Cyranoski D (2006) Your cheatin' heart. Nat Med 12:490
10. DeAngelis CD (2006) The influence of money on medical science. JAMA 296:996–998
11. Ekbom A (2006) Summary of the commission of inquiry's report. Rikshospitalet–Radiumhospitalet Medical Center, Oslo
12. Evans S (2001) Statistical aspects of the detection of fraud. In: Lock S, Wells F, Farthing MJ (eds) Fraud and misconduct in biomedical research. BMJ Books, London
13. Farndon JR, Buchler M (1999) Two articles for comparison. Article A: APACHE II score in massive upper gastrointestinal haemorrhage from peptic ulcer: prognostic value and potential clinical applications. Article B: APACHE II score in massive upper gastrointestinal haemorrhage from peptic ulcer. Br J Surg 86:598–599
14. Farthing MJ (2001) Retractions in Gut 10 years after publication. Gut 48:285–286
15. Flanagin A, Carey LA, Fontanarosa PB et al (1998) Prevalence of articles with honorary authors and ghost authors in peer-reviewed medical journals. JAMA 280:222–224
16. Geggie D (2001) A survey of newly appointed consultants' attitudes towards research fraud. J Med Ethics 27:344–346
17. Hewitt R (1957) The physician-writer's book: tricks of the trade of medical writing. W.B. Saunders, Philadelphia
18. Marris E (2006) Should journals police scientific fraud? Nature 439:520–521
19. Rogers LF (1999) Duplicate publications: it's not so much the duplicity as it is the deceit. AJR Am J Roentgenol 172:1–2
20. Schein M, Paladugu R (2001) Redundant surgical publications: tip of the iceberg? Surgery 129:655–661
21. Sox HC, Rennie D (2006) Research misconduct, retraction, and cleansing the medical literature: lessons from the Poehlman case. Ann Intern Med 144:609–613
22. Sudbo J, Kildal W, Risberg B et al (2001) DNA content as a prognostic marker in patients with oral leukoplakia. N Engl J Med 344:1270–1278
23. Sudbo J, Lee JJ, Lippman SM et al (2005) Non-steroidal anti-inflammatory drugs and the risk of oral cancer: a nested case-control study. Lancet 366:1359–1366
24. Triggle CR, Triggle DJ (2007) What is the future of peer review? Why is there fraud in science? Is plagiarism out of control? Why do scientists do bad things? Is it all a case of: "all that is necessary for the triumph of evil is that good men do nothing"? Vasc Health Risk Manag 3(1):39–53
25. Wadman M (2005) One in three scientists confesses to having sinned. Nature 435:718–719
24 A Framework Is Required to Reduce Publication Bias – The Academic Surgeon's View
Ronnie Tung-Ping Poon and John Wong
Contents
24.1 Introduction
24.2 Evidence of Publication Bias
24.3 Sources of Publication Bias
24.4 Implications of Publication Bias
24.5 A Framework to Reduce Publication Bias and its Impact
24.5.1 Reducing Submission Bias of the Investigators
24.5.2 Reducing Reviewer and Editorial Selection Bias
24.5.3 Detecting Publication Bias in Metaanalysis
24.6 Conclusions
References
Abstract
Publication bias refers to the tendency of researchers, reviewers, and editors to submit or accept manuscripts for publication based on the direction or strength of the study findings, leading to the selective publication of studies with positive outcomes. The existence of publication bias in the surgical literature has been well-documented. Publication bias leads to misleading conclusions in metaanalyses of clinical trials, as negative studies are underrepresented. As a result, inappropriate investigations or treatments may be recommended to patients. To reduce publication bias in the surgical literature, a framework of measures directed at individual investigators, institutions, reviewers, and editors of journals is outlined in this chapter. Compulsory registration of all clinical trials in a public registry, and online open access journals for publication of all trials, irrespective of positive or negative results, are particularly important measures to reduce publication bias.
24.1 Introduction
R. Tung-Ping Poon, Department of Surgery, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China. e-mail: [email protected]
The past two decades have witnessed the growth of evidence-based medicine, with physicians at the forefront in testing various new medications through prospective clinical trials. Surgeons have been lagging behind in evidence-based medicine for a variety of reasons. This may be partly related to the surgical personality and partly related to the nature of surgical treatments, which are more difficult to evaluate in the setting of randomized controlled trials. Most of the surgical literature consists of case series or retrospective comparative studies, and the number of surgical randomized controlled trials is still small [32]. The relative paucity of randomized trials of surgical treatment has led to a debate about whether this reflects
inadequate scientific education of the surgical community or special problems in applying this methodology in this discipline [20, 22]. Some of the problems related to randomized trials of surgical treatments include variation in the surgical skills and expertise of participating surgeons, the learning curve needed for surgical procedures, possible bias in patient selection, and difficulty in blinding. The surgical literature has also been criticized for the poor design of the randomized trials published, with inadequate power or validity to reliably assess a new surgical treatment in many published trials. Another important factor that further undermines the quality of the evidence-based surgical literature is the existence of publication bias, which refers to the tendency on the part of researchers, reviewers, and editors to submit or accept manuscripts for publication based on the direction or strength of the study findings, leading to the selective publication of studies with positive outcomes [10]. This is a problem that exists in all fields of medicine, but given the already scarce randomized controlled trials on surgical treatments, it may have particularly important implications for decision-making in surgical treatment based on the literature. The need to improve the surgical literature in the direction of evidence-based medicine has been duly emphasized by several academic surgeons recently [20, 28, 40]. While there are many aspects of surgical trials that need to be improved, reduction in publication bias is of prime importance. Before measures to reduce publication bias can be proposed and effectively implemented, it is pertinent to understand the sources of publication bias and its implications.
24.2 Evidence of Publication Bias
The existence of publication bias in favor of studies with positive results has been well-documented in studies that followed up protocols approved by research ethics committees [13, 36], trials registered in data banks [33], or abstracts submitted to scientific meetings [4, 9]. A three-fold difference in publication of positive randomized trials compared with negative randomized trials has been reported in one study [36]. Studies with positive results are also more likely to lead to a greater number of publications and to be published in journals with high impact factor [13]. Furthermore, positive studies are likely to be published
more quickly than negative studies [25, 36]. In one cohort of studies reported to an institutional ethics committee, not only were negative studies less likely to be published, but negative studies that saw the light of publication did so only after considerable delay compared with positive studies (median: 8 vs. 4 years) [36]. Interestingly, it has also been shown that positive studies are more likely to be published in English journals compared with negative studies, and hence have a higher chance of being included in reviews or metaanalyses based on the English literature only [16]. The tendency toward publication bias appears to be greater with studies of small sample size and observational studies than with randomized trials [13], which may have major implications for a surgical literature composed predominantly of observational studies or randomized trials with small sample sizes. The presence of publication bias in the surgical literature has been evaluated in a cross-sectional study of papers presented at the Canadian Association of Pediatric Surgeons and the American Pediatric Surgery Association [43]. The study found that the presence of statistically significant results was the only factor associated with successful publication on multivariate analysis (odds ratio, 3.3). In addition to bias in publication of trials with significant outcomes, one cohort study using protocols and published results of randomized trials approved by ethics committees in Denmark suggested that 62% of trials had at least one primary outcome that was changed, introduced, or omitted [7]. The majority of trials had unreported outcomes, which would have been difficult to identify without access to the trial protocols. Worst of all, a questionnaire survey of the investigators of the trials revealed that 86% of survey responders denied the existence of unreported outcomes despite clear evidence to the contrary. This suggests that contacting investigators for unreported outcomes or unpublished studies to compensate for publication bias may not be reliable.
24.3 Sources of Publication Bias
Publication bias may arise at any stage within the research and publication process. Investigators or authors can be the first source of publication bias, by not writing up or not submitting studies with negative results (submission bias). In one survey of authors of
published trials who had also participated in unpublished trials, the most common reason given by the authors for not publishing completed randomized trials was negative results [11]. While this bias may sometimes reflect a subjective judgment by the investigators that negative trials are not worth publishing, more likely it reflects the perception by authors that negative studies are less likely to be accepted by journals for publication, or less likely to be published in journals of high impact factor, and hence not worth the effort of writing or submission. This lack of enthusiasm to submit negative studies with less chance of publication is rooted in the way research is organized and rewarded in academic institutions, where the number and impact factors of publications are the standard criteria for promotion. Under such a system, the wish bias of academics tends to be strong, and the motivation to have a positive study published quickly in high impact journals overrides their moral obligation to publish a negative study. This probably also explains the selective underreporting of negative outcomes in published articles on clinical trials, as negative outcomes are perceived to lower the chance of acceptance of the manuscripts for publication [7]. Another factor intrinsic to the surgical personality is the craving for excellent surgical outcomes, which may lead to selective publication of studies with favorable surgical outcomes. Unlike drug trials, which test the treatment efficacy of drugs rather than the clinicians, surgical trials often evaluate outcomes of a certain procedure performed by the surgeons. Perioperative morbidity and mortality are the ultimate measures of the quality of surgeons' surgical skills and perioperative management. Hence, surgeons are less likely to report studies with poor surgical outcomes that they perceive as a reflection of their own weakness. In fact, a recent review on the outcome of surgery for unruptured intracranial aneurysms showed that studies with an excellent surgical outcome were more likely to be published than those with an average outcome [42]. Review of retrospective studies of unruptured aneurysms suggested that neurosurgeons had limited motivation to publish their results unless their combined mortality and morbidity was less than 10%. Finally, the funding source of a clinical trial can also influence the publication of the trial. Although a research sponsor has no direct jurisdiction over whether to publish a negative trial, the interest of the research sponsor may indirectly influence the enthusiasm of the investigators to publish negative trial findings.
Reviewers of papers submitted to the journals are the second possible source of bias. The study by Mahoney [26] is particularly illustrative of selection bias in the peer-review process. In that study, referees for a journal were randomly assigned to receive manuscripts with identical "Introduction" and "Methods" sections but varied "Results" or "Discussion" sections, and they were asked to rate the manuscripts on five items: relevance, methods, data presentation, scientific contribution, and publication merit. Although studies with positive and negative results had identical "Methods" sections, referees rated the manuscripts with negative results lower in the quality of methods. Manuscripts with negative results were also scored lower in data presentation, scientific contribution, and publication merit. The editor is supposed to be the final gatekeeper on acceptance of papers with the least bias, yet there is also a tendency of editorial bias against negative studies [10, 33]. Because of the limited space available in journals, editors are likely to publish studies with favorable outcomes. There are no clear data on the relative contribution of investigators' submission bias and editorial selection bias to the overall publication bias observed in the medical literature. One study based on the manuscripts submitted to the Journal of the American Medical Association showed an odds ratio of 1.30 for publishing studies with positive results compared with studies with negative results, but the editorial selection bias appeared to be small compared with the submission bias demonstrated for researchers [30]. Another element of editorial selection bias is that editors of journals with high impact factor may have a tendency to select positive studies for publication, since they are more likely to be cited and hence contribute positively to the future impact factor of the journal. As a result, negative studies tend to be published in journals with a lower impact factor and cited less frequently. This is demonstrated by the publications resulting from the Helsinki Heart Study, a trial of a cholesterol-lowering agent that was originally planned as a study with primary and secondary prevention arms [27]. The results of the primary prevention arm were interpreted as positive and were published in the New England Journal of Medicine in 1987 [17]. The results of the secondary prevention arm were unfavorable. Although completed at the same time, they were not published until 1993, in Annals of Medicine, a journal with a much lower impact factor [18]. In the 3 years following publication
of each, the former paper received 450 citations compared with 17 for the latter paper [14]. In another study of publication bias in research projects approved by a Research Ethics Committee, positive studies were published in journals with a mean impact factor of 1.6, as compared to 0.9 for studies with negative results [13]. The extent of editorial selection bias in surgical journals remains unclear, but it is likely to be similar to, or even worse than, that in internal medicine journals.
24.4 Implications of Publication Bias
The result of publication bias is that negative trials and series with poor outcomes are underrepresented in the surgical literature. This may result in inflation of apparent treatment effects when the published literature is subjected to traditional systematic review or metaanalysis [19]. Traditional systematic review is a systematic summary of the literature without further statistical analysis of the data, whereas in a metaanalysis, results of independent studies are pooled and analyzed statistically. The results of metaanalysis are often perceived as reliable and convincing, and are likely to have an impact on the clinical practice of readers. However, in one study of 48 metaanalyses, about half had some indication of publication bias, and most metaanalyses do not consider the effect of publication bias on the results [37]. In addition to the selective publication of entire studies, outcome reporting bias may further increase the prevalence of spurious results and overestimate the efficacy of interventions. The worst scenario for patients, surgeons, and policy-makers is that ineffective or even harmful surgical treatments are promoted. It is also possible that expensive new interventions which are thought to be better than existing procedures are not truly superior. It has been estimated that selective underreporting of research is probably more widespread and more likely to adversely affect patient management than deliberate publication of falsified data [6]. The misleading conclusions or recommendations that result from publication bias in metaanalyses or systematic reviews have important implications for patient care. In an early study of publication bias in cancer therapy based on a National Cancer Institute registry of clinical trials, a review of published trials indicated a better outcome for certain regimens than did a review of all data from published trials plus the data from trials that were
not published [33]. For example, published trials showed a statistically significant 16% survival benefit of one regimen over another for treatment of ovarian cancer, but the benefit dropped to 5% when all registered studies were included. Another study compared the results of unpublished and published trials on new therapies and found that only 14% of completed unpublished trials favored the new therapy compared to 55% of published trials [11]. The presence of publication bias in metaanalysis of surgical studies has also been well-documented [3, 41]. Many surgical studies involve evaluation of a new diagnostic or surgical procedure compared with the standard procedure. Publication bias is likely in the initial publications of a novel diagnostic or surgical procedure, when the investigators wish to demonstrate superiority of the novel procedure over an established procedure. In general, initial evaluation of a newly introduced procedure tends to yield excellent results; subsequently, wider application of the procedure and moderation of the initial enthusiasm tend to lead to more variable and less impressive results. This has been illustrated by a study which showed that the accuracy of endoscopic ultrasound in staging rectal cancer has declined over time, with the lowest rate reported in the more recent literature [21]. This inflated estimate of the capability of endoscopic ultrasound may lead to unrealistic expectations among clinicians and patients of this technology, and may affect the evaluation of its cost-effectiveness. Publication bias not only leads to inappropriate investigations or treatments given to patients, but it also leads to inappropriate information being given to patients. The operative mortality rates quoted to patients in informed consent prior to surgery are often based on outcomes reported in the surgical literature, which, however, may not represent the true risk of the procedure because of publication bias. Understating the risk of a surgical procedure can alter surgical decision-making for surgeons as well as patients. One recent study evaluated the operative mortality after pancreatic resection in the literature as compared with the actual mortality rate based on a nationwide inpatient sample in the United States [38]. The study showed that the actual mortality rate based on the nationwide inpatient sample was 2.4-fold higher than the literature rate (7.6 vs. 3.2%). All literature-based series were published from academic medical centers, whereas 26.3% of pancreaticoduodenectomies in the nationwide inpatient sample were performed at nonacademic medical centers, which had a mortality rate
[Fig. 24.1 A framework to reduce publication bias. Design of study – Investigator: literature research, proper research methodology and sample size; Institution: approval of all clinical studies by the institutional review board, education of investigators on research ethics, clinical trial registration. Study – Investigator: adherence to the ethics of conduct and publication of research; Institution: monitoring of the publication status of all approved studies. Submission of study; Peer and editorial review – Reviewer and editor: structured peer review process, editorial policy of publication standards regardless of the direction of results, open access journals; Publication.]
of 11.4%. The actual mortality rate at academic medical centers in the nationwide inpatient sample (6.4%) was lower than that at nonacademic centers, but still higher than the literature-based rate. This study clearly demonstrated bias in the surgical risk provided to patients in informed consent as a result of publication bias.
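One widely used screen for the bias described in this section is worth sketching here: because selective publication removes small negative studies, a funnel plot of effect size against precision becomes asymmetric, and Egger's regression test quantifies that asymmetry. The ten trials below are invented for illustration; this is a sketch of the test's logic under those assumed data, not a substitute for a proper metaanalysis package.

```python
import numpy as np
from scipy import stats

def egger_test(effects, std_errors):
    """Egger's regression test: regress the standardized effect (effect/SE)
    on precision (1/SE). An intercept far from zero indicates funnel-plot
    asymmetry, consistent with small-study or publication bias."""
    effects, std_errors = np.asarray(effects), np.asarray(std_errors)
    fit = stats.linregress(1.0 / std_errors, effects / std_errors)
    return fit.intercept, fit.intercept_stderr

# Invented log odds ratios from ten trials: the small trials (large SE)
# all report large benefits, the pattern selective publication produces.
log_or = [-0.90, -0.80, -0.70, -0.75, -0.40, -0.35, -0.30, -0.32, -0.28, -0.30]
se     = [0.50, 0.45, 0.40, 0.42, 0.15, 0.12, 0.10, 0.11, 0.09, 0.10]
intercept, stderr = egger_test(log_or, se)
print(f"Egger intercept: {intercept:.2f} (SE {stderr:.2f})")
```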
24.5 A Framework to Reduce Publication Bias and its Impact
With the increasing emphasis on evidence-based medicine, there is more reliance on systematic examination of the surgical literature to resolve clinical questions in surgery. This is reflected by the increasing number of review articles and metaanalyses in major surgical journals in recent years. Metaanalysis or systematic review has become the main source of literature that surgeons turn to in guiding their clinical practice. Hence, the issue of publication bias has assumed greater importance. Based on the possible sources of publication bias, a framework (Fig. 24.1) is proposed to reduce publication bias and its adverse impact on systematic review or metaanalysis in the surgical literature.
24.5.1 Reducing Submission Bias of the Investigators
The main determinant of whether a clinical study will be published or not seems to be the investigator, for it is he who decides to become or not become an author.
In one study that followed up clinical trial protocols submitted to institutional review boards, only 6 of 124 unpublished studies were reported to have been rejected for publication, suggesting that publication bias originated primarily with investigators [12]. Other studies have also observed that most unpublished research has never been submitted to a journal for consideration of publication [13, 39]. Most of the time, investigators decide to undertake a certain clinical study with an intent to publish. However, there are several factors that subsequently lead to failure of the investigators to write up the study as a research article, or to submit a written article to a journal for consideration of publication. Studies surveying the reasons for not submitting a study for publication found several common reasons, including termination of studies prior to completion because of problems with patient recruitment, poor methodology, negative results, results considered not important enough, data controlled by the sponsor, presence of other papers with similar findings, lack of time, and loss of interest [13, 39]. There is evidence that the tendency toward publication bias is less in well-designed studies with large sample sizes, and in randomized clinical trials compared with observational studies [13]. The prevention of underreporting of negative studies by investigators requires a comprehensive approach consisting of measures directed at the individual investigator and the institution.
24.5.1.1 Education of Individual Investigator
Prevention of publication bias should start with education and mentoring of new researchers and continuing education of established investigators in the ethical conduct of research and publication of results. It is pertinent to educate investigators that negative results are of equal importance to positive results, provided the studies are properly conducted. Many investigators who opt not to publish a negative study may feel that the contribution of a negative study to the literature is much less than that of a positive study. They do not understand the adverse effect of underreporting of negative studies on the overall evidence for a particular treatment based on the literature. It is equally important to educate investigators that deliberate withholding of negative results of clinical trials from publication, either as a result of pressure from sponsors or of their own intention, is considered an ethical misconduct in
science [5]. Publication of research results, irrespective of whether they are positive or negative, should be a moral obligation of the researchers. Furthermore, the ethical requirement of scientific research also demands that the investigators publish all outcomes of a study rather than selectively reporting positive outcomes. Education of researchers on proper research methodology is important to enhance the quality of studies and reduce the chance of studies with negative results not being submitted because of poor methodology or problems in interpretation of the results. Proper research methodology should include a thorough literature search for similar studies before embarking on the study, proper design of the study, especially in sample size estimation, and appropriate statistical methods for analysis of the data. In the academic world, the incentives to publish are generally great, and any reasonably good work, negative or positive, can be published somewhere. In many cases of failed publication of negative studies, it is likely that the investigators do not submit their work for publication because they have become aware of serious limitations in the methodology after negative results are obtained. Negative studies present special difficulties in analysis and interpretation, as they may merely reflect the lack of sufficient power in the study to observe a positive effect because the sample size is inadequate. In fact, one study has shown that unpublished trials were much smaller in sample size compared to those published [17]. By properly designing a study with adequate sample size based on a reasonable hypothesis, the uncertainty of whether negative results are due to inadequate statistical power or a true lack of difference between comparison arms can be avoided [34]. This will encourage investigators to submit a study for publication even if the results are negative. Investigators should also be encouraged to make better use of confidence intervals rather than relying on significance testing alone in interpreting the data [35].
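To illustrate the kind of up-front sample-size estimate advocated here, the sketch below uses the standard normal-approximation formula for comparing two independent proportions; the complication rates, alpha, and power are hypothetical numbers chosen for illustration.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm needed to detect p1 vs. p2 with a two-sided test,
    using the normal approximation for two independent proportions."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting a fall in complication rate from 10% to 5% at 80% power needs
# roughly 435 patients per arm; a "negative" trial of 60 patients per arm
# says almost nothing about efficacy.
print(n_per_arm(0.10, 0.05))
```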
24.5.1.2 Institutional Monitoring
The education of new investigators on the ethics and methods of clinical research can be done through mentoring by experienced investigators, but a more effective way is through an education program provided by the institution. In many academic institutions, like the authors' institution, a clinical trial center and an institutional review board are in place to help conduct clinical
studies and protect the safety of patients participating in clinical trials. The institution should provide assistance in medical statistics so that clinical studies can be designed properly to provide adequate statistical power. This will help avoid negative results arising from inadequate sample size. In studies of surgical treatments, sample size tends to be underestimated because patient recruitment is usually slow compared to a study of drug therapy for medical conditions, and the investigators are not willing to spend a long period on the study. Furthermore, results of surgical treatment may change with time, and it may not be appropriate to conduct a study with patient recruitment over several years in a single center. In such a situation, the investigators should be encouraged to consider a multicenter study. One additional advantage of a large multicenter study is that it would be less susceptible to publication bias, because the greater investment in resources and the need for collaboration would make it more likely to be written up and submitted for publication irrespective of outcome. Many institutions are now providing investigators with training on Good Clinical Practice (GCP), a set of guidelines on the ethical conduct of clinical trials that is now obligatory for academic research in many institutions [23]. However, most institutions are not providing training for investigators on ethics for reporting and publication of clinical trials. A number of guidelines and resources are available that could be used as a framework for education of clinical investigators [1, 2]. The institution can also play a critical role in the ethical conduct and publication of clinical trials by setting up regulations and guidelines for investigators. Currently, most institutional review boards do have guidelines for the ethical running of clinical trials, but no guidelines are provided regarding publication of the trials that have been reviewed by the institutional review boards. Guidelines on publication should include the following. First, sponsors should be prohibited from making decisions about publication of results or from revising manuscripts. This will prevent sponsors from influencing investigators not to submit negative studies, and deny sponsors the opportunity to selectively omit certain negative outcomes in a trial. In fact, a recent survey of 107 institutions showed that 85% of the institutions prohibited the sponsors from participation in the process of publication of industry-sponsored trials [29]. Second, all investigators should be asked to declare any conflict of interest related to
the study. Avoidance of conflict of interest of investigators in industry-supported trials is important. An investigator with a financial interest in a certain new therapy or surgical technology is more likely to withhold negative results from publication. Third, it should be mandatory for investigators to submit all completed studies, irrespective of positive or negative results, for publication. Many investigators consider that they have the sole right to decide whether or not to publish a study that they have performed. In this respect, it is important for the investigators to understand that patients who participate in a clinical study have a right to expect that data derived from their participation will be published and disseminated to the medical community. Furthermore, the agency that provides the grant or the institution that provides the resources for a trial also has a right to expect that the data from a study that it supports are published, so that the research money is not wasted. In fact, many grants come with a requirement that the study should be published in a journal. The institutional review board can take an even more aggressive approach by monitoring the publication status at the completion of the study. Currently, most institutional review boards require investigators to submit a completion report, but are generally satisfied if the study has been completed without unexpected major harm to patients. Nothing is usually asked about the publication status of the study. As argued above, not submitting a completed study for publication should be considered an ethical misconduct in scientific research. When investigators undertake research involving humans, they take on a public trust that is violated when the results are not disseminated through publication. Hence, institutional review boards should have a role in making sure that the investigators have made an effort to submit the data for publication, and evidence of submission for publication should be provided by the investigators to the institutional review board as part of the final completion report. Deliberate withholding of a negative study from publication should be considered an ethical misconduct as serious as falsification of data, and should be subject to institutional reprimand and sanction. Of course, whether the study will be published or not eventually depends on whether the quality of the study is acceptable to medical journals for publication. However, through this mechanism, there will be at least an equal attempt to submit positive and negative studies, and submission bias by investigators could be minimized. Furthermore, by keeping a record
of the publication status of all studies reviewed by the institutional review board, it would be much easier to retrieve data regarding any studies that are not published, and such data are more reliable than the questionnaire surveys of investigators that are commonly conducted to retrieve data on unpublished studies. The above institutional measures depend on registration of the clinical studies with the institutional review boards. In most institutions, prospective clinical trials have to be approved by the institutional review board, but retrospective studies are often undertaken without approval. This may be a particularly important issue with surgical studies, most of which are retrospective series or case-control studies. The decision to publish a retrospective study is likely to be more influenced by the results than the decision to publish a prospective study. Surgeons who look at the operative outcome of a procedure are likely to decide not to submit the study for publication when the operative mortality and morbidity results are not favorable. The only way to solve this issue is to require approval of all clinical studies using human data, irrespective of the prospective or retrospective nature of the study. By monitoring the publication status of both prospective and retrospective studies, it should be possible to lessen publication bias due to selective submission of studies.
24.5.2 Reducing Reviewer and Editorial Selection Bias
24.5.2.1 Standards of Publication
One of the most important reasons for investigators to selectively submit studies with positive outcomes is their belief that reviewers and editors of journals are more eager to embrace optimistic results and studies that show a new treatment to be better than conventional treatments. The optimistic desire to come up with a treatment that is new and better is intrinsic to all investigators who are looking for medical "advances", but it is important for the reviewers and editors, as gatekeepers of journals, to resist the influence of such optimistic desire in the selection of studies for publication. Unfortunately, there is evidence that such desire prevails among reviewers and editors of high impact journals. In a simple survey of articles that reported controlled comparisons of clinical therapies and
were published in four general medical journals with high impact factor (the British Medical Journal, JAMA, The Lancet, and the New England Journal of Medicine), 70% of the studies reported were positive [31]. Journal editors have admitted a potential bias against negative studies in manuscript evaluation [33]. The consequences of such editorial bias, as perceived by investigators, are not only avoidance of submission of negative studies, but also a tendency of investigators to dredge through their data until they get significant results and then alter their hypotheses to fit these results [31]. To solve this issue, it is important for journals to formalize an editorial policy clearly stating that the standard of publication will be based on clinical importance, quality of research methodology, and logical reasoning in interpreting the results, rather than the direction and strength of the study results. This policy should be stated clearly in the journal's instructions to authors. It is also important for the reviewers to understand that they are invited to detect bias in the design of the research and the preparation of a paper, not to add bias in recommending studies for publication. A clear guideline for reviewers on avoiding publication bias should be provided. In particular, the reviewers should be able to differentiate studies that have truly negative results from those with negative results due to inadequate statistical power. Rejecting the former will lead to publication bias in favor of positive results, whereas acceptance of the latter may lead to a misleading conclusion of lack of efficacy for an otherwise effective treatment. It may not be practical to demand that all manuscripts be reviewed by a medical statistician, but particular attention should be paid to the statistics in evaluating studies with negative outcomes. A fairer and more formal peer-review process, with structured review of originality, clinical importance, research methodology, and data analysis, rather than subjective judgment of the reviewers based on the results of the study, may also help to reduce publication bias. To draw the attention of readers to the importance of negative but properly conducted studies, journal editors may consider a special section to publish well-conducted negative studies. In fact, The Journal of the American Medical Association once had a section entitled "Negative Results" that published short articles on negative studies in the 1960s, though the section was discontinued for unclear reasons [4]. In recent years, several surgical journals such as Annals of Surgery and the British Journal of Surgery have special
sections on prospective randomized trials that have helped to promote the importance of such trials. By the same token, it is possible to promote the publication of negative studies in a special section of journals. Finally, editors should conduct regular audits for any publication bias in manuscripts accepted for publication, so that any bias can be detected and corrected.
24.5.2.2 Clinical Trial Registry

Registration of all clinical trials prior to the start of the study is an important measure that can reduce publication bias, as it will allow accessibility of all studies with positive or negative results. Furthermore, by registration of the original study protocol and all subsequent amendments in a registry accessible to journal editors and the public, investigators are prevented from changing the hypothesis or design of the study to fit the results. Clinical trial registration will also reduce selective reporting of outcomes, since all the primary and secondary outcomes will be registered in the protocol. Although registration of clinical trials was first proposed in the 1970s [33], only in recent years has there been a growing demand for mandatory registration of clinical trials as a journal policy for publication, in response to the problems of publication bias and underreporting of trial results [24, 35]. The implementation of web-based clinical trial registration in registries accessible to the public has been made possible by the rapid development of the internet in recent years. The demand for clinical trial registration initially grew from concerns about negative data being concealed in industry-sponsored trials [35]. The lack of public information about the existence of trials allows unfavorable results to be hidden, even if the data show that a marketed medication, device or other intervention is useless or harmful. A clinical trial registry also serves to protect the interests of subjects participating in clinical trials, as it helps ensure dissemination of negative or even adverse results of a certain therapy to other investigators conducting a similar trial. In 2000, a web-based clinical trial registry became operational in the United States (http://www.ClinicalTrials.gov), and beginning in 2005, the International Committee of Medical Journal Editors required registration of trials in this registry or another qualified public registry for any study to be published in their journals [8]. Although the number of clinical trials being registered has
increased substantially since the implementation of this requirement by journals, there are still many gaps in the reporting [44]. Clinical trial registration will likely increase transparency in conducting and reporting clinical trials. However, its effectiveness depends on comprehensive registration of all clinical trials. Furthermore, current clinical trial registries only provide for registration of the protocols of clinical trials. To ensure that all positive and negative results of trials are accessible to the public and available for metaanalysis, there is a need for the development of a database of trial results for all registered clinical trials. To implement this, an accepted format or process for providing summaries of trial results to the public has to be formulated, and preferably such data should undergo independent scientific review before being entered into such a database.
24.5.2.3 Open Access Online Journals

Theoretically, publication of all clinical trials, irrespective of whether the study supports or rejects a treatment, would eliminate any publication bias. However, this is impossible with conventional journals because of limited paper space. The editors have to select a small proportion of a large number of submitted papers for publication, a process in which editorial selection bias is likely to creep in subtly. Recently, the Internet has not only provided the opportunity for the development of international clinical trial registries, but it has also led to the development of a new mode of publication of scientific research, i.e., open access electronic journals. With current information technology, the space available in electronic journals is almost unlimited, allowing publication of more articles irrespective of the direction of results. Furthermore, online journals can also significantly shorten the time from article submission to publication. Open access journals also allow more comprehensive publication of the full set of data from each clinical trial. Hence, open access publication is an attractive approach to overcoming publication bias by improving access to the growing research output. There are examples of online publishers attempting to increase access to research results, such as BioMed Central, which has an open access surgical journal called BMC Surgery (http://www.biomedcentral.com/bmcsurg).
Recently, the Journal of Negative Results in Biomedicine (http://www.jnrbm.com) was launched to "receive papers on all aspects of unexpected, controversial, provocative and/or negative results to provide scientists and physicians with responsible and balanced information to support informed experimental and clinical decisions." Open access publishing holds out the promise of broadening and speeding the dissemination of scientific data from clinical studies. However, it is important that articles published in open access journals should undergo the same degree of peer review as those in conventional journals. Without some sort of quality control, flawed studies that are allowed to be published may be as distorting as any publication bias against negative results. This is one major advantage of open access journals for clinical studies over simple deposition of clinical trial data in an open access database. Currently, the cost of publication charged by some open access journals may restrict the use of this new medium for publication. However, it is likely that future improvements in the cost-effectiveness of software for electronic journals will make publication in open access journals easily affordable or even free to investigators.
24.5.3 Detecting Publication Bias in Metaanalysis

Given that publication bias is common, it is important that such bias is detected in systematic reviews or metaanalyses. To reduce the adverse effect of publication bias, properly conducted metaanalyses that entail pooling and statistical analysis of publications from the literature are preferable to traditional narrative reviews of single topics. In any metaanalysis, an attempt should be made to locate any unpublished studies, for example, through searching abstracts presented at medical conferences. However, retrieval of unpublished studies or unpublished data from trials requires considerable effort, and the process can itself be biased, as it depends on the goodwill of investigators to cooperate with the data collection. Furthermore, including data from studies that have not been independently reviewed by peers also carries the risk that the data may not have been derived from a properly designed and conducted trial.
Since it is impossible to collect data from all clinical studies on a certain intervention, statisticians have developed statistical methods to detect and correct publication bias. One commonly used statistical method in metaanalyses is the funnel plot [45], which plots the sample size against the point estimate of treatment effectiveness generated in individual studies. Details of the application of this test are available in the paper that originally described the method [15]. The principle of the test is quite simple: it is based on the assumption that larger studies are more likely to be published, whereas smaller trials may get published only if they report a significant result. Any difference between the large and small studies suggests publication bias, in that some small studies have not been published owing to their nonsignificance. The idea of adjusting the results of metaanalyses for publication bias and imputing "fictional" studies into a metaanalysis to fill the gap of missing studies is controversial [37]. The statistical methods are by nature indirect and exploratory, and are based on certain assumptions about the missing studies or data which can be difficult to verify in reality. Hence, this process of adjustment for publication bias is itself subject to potential bias. Metaanalysis allows surgeons to obtain evidence-based summaries more quickly than from primary studies, but the presence of publication bias and other serious problems leads to potential weakness in the evidence provided by such metaanalyses. One particular problem with surgical metaanalyses compared with metaanalyses of drug treatments originates from the fact that surgical outcomes vary substantially depending on the experience of the surgeons. Publication bias is particularly likely in surgical studies, since poor surgical outcomes are less likely to be reported. While there are statistical methods to detect and correct publication bias, it is impossible to fully compensate for publication bias in metaanalysis. Hence, primary prevention of publication bias by the aforementioned proactive measures is much more important to ensure unbiased evidence being provided in metaanalysis to guide clinical practice.
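To illustrate the principle of the funnel plot described above, the following is a minimal sketch in Python. The data, the pooled estimate and the use of matplotlib are entirely illustrative assumptions and do not come from this chapter; the point is only the shape of the plot: effect estimates should scatter symmetrically and narrow towards the pooled effect as sample size grows, and an asymmetric gap among small studies suggests missing (unpublished) negative trials.

```python
# A minimal, hypothetical funnel plot: one point per study, with the
# treatment effect estimate plotted against the study's sample size.
# All numbers below are invented for illustration.
import matplotlib.pyplot as plt

sample_sizes = [40, 55, 70, 90, 150, 300, 600, 1200]                  # hypothetical
effect_estimates = [0.90, 0.75, 0.80, 0.60, 0.45, 0.38, 0.32, 0.30]   # hypothetical log odds ratios

plt.scatter(effect_estimates, sample_sizes)
plt.axvline(0.30, linestyle="--", color="grey")  # hypothetical pooled estimate
plt.xlabel("Treatment effect (log odds ratio)")
plt.ylabel("Sample size")
plt.title("Funnel plot: small studies scatter widely around the pooled effect")
plt.show()
```

In this invented dataset the small studies all lie to one side of the pooled estimate, the asymmetry that the funnel plot method interprets as possible publication bias.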
24.6 Conclusions

Studies in the surgical literature have indicated significant publication bias, in that clinical studies with positive or favorable results are more likely to be published than
those with nonsignificant or unfavorable results. Such publication bias appears to originate mainly from the submission bias of investigators, but is further reinforced by the selection bias of reviewers and editors. This may lead to misleading conclusions in metaanalyses of surgical treatments and wrong recommendations of treatments for patients. A framework is required to reduce publication bias, which should include education of investigators in the ethics of conducting and publishing clinical studies, monitoring of the publication status of all clinical studies approved by institutional review boards, editorial policies of journals to accept manuscripts based on methodology rather than the direction of results, registration of all clinical studies in a clinical trial registry accessible to the public, and open access publication of all clinical studies irrespective of positive or negative results. However, it is unlikely that we can completely eliminate publication bias, because in the pursuit of clinical research, researchers are motivated by the excitement of finding a new therapy that surpasses the existing therapies. Hence, there is always an element of optimism bias in researchers, as there is in reviewers, editors and readers. However, it is imperative for academic surgeons to minimize publication bias as much as possible, so that such bias will not significantly mislead the results of the metaanalyses on which surgeons nowadays rely more and more for evidence-based surgical practice.
References

1. Altman DG, Schulz KF, Moher D et al (2001) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663–694
2. Benos DJ, Fabres J, Farmer J, Gutierrez JP et al (2005) Ethics and scientific publication. Adv Physiol Educ 29:59–74
3. Bown MJ, Sutton AJ, Bell PR et al (2002) A meta-analysis of 50 years of ruptured abdominal aortic aneurysm repair. Br J Surg 89:714–730
4. Callaham ML, Wears RL, Weber EJ et al (1998) Positive-outcome bias and other limitations in the outcome of research abstracts submitted to a scientific meeting. JAMA 280:254–257
5. Chalmers I (1990) Underreporting research is scientific misconduct. JAMA 263:1405–1408
6. Chalmers I (1993) Publication bias. Lancet 342:1116
7. Chan AW, Hrobjartsson A, Haahr MT et al (2004) Empirical evidence for selective reporting of outcomes in randomized trials. Comparison of protocols to published articles. JAMA 291:2457–2465
8. De Angelis C, Drazen JM, Frizelle FA et al (2004) Clinical trial registration: a statement from the International Committee of Medical Journal Editors. Lancet 364:911–912
9. De Bellefeuille C, Morrison C, Tannock I (1992) The fate of abstracts submitted to a cancer meeting: factors which influence presentation and subsequent publication. Ann Oncol 3:187–191
10. Dickersin K (1990) The existence of publication bias and risk factors for its occurrence. JAMA 263:1385–1389
11. Dickersin K, Chan S, Chalmers TC et al (1987) Publication bias and clinical trials. Control Clin Trials 8:343–353
12. Dickersin K, Min Y, Meinert CL (1992) Factors influencing publication of research results. Follow-up of applications submitted to two institutional review boards. JAMA 267:374–378
13. Easterbrook PJ, Berlin JA, Gopalan R et al (1991) Publication bias in clinical research. Lancet 337:867–872
14. Egger M, Davey Smith G et al (1998) Bias in location and selection of studies. BMJ 316:61–66
15. Egger M, Davey Smith G et al (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315:629–634
16. Egger M, Zellweger-Zahner T, Schneider M et al (1997) Language bias in randomised controlled trials published in English and German. Lancet 350:326–329
17. Frick MH, Elo O, Haapa K, Heinonen OP et al (1987) Helsinki Heart Study: primary prevention trial with gemfibrozil in middle-aged men with dyslipidemia. N Engl J Med 317:1237–1245
18. Frick MH, Heinonen OP, Huttunen JK et al (1993) Efficacy of gemfibrozil in dyslipidemic subjects with suspected heart disease. An ancillary study in the Helsinki Heart Study frame population. Ann Med 25:41–45
19. Gardner MJ, Altman DG (1986) Confidence intervals rather than p values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed) 292:746–750
20. Hall JC, Hall JL (2002) Randomisation in surgical trials. Surgery 132:513–518
21. Harewood GC (2005) Assessment of publication bias in the reporting of EUS performance in staging rectal cancer. Am J Gastroenterol 100:808–816
22. Horton R (1996) Surgical research or comic opera: questions, but few answers. Lancet 347:984–985
23. Jorgensen A, Bach KF, Friis K (2004) Good clinical practice is now obligatory in academic clinical drug research in the European Union. Basic Clin Pharmacol Toxicol 94:57–58
24. Krleza-Jeric K, Chan A, Dickersin K et al (2005) Principles for international registration of protocol information and results from human trials of health related interventions: Ottawa statement (part 1). BMJ 330:956–958
25. Krzyzanowska MK, Pintilie M, Tannock IF (2003) Factors associated with failure to publish large randomized trials presented at an oncology meeting. JAMA 290:495–501
26. Mahoney MJ (1977) Publication prejudices: an experimental study of confirmatory bias in the peer review system. Cognit Ther Res 1:161–175
27. Manninen V (1983) Clinical results with gemfibrozil and background to the Helsinki Heart Study. Am J Cardiol 5:35–38
28. Meakins JL (2006) Evidence-based surgery. Surg Clin N Am 86:1–16
29. Mello MM, Clarridge BR, Studdert DM (2005) Academic medical centers' standards for clinical-trial agreements with industry. N Engl J Med 352:2202–2210
30. Olson CM, Rennie D, Cook D et al (2002) Publication bias in editorial decision making. JAMA 287:2825–2828
31. Rennie D, Flanagin A (1992) Publication bias. The triumph of hope over experience. JAMA 267:411–412
32. Sauerland S, Seiler CM (2005) Role of systematic reviews and meta-analysis in evidence-based medicine. World J Surg 29:582–587
33. Sharp DW (1990) What can and should be done to reduce publication bias? The perspective of an editor. JAMA 263:1390–1391
34. Simes RJ (1986) The case for an international registry of clinical trials. J Clin Oncol 4:1529–1541
35. Steinbrook R (2004) Public registration of clinical trials. N Engl J Med 351:315–317
36. Stern JM, Simes RJ (1997) Publication bias: evidence of delayed publication in a cohort of clinical research projects. BMJ 315:640–645
37. Sutton AJ, Duval SJ, Tweedie RL et al (2000) Empirical assessment of effect of publication bias on meta-analyses. BMJ 320:1574–1577
38. Syin D, Woreta T, Chang DC et al (2007) Publication bias in surgery: implications for informed consent. J Surg Res 143:88–93
39. Weber EJ, Callaham ML, Wears RL et al (1998) Unpublished research from a medical specialty meeting. Why investigators fail to publish. JAMA 280:257–259
40. Wells SA Jr (2001) Surgeons and surgical trials – why we must assume a leadership role. Surgery 132:519–520
41. Wente MN, Shrikhande SV, Müller MW et al (2007) Pancreaticojejunostomy versus pancreaticogastrostomy: systematic review and meta-analysis. Am J Surg 193:171–183
42. Yoshimoto Y (2003) Publication bias in neurosurgery: lessons from series of unruptured aneurysms. Acta Neurochir (Wien) 145:45–48
43. Zamakhshary M, Abuznadah W, Zacny J et al (2006) Research publication in pediatric surgery: a cross-sectional study of papers presented at the Canadian Association of Pediatric Surgeons and the American Pediatric Surgery Association. J Pediatr Surg 41:1298–1301
44. Zarin DA, Tse T, Ide NC (2005) Trial registration at ClinicalTrials.gov between May and October 2005. N Engl J Med 353:2779–2787
Data Collection, Database Development and Quality Control: Guidance for Clinical Research Studies
25
Daniel R. Leff, Richard E. Lovegrove, Lord Ara Darzi, and Thanos Athanasiou
Contents

25.1 Introduction 305
25.2 Data Collection 306
25.2.1 Documentation 306
25.2.2 Training and Certification 308
25.2.3 Procedures 309
25.3 Clinical Research Audit 310
25.4 Database systems 310
25.4.1 Database Software 310
25.4.2 Database Design 311
25.4.3 Database Models 311
25.4.4 Good Database Design 314
25.4.5 Graphical User Interface (GUI) 317
25.4.6 Exporting Data for Analysis 318
25.4.7 Quality Control and Data Integrity 318
25.4.8 Data Security 319
25.5 Conclusions 319
References 319
Further Reading 320
Abstract Data collection is fundamental to any clinical surgical research study. Without collecting reliable and accurate data, it is impossible to draw meaningful conclusions regarding the research question of interest. This chapter aims to provide an overview of how to plan generic data collection and data storage, together with the procedures, protocols, training and analysis that promote accurate and reliable data collection. Additionally, the fundamentals of database models and storage are covered to aid the novice researcher. A wealth of more detailed information on database design and administration is available elsewhere for those who wish to pursue more complex designs. Throughout this chapter, we illustrate the principles explained using clinically relevant examples.
D. R. Leff, Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary's Hospital Campus, Praed Street, London, W2 1NY, UK. e-mail: [email protected]

25.1 Introduction
Data collection is fundamental to any clinical research study. Without collecting reliable and accurate data, it is impossible to draw meaningful conclusions regarding the research question of interest. This chapter aims to provide an overview of how to plan generic data collection and data storage, applicable to the majority of clinical research studies. Specifically, the chapter focuses on planning and research design, data collection procedures and protocols, and the training and analysis that promote accurate and reliable data collection. Additionally, the fundamentals of database models and storage are covered to aid the novice researcher. A wealth of more detailed information on database design and administration is available elsewhere for those who wish to pursue more complex designs. Throughout this chapter, we illustrate the principles explained using clinically relevant examples.
It is hoped that through adherence to the principles outlined in this chapter, errors in clinical data collection and storage may be minimised.
25.2 Data Collection

25.2.1 Documentation

25.2.1.1 Protocol Design

External funding bodies such as the Medical Research Council insist upon a formal study protocol. For many investigator-led projects, formal protocols may not be mandatory but are recommended to ensure the collection of reliable and relevant data. The benefits of a clear, consistent study protocol are summarised in Table 25.1.

Table 25.1 The fundamental benefits of constructing a study protocol
• Guidance for the conduct of the trial; summarises the study procedures
• Aids procurement of external funding for the trial
• Assists local review board assessment of the ethical implications of the trial
• Historical document or record of the trial

Where present, good protocol design will assist the lead investigators in the design, conduct and monitoring of research studies. Good protocols detail the plan for conducting the trial, as well as the purpose and function of the study and how it will be carried out. Specific elements that should be included are the underlying research question, the anticipated number of participants, eligibility and exclusion criteria, the details of planned interventions as well as outcome measures and study endpoints. The Consolidated Standards of Reporting Trials (CONSORT) group developed a range of initiatives to alleviate problems arising from inadequate reporting of randomised controlled trials (RCTs) [1]. The recommendations and guidelines developed by the CONSORT group are especially useful for researchers when planning clinical studies and help ensure reliable and accurate data collection and reporting [1]. Prior to drafting a full protocol, we recommend starting with a 500-word summary that clearly defines the
research hypothesis or objectives, the study population, design and outcome parameters of interest. The introductory paragraph of the full protocol should describe the relevant theoretical background to the study. It helps to consider what is already known on this topic and why this particular study is now required. For example, a trial might be necessary to (1) test a new treatment or regimen, (2) test an established treatment for a new indication, (3) determine the best strategy among a range of treatments or (4) provide new information on the safety or efficacy of existing treatments. Once the researchers are content with the research objectives, it is vital to establish the eligibility criteria for the study population [2]. This refers to the clinical and demographic characteristics that define those who are eligible to be enrolled. The criteria for entry may relate to age, sex, clinical diagnosis and co-morbidity. Exclusion criteria are characteristics that prevent entry into a trial even though all inclusion criteria are met. Exclusion criteria should also be explicitly defined in the protocol and often relate to patient safety (e.g. the exclusion of all pregnant women). The protocol should also include a statement about how participants are recruited (e.g. self-referral, advertisement, etc.) and the setting in which the study will take place (e.g. London Teaching Hospital, District General Hospital, etc.), as this will affect wider generalisations regarding the results of the study. Attention to sample size is extremely important. Ideally, a study should be large enough to have a high probability (power) of detecting a clinically important (and statistically significant) difference of a given size, if such a difference exists [3, 4]. Obviously, larger sample sizes are required to detect smaller differences between study groups. It is preferable, where possible, to employ a formal power calculation to determine the sample size prior to enrolment. Built into the protocol should be criteria and/or procedures to follow if subjects wish to terminate involvement in the study prematurely. Regarding studies that involve subjects undergoing a specific intervention (study group) while other subjects do not undergo the intervention (control group), the protocol should clearly state the method used for unbiased assignment of the intervention (e.g. randomisation). Random allocation is important to minimise bias in group assignment. The protocol should also include arrangements for allocation concealment, a critical process that prevents foreknowledge of treatment assignment and thus shields those
who enrol participants from being influenced by this knowledge. Where appropriate, the protocol should detail the methods used to ensure blinding of patients and health care providers. It should be obvious from the protocol which researchers, if any, are party to knowledge regarding treatment or group assignment. Additionally, the protocol should include a statement concerning how patients who are allocated to one group but cross over to the other group (e.g. a patient allocated to undergo a laparoscopic procedure is converted to open upon discovery of dense adhesions early in the procedure) will be analysed. Ideally, analysis should be on an intention-to-treat basis, but under some circumstances it may be deemed appropriate to analyse based on the final allocation [5].
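As a concrete illustration of the formal power calculation recommended above, the following is a minimal sketch for a two-arm parallel trial comparing means. The use of the statsmodels library and all of the numbers are illustrative assumptions, not recommendations from this chapter; the effect size is the clinically important difference divided by the expected standard deviation (Cohen's d).

```python
# A minimal, hypothetical power calculation for a two-group comparison
# of means, using statsmodels. All parameter values are assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed standardised difference
                                   alpha=0.05,       # two-sided significance level
                                   power=0.80,       # probability of detecting the difference
                                   ratio=1.0)        # equal allocation to both arms
print(f"Approximately {n_per_group:.0f} participants per group are required.")
```

Note how rapidly the required sample size grows as the assumed effect size shrinks, which is why the clinically important difference should be fixed in the protocol before enrolment begins.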
The nature of the experimental design should also be stated in the protocol. For example, most clinical studies involve two-group parallel designs (where one group receives treatment and the other does not), but other designs include cross-over trials and factorial designs. The regularity of follow-up and details of the data to be collected at each follow-up should be explicitly defined within the protocol. Outcome measures should have been shown to be valid and reliable prior to study commencement. Additionally, it is helpful to include the proposed method for statistically analysing the data. All of this information is captured in a standard flow diagram, as illustrated in Fig. 25.1.

Fig. 25.1 The CONSORT guideline recommended clinical study flow diagram, tracking participants through enrollment (assessment for eligibility and exclusions), randomisation, allocation to intervention, follow-up and analysis, with reasons and numbers recorded at each stage [6]

When finalising the study protocol, it is important to maintain the highest standards of research ethics. For
this reason, it is recommended that the protocol should include arrangements for reporting adverse events, details of any remunerations or compensations for involvement in the study and plans for informed consent for participation (e.g. who is able to take consent?). In particular, the protocol should specify the safeguards in place to ensure that subject participation is voluntary and confidential. Procedures to monitor patient safety should be evident from the protocol and should include the type of adverse events to be monitored and named individuals to contact in the case of serious adverse events. Finally, plans for termination of the study, final data collection and communication to participants and the public through research publications and presentations should also be incorporated into the protocol. In the United Kingdom (UK), the National Research Ethics Service works with colleagues to maintain a UK-wide system of ethical review that protects patients while facilitating and promoting ethical research [7].

25.2.1.2 Manual of Operations and Procedures

The procedural implementation of a study often relies upon a Manual of Operations and Procedures (MOP) [2]. This is especially true for longitudinal studies conducted over months or several years and studies involving multiple individuals or rotating personnel. The MOP serves as the co-ordinators' guide to the protocol, a historical record and a training tool. The MOP is inherently easier to change than the protocol. Naturally, details contained within the MOP will vary depending upon the nature of the study. However, minimum requirements would include a summary of the study design, data collection schedules, follow-up timing, procedures and forms to be completed at each visit, instructions for carrying out specific interventions including interruption and stopping criteria, instructions for completing a procedure (e.g. 20 s of rest, then 20 s of the task) and instructions relating to study procedures (e.g. consent and randomisation). The MOP should be concise, and summary tables and/or diagrams can be particularly useful. A table of contents and index may aid navigation for researchers using the MOP [2].

25.2.1.3 Document Revisions

Inevitably, changes to the protocol will occur following commencement of the study. Often practical constraints
are encountered that make changes mandatory. In the absence of an accurate and reliable data storage and management system, new files may be lost or accidentally merged with older versions. Careful attention should be paid to document naming, storage and back-up schemes to prevent document catastrophes. One named researcher should take the responsibility of co-ordinating and circulating significant document changes and for maintaining electronic backups. We recommend maintaining a minimum of three electronic versions of all relevant documents. For example, for our studies we maintain and back up one hard drive version (laptop based), one mobile external hard drive version and a further version stored on a remote back-up server (e.g. terabyte server). Electronic documents should all contain a specific version number at the top of the page, and this should be updated when significant changes are made. In this respect, different versions of the protocol and MOP act as a description of the evolution of the trial. Paper archives should also be updated when significant changes to the protocol are made.
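As one possible way to automate the three-copy back-up scheme described above, the following is a minimal sketch. The file name and back-up locations are purely illustrative assumptions; any equivalent scripted or scheduled copy would serve the same purpose.

```python
# A minimal, hypothetical back-up step for study documents: copy the
# working file to date-stamped copies in two further locations. The
# paths and file name below are illustrative assumptions only.
import shutil
from datetime import date
from pathlib import Path

SOURCE = Path("protocol_v3.docx")                        # hypothetical working copy
BACKUP_DIRS = [Path("E:/study_backups"),                 # assumed external hard drive
               Path("//terabyte-server/study_backups")]  # assumed remote back-up server

for target_dir in BACKUP_DIRS:
    target = target_dir / f"{SOURCE.stem}_{date.today():%Y%m%d}{SOURCE.suffix}"
    shutil.copy2(SOURCE, target)  # copy2 also preserves file timestamps
    print(f"Backed up {SOURCE} to {target}")
```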
25.2.2 Training and Certification

Ensuring that the research staff charged with the responsibility of collecting clinical data are appropriately trained is fundamental to data quality control and assurance. This is especially true for complex procedures involving the collection of important outcome data. In this regard, the standard operating procedures and/or MOP described above are likely to be extremely beneficial to researchers who are new to a particular field of research. Theoretically, by describing how a particular task is to be performed each time, standardisation across multiple measurements and different researchers is assured. Utilising standard equipment can also further help reduce inter-subject variability during data acquisition procedures. The nature and intensity of training will depend upon the procedure involved (e.g. measuring capillary blood glucose versus operating a brain imaging device). Moreover, training should be targeted to the individual's role in data acquisition. For example, a study site co-ordinator will have very different training and certification requirements from a research assistant or research technician. Notwithstanding the above, training for the majority of procedures is likely to involve both cognitive (video of the procedure and observation
of important steps) and psychomotor aspects (practice), and should be supervised by experts in the field. Personnel should be required to be competent in these procedures following a standardised period of training. Following successful completion of training and demonstration of competency, researchers should be issued with a certificate of completion of training. Certification serves to document training and track data collectors. Personnel who no longer work on a project should be formally decertified, such that they are no longer able to collect study data. Periodic retraining should also be offered where appropriate. In cases where official bodies are not overseeing training, the principal investigator should retain the responsibility for tracking the certification status of researchers affiliated with the research project. Beyond the requirements to train staff conducting specific procedures, certain universities offer formalised courses such as certificates in advanced training in clinical research methodology [8].
25.2.3 Procedures

25.2.3.1 Preparation for Data Acquisition

The success of any clinical study relies on a high level of organisation. This is especially true for research projects involving serial/longitudinal data acquisition. Organisation starts with planning the amount of time required to collect data for each subject/participant. In our investigations we have learnt to overestimate the amount of time required for each subject. This may result in fewer subjects at each visit and therefore more visits, but it prevents erroneous data collection and avoids disappointing participants as a result of overbooking. A checklist of all the equipment required is mandatory when planning visits. Specialist equipment integral to data collection should be tested the day before data acquisition to provide the researchers with a level of confidence that collection will run smoothly. This also enables repair work to be conducted should this be deemed necessary. A schedule or crib-sheet for the planned data collection can also be of help. If significant blocks of time are devoted entirely to data collection, a clear schedule may assist both the researcher and participants. A participant information sheet regarding the approximate time required for each stage of data collection and data entry helps to avoid disappointment
and minimise DNA (did not attend) rates. Such an information sheet is a mandatory requirement for any study requiring local research ethics committee approval.
25.2.3.2 Randomisation

The process of assigning study participants to treatment groups is known as randomisation [9]. Randomisation gives a given study participant a known and usually equal chance of being assigned to any of the groups. In non-randomised studies, there is a possibility that a significant difference between treatment groups is due to systematic differences (i.e. bias) between the groups arising from factors other than the intervention itself. Randomisation aims to eliminate this type of bias by ensuring that the groups are as alike each other as possible. There are a number of methods of random sequence generation, summarised as follows [10]:

• Simple randomisation – the simplest form of randomisation, which can be thought of as tossing a coin, group A allocation being "heads" and group B allocation being "tails".
• Permuted block randomisation – or blocking, as it is known, may be required to ensure balance between groups in small studies. Blocks have equal numbers of A's (group A) and B's (group B), and a random sequence generator is used to determine the particular block.
• Stratified allocation – enables further restriction to prevent imbalances between groups and enables the introduction of group constraints depending upon participant characteristics.
• Adaptive random allocation – allocates patients to treatment groups based on the characteristics of patients already randomised. This is a sophisticated method requiring computational programming.

Whichever method is used for randomisation, it is important that the clinical researcher running the trial is unaware of the group to which the participant will be assigned. This is known as Allocation Concealment [10, 11]. It is usually possible to identify researchers or staff not intimately involved in data collection who can keep the randomisation list or envelopes. If envelopes are used, they should be opaque and well-sealed [2]. It is important to adhere to appropriate randomisation principles when planning the study since, at dissemination, scientific audiences will want to know who generated the
random sequence, which method was used for randomisation, an estimate of randomisation success and how concealment was maintained and monitored [1, 12].
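To make permuted block randomisation, the second method listed above, concrete, here is a minimal sketch in Python. The block size of 4 is an illustrative assumption; in practice the block size may itself be varied, and the resulting list would be generated and held by someone not involved in recruitment so as to preserve allocation concealment.

```python
# A minimal sketch of permuted block randomisation for two groups.
# Block size 4 gives two A's and two B's per block, in random order.
import random

def permuted_block_sequence(n_participants, block_size=4):
    """Return an allocation sequence of 'A'/'B' built from balanced blocks."""
    sequence = []
    while len(sequence) < n_participants:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        random.shuffle(block)          # random order within each balanced block
        sequence.extend(block)
    return sequence[:n_participants]

print(permuted_block_sequence(12))     # e.g. ['B', 'A', 'A', 'B', 'A', 'B', ...]
```

Because every block is balanced, the two groups can never differ in size by more than half a block at any point during recruitment.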
25.2.3.3 Blinding

It is important, where possible, to maintain blinding or masking of the investigators and study participants after random allocation [11]. For example, imagine a randomised clinical study testing a method to reduce the pain induced by injection of local anaesthesia. Knowledge of group allocation may unfairly bias the subject to rank pain lower than that actually experienced, and may also unfairly bias the investigator to manipulate the delivery of local anaesthesia to reduce pain (e.g. infiltrating the anaesthetic more slowly than usual). In certain circumstances, it may not be possible or feasible to mask research participants or investigators. Some treatments are impossible to mask owing to adverse effects. For example, in the case of cooling the skin prior to delivering local anaesthetics, it is difficult to mask the surgeon to the effects, owing to reactive skin erythema as well as the palpably lower temperature of the skin. The success of blinding should be evaluated and commented on at study termination and subsequent dissemination.
25.3 Clinical Research Audit

The Department of Health has issued a guideline for the regulation of research called the Research Governance Framework for Health and Social Care [13]. Integral to these regulations are research audits. As a result, internal and external audit is becoming an increasingly common feature of clinical research studies. For externally sponsored projects, audit is a mandatory requirement. Investigators should anticipate undergoing internal (host-institution) audit at timely intervals throughout a non-externally funded research project. Research audit refers to a systematic and independent examination of trial-related activities and documents to determine whether the evaluated trial-related activities were conducted, and the data recorded, analysed and accurately reported, according to the protocol, sponsors' requirements (if applicable), standard operating procedures, Good Clinical Practice (GCP) and the applicable regulatory requirements [14]. Regarding clinical research projects involving UK
hospital Trust patients, the Research Governance Framework requires systematic and spot audits. Audit provides an opportunity to review the process of eligibility for inclusion in the trial. Participant consent forms and procedures are routinely evaluated, along with any changes or significant additions to the study protocol. In addition, audits enable assessment of pharmaceutical dispensing protocols, specimen storage procedures and laboratory processing arrangements. Therefore, it is important to maintain hard copies of relevant documents for the audit trail. The written evaluation from the auditor, often referred to as an audit report, contains the results and recommendations of the study audit. The report provides an opportunity for remedial action in cases of significant errors and/or inconsistencies.
25.4 Database systems

25.4.1 Database Software

A number of different companies produce software for writing and managing databases. For the research student, the software made available at the place of study will largely determine the software used. The majority of higher education establishments use the Microsoft Office suite of products, which includes Excel and Access. Both of these can be used for database development. Microsoft Excel is useful for simple data collection where the number of recorded outcomes and unique events are both small. Microsoft Access and its rival product, FileMaker Pro, both allow for the development of relational databases. For more complex solutions, particularly where data may be collected by several people working in collaboration, a network-based solution may be beneficial; SQL Server software is the best solution under these circumstances. Table 25.2 summarises some of the software available for database development, although this list is not exhaustive.

Table 25.2 Software solutions for database development
Simple: Microsoft Excel
Basic relational: Microsoft Access, FileMaker Pro, Alpha Five, Paradox, Lotus Approach
Network relational: Microsoft SQL Server 2008, MySQL Server, Oracle

The advantage of using the basic relational database software shown in Table 25.2 is that it allows for the design of a graphical interface, known as forms. This allows the database designer to present the fields of a given table in a fashion that can facilitate data entry, rather than having data entered directly into tables. Forms can be linked together using subforms, whereby one form is embedded in another, or using buttons which open the form. Buttons have control elements behind them that
can automatically filter the data being shown in the form that is being opened. For example, we may have a form on which demographic information about a patient is entered, and on this form there is a button that opens a form on which details of their operation will be recorded. On clicking this button, we would not want to have another patient's operation shown. A filter is therefore applied so that only the current patient's operations are shown. Utilising commands within forms like this gives the database designer much more control over data entry. The network database solutions offer the greatest flexibility. They can typically store much more data than simple or basic relational databases. For example, Microsoft Access is limited to a maximum database size of 2 GB or 32,768 database objects. SQL databases are only limited by the amount of disk space available (the free Microsoft SQL Server 2008 Express is limited to 4 GB). While it is unlikely that many research projects will require this much data storage, SQL databases offer other advantages. By having the data storage on a networked drive, it is possible for many users to access the data simultaneously. This may be necessary in collaborations across different university campuses. Commands can be embedded within the database itself and "called" upon, thereby minimising network traffic and improving database performance. Other database software, such as Microsoft Access or FileMaker, can be used to develop a graphical interface (i.e. forms) that is linked to the networked database. Alternatively, a web-based interface could be developed for data entry.
25.4.2 Database Design

A database is a structured collection of records or data. The aim of any database is to be able to organise, store and retrieve information as rapidly and as efficiently as
possible [15]. Computational databases rely on software to organise the storage of data. This software is known as a database management system (DBMS). DBMSs are classified by the type of database algorithm or model used to house the data. The best databases contain minimal redundant information and ensure that neither security nor quality of data is compromised. There is a vast array of textbooks, academic papers and seminars available on database design, development and implementation. The aim of this chapter is to familiarise the novice researcher with the fundamental principles of database design and management and to provide illustrated examples of clinical databases to equip clinicians when developing their own databases for clinical research.
25.4.3 Database Models

Database models determine the query languages that are available to access the database. The advantages and disadvantages of each model, along with the researchers' own objectives, will help to determine which format will best meet requirements.

25.4.3.1 Flat File

The classic example of a flat file database is a name-address book list or research participant list in a spreadsheet. In flat files, data are arranged in a series of columns and rows organised in tabular format. An example of a flat file database is provided in Table 25.3.

Table 25.3 An example of a flat file database
Id  Name          Blood pressure  Cholesterol
1   Paul Press    114/90          7.8
2   Stacey White  120/70          7.4
3   Peter Sharp   160/110         4.6
4   Stacey White  110/70          –
5   Paul Press    –               8.0

In a flat file database, columns are generally restricted to one specific data type (e.g. blood pressure). In the example above, columns are separated by whitespace characters, although other delimiters (e.g. comma-separated values and tab-separated values) can be used. Flat files do not need extra preparation or complex computer
software packages. Searching, sorting and calculating features are all available, but may start to become arduous as the size of the database expands. Redundant data can easily develop in a flat file database. In the example provided in Table 25.3, one patient has been entered twice, as she has been followed up on two occasions. Additionally, another patient attended clinic but did not have a blood pressure recording. This leads to wasted space, as we are still recording the data, and may become problematic with many thousands of patients in a study. Furthermore, the ID number for one subject has been incorrectly assigned. In the case of very large flat files, it may be cumbersome to perform data queries or calculations, and redundant data entry may result in redundant information. Flat files work well for small datasets, where special reporting is not required and redundancy is less of a problem.
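As a simple illustration of a flat file using one of the delimiters mentioned above, the following sketch writes a comma-separated values (CSV) file with Python's built-in csv module. The rows mirror the first two records of Table 25.3 and, like the file name, are purely illustrative.

```python
# A minimal sketch of a flat file stored as CSV: one header row, then
# one row per record, all in a single table. Data mirror Table 25.3.
import csv

rows = [
    {"Id": 1, "Name": "Paul Press",   "BloodPressure": "114/90", "Cholesterol": 7.8},
    {"Id": 2, "Name": "Stacey White", "BloodPressure": "120/70", "Cholesterol": 7.4},
]
with open("flatfile.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Id", "Name", "BloodPressure", "Cholesterol"])
    writer.writeheader()   # column names as the first line of the file
    writer.writerows(rows)
```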
25.4.3.2 Hierarchical

In a Hierarchical database model, data are organised into an inverted tree-like structure, with the root being the single table from which other tables branch, as illustrated in Fig. 25.2. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record, which is also known as a node. Each record on one level may be related to multiple records on the next level. A record with subsidiary records is known as the parent, and the subsidiary records are known as children. Parents may have many children, but each child may have only one parent. The data entities are related to each other by 1:N mapping, also known as a one-to-many relationship. Hierarchical databases work well for data that require one-to-many relationships that are inherently hierarchical, but not for many-to-many relationships (such as in a clinical study with multiple patients attending on multiple occasions). However, this model has largely been superseded by the relational database model.

Fig. 25.2 An example of a Hierarchical database model, with a hospital as the root branching to patients, each patient's admissions, and each admission's surgical procedure, consultant, outcome and imaging

25.4.3.3 Network

The Network model was invented by the prominent computer scientist Bachman and was formalised and published by the Conference on Data Systems Languages [16]. The Network model was proposed as a solution to the Hierarchical model, to enable more natural modelling of many-to-many relationships between entities. Unlike the Hierarchical model structure of an inverted tree with a single parent record having many children, the Network model enables multiple parent and child records, forming a lattice-type structure as illustrated in Fig. 25.3. Unfortunately, this model proved too arduous for end users and was ultimately displaced by the relational model, which affords a higher-level, more declarative interface and greater flexibility.

Fig. 25.3 An example of a Network database model, with patients linked in a lattice to laparoscopic, laparoscopically assisted and open procedures (cholecystectomy, left and right hemicolectomy)

25.4.3.4 Relational

The term relational database was originally coined by Codd, working at the IBM Almaden Research Centre in 1970 [17]. A relational database is a collection of relations or tables. Relational databases utilise a number of mathematical terms which are equivalent to Structured Query Language (SQL – pronounced "sequel"). Tables of data consist of tuples (rows) and attributes (columns). As illustrated in Fig. 25.4, relations are defined as a set of tuples that have the same attributes. Key values enable tables to be related to one another, hence the term relational database. This enables a one-to-many relationship, as highlighted in Fig. 25.5.
Despite tables containing completely different data, they can be related through key values. Relational databases provide a useful means of linking several hundred tables, and relational database management systems (RDBMS) are used to store the information that supports the world economy, financial information and manufacturing data, as well as personal records. Oracle, IBM and Microsoft are leading vendors of RDBMS.
Fig. 25.4 Fundamental mathematical terms underlying relational databases
Fig. 25.5 An example of a one-to-many relationship created using Microsoft Access
Table 25.4 Flat file spreadsheet for blood pressure data collection
Unit no.  Name            DOB       Clinic    Blood pressure  Pulse  Weight  Height
1234      Linda Peterson  12/5/54   03/03/08  114/90          62     62      164
5678      Peter Thompson  07/03/65  03/03/08  120/70          78     79      182
9012      George Smith    06/01/60  23/03/08  160/110         72     86      180
3456      Sylvia Gilbert  04/04/30  23/03/08  110/70          80     80      170
1234      Linda Peterson  12/5/54   23/03/08  122/85          68     61      164
5678      Peter Thompson  07/03/65  04/04/08  130/80          82     80      182
7890      Elsie Roberts   05/09/23  04/04/08  180/110         98     45      158
25.4.4 Good Database Design

The type of database used to collect data can have wide-ranging implications on the ease with which future analysis of data can be achieved. Poor database design can result in a host of problems for the clinical researcher, including unreliable or inaccurate data, performance degradation, poor flexibility, data redundancy and inefficient calculations [15]. Good database design, on the other hand, ensures stability and reliability of the data. One of the most important steps in designing a good database is conceptualising a logical model. This undoubtedly involves conducting a paper design first [15]. This will save time, provide a blueprint for discussion and limit subsequent problems with data modification. Alternatively, database design solutions such as MySQL Workbench can be used to design the database and create a template script for its creation.
25.4.4.1 Data Considerations

It is important, early in the design process, to create a list of all the data fields you require. At this stage you should also determine the format of the data (e.g. text, date, numeric, etc.), as this can have implications on any calculations that may need to be made at a later stage. If you have already decided to utilise a flat database, e.g. an Excel spreadsheet, then no further work is required. The next step is to organise the data into logical groups and start thinking about the quantity of data being collected in each of the fields. Let us assume we have a simple database in which we collect the following details: hospital number, name, date of birth, date of clinic attendance, blood pressure, pulse, weight and
height. The database can be created as a flat spreadsheet as shown in Table 25.4. However, this would require all of the fields to have data entered into them every time the same patient attends clinic, as illustrated (e.g. for patients Linda Peterson and Peter Thompson). This gives rise to considerable data redundancy. It is better to consider the data as items that are unlikely to change at each clinic visit (hospital number, name, date of birth and height) and those that will (date of clinic attendance, blood pressure, pulse and weight). We have split the list of fields into two groups, and these can be considered as separate tables in our database, as illustrated in Fig. 25.6. Note that the hospital number is recorded in each of the tables, since this allows us to link the clinic measurements back to the individual patient.

Fig. 25.6 Splitting the list of fields into tables of data that are unlikely to change (left table) and those that are likely to change (right table)

In the example given above, the recording of items such as blood pressure in a database can give rise to additional considerations. When blood pressure is recorded it is written as systolic/diastolic (mmHg). If we were to record this in a single field, it would have to be of text format and we would not be able to conduct any calculations on the data (e.g. mean systolic and diastolic readings over consecutive clinic attendances). One way around this is to split the data into systolic and diastolic fields, which can then contain numeric data and allow us to conduct calculations if required. Figure 25.7 illustrates how our table would now look.

Fig. 25.7 Recording blood pressure as separate systolic and diastolic components

25.4.4.2 Primary Keys

Every table in a database should have a single field that uniquely identifies the data stored in each row. This is known as a primary key. These need to be assigned during the database design process. The simplest primary keys are numerical (usually long integer) and have an auto-increment statement attached to them, for example, ClinicID in TblClinics in Fig. 25.7 above. These automatically increase in value every time a new row of data is added and therefore remain unique within the table. Sometimes, you may wish to use a different primary key. In most healthcare settings, the hospital number or NHS number can be used as a substitute (see Fig. 25.7 above). It is worth noting, however, that different hospitals utilise different systems for their hospital numbers, and these may be alphanumeric or numeric. Furthermore, there may be more than one hospital using the same system. These points are of particular importance when conducting multicentre studies, and it is preferable to use an auto-increment key in these settings and to store the hospital number in a separate field.

25.4.4.3 Indices

Indices are other fields within database tables that improve the speed of database operations. All primary keys are automatically indexed. Additionally, other fields may be manually assigned to an index if these are likely to be used frequently when searching for data. For example, the HospNo field in TblClinics in Fig. 25.7 above will be used to rapidly identify a given patient's clinic data. Indices can be assigned as allowing or disallowing duplicate values. In the examples shown here, HospNo in TblClinics would be indexed to allow duplicates, since the same patient may attend clinic on multiple occasions. In a different example, we may be collecting outcome data after a single operation, in which case the HospNo field in the operations table would be assigned as unique, assuming that a single patient can only have a certain operation once. For example, imagine collecting outcome data on a series of patients undergoing subtotal colectomy.
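To make the table split, primary keys and indices above concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used purely for illustration (the chapter's own examples use Microsoft Access); the table and field names follow Figs. 25.6 and 25.7, and the exact column types are assumptions.

```python
# A minimal, illustrative sqlite3 version of the two-table design above.
import sqlite3

con = sqlite3.connect("study.db")
cur = con.cursor()

# Demographic data, entered once per patient; HospNo is the primary key.
cur.execute("""CREATE TABLE TblDemographics (
                   HospNo INTEGER PRIMARY KEY,
                   Name   TEXT,
                   DOB    TEXT,
                   Height INTEGER)""")

# Clinic data, entered at every visit; ClinicID auto-increments, and blood
# pressure is stored as separate numeric systolic and diastolic fields.
cur.execute("""CREATE TABLE TblClinics (
                   ClinicID   INTEGER PRIMARY KEY AUTOINCREMENT,
                   HospNo     INTEGER,
                   ClinicDate TEXT,
                   Systolic   INTEGER,
                   Diastolic  INTEGER,
                   Pulse      INTEGER,
                   Weight     INTEGER)""")

# Index HospNo in TblClinics, allowing duplicates: one patient, many visits.
cur.execute("CREATE INDEX idx_clinics_hospno ON TblClinics (HospNo)")
con.commit()
```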
25.4.4.4 Foreign Keys

A foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The referenced columns (e.g. in TblDemographics) should form the primary key of the referenced table. In our example, the foreign key in TblClinics refers to the primary key of TblDemographics (i.e. HospNo).
25.4.4.5 Relationships

In order to create a relationship between two tables in a database, each of the tables must have a field that contains the same data (e.g. HospNo). It is common practice to use the primary key from one of the tables and the foreign key from the other when defining the relationship; in our example above, this is the hospital number. Relationships should be defined between each of the tables in the database. The type of relationship is determined by the frequency with which the data are collected. Relationships between database tables have practical significance when exporting clinical data for analysis (see Section 25.4.6).
Fig. 25.8 An example of a one-to-one relationship
The common types of relationships are as follows:
• One-to-one – In the example of a series of patients undergoing subtotal colectomy (Fig. 25.8), each patient can only undergo this procedure once (a sketch of this case follows the list). The HospNo field in TblOperations has been assigned as the primary key and is, therefore, unique, as is the HospNo field in TblDemographics.
• One-to-many – This type of relationship is illustrated in our example of longitudinal recordings of blood pressure. Patient demographic data are collected once, whereas the blood pressure data are recorded at multiple clinic attendances.
• Many-to-many – Many-to-many relationships occur when neither field in the referencing tables contains unique data. In our experience, there are few circumstances where this would occur in clinical practice. Databases involving many-to-many relationships are highly complex and, for the clinical researcher, are best avoided. Wherever possible, it is advisable to utilise one-to-one or one-to-many relationships.
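As a hedged sketch of the one-to-one case referred to above (the table and column names follow this chapter's examples and are not prescriptive), declaring HospNo as the primary key of the operations table enforces at most one operation row per patient:

    CREATE TABLE TblOperations (
        HospNo        VARCHAR(10) NOT NULL,  -- at most one row per patient
        OperationDate DATE,
        PRIMARY KEY (HospNo),
        FOREIGN KEY (HospNo) REFERENCES TblDemographics (HospNo)
    );

In the one-to-many blood pressure example, by contrast, HospNo in TblClinics is merely indexed, so the same value may legitimately appear in many rows.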
25.4.4.6 Referential Integrity

While defining relationships, it is possible to specify whether they should enforce referential integrity. This ensures that consistency is maintained between the tables. There are two varieties of referential integrity, which can be used simultaneously:
• Cascade update – best illustrated using the example of blood pressure recordings. Imagine, in our example, that during the data collection process the hospital introduces a new hospital number system for patient identification. The researchers will need to update the HospNo field in the table TblDemographics. If cascade update has not been enabled, the HospNo field in TblClinics will need to be updated manually, risking typographical errors. If errors occur during this process, data linkage between the tables will not be possible, as the fields on each side of the relationship would contain different data.
• Cascade delete – refers to the automatic deletion of all data for a given patient across all tables. For example, if a patient is removed from our demographics table, all of the clinic attendance data for that patient should also be removed. This ensures that the database does not retain redundant data, which may affect database performance.
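In SQL databases that support them, both behaviours can be attached to the foreign key itself; the constraint sketched in the previous section could instead be declared as follows (illustrative syntax, not a prescribed implementation):

    ALTER TABLE TblClinics
        ADD CONSTRAINT fk_clinics_demographics
        FOREIGN KEY (HospNo) REFERENCES TblDemographics (HospNo)
        ON UPDATE CASCADE   -- renumbered hospital numbers propagate to TblClinics
        ON DELETE CASCADE;  -- deleting a patient removes their clinic rows

In Microsoft Access, the equivalent options are the Cascade Update Related Fields and Cascade Delete Related Records check boxes in the Relationships window.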
25.4.4.7 Look-Up Tables

Sometimes, the data being entered into a field are known to be one of a limited number of responses: for example, yes, no or unknown, or, in the case of operative procedures, open, laparoscopic, laparoscopic converted to open and laparoscopically assisted. In order to conserve space and minimise the database size, tables can be created that contain these data together with a numerical primary key (Table 25.5). These are then linked, using relationships, to fields in the main data tables. Common conventions when assigning numerical values in lookup tables are to use 0 and 1 for negative and positive binary responses (e.g. no and yes), 1 and 2 for male and female, and 99 or 999 for unknown.

Many clinical outcomes contain ordinal data. This occurs when a series of values is used to define progression, but where the differences between each value are not necessarily linear: for example, Dukes' staging for colorectal cancer, as shown in Table 25.6. When defining the lookup table, it is conventional to rank outcomes according to severity or prognostic factors and to assign numerical values accordingly.

Table 25.5 An example of a "lookup" table for gender

Gender ID    Gender
1            Male
2            Female
99           Unknown
Table 25.6 An example of a "lookup" table for Dukes' staging

Dukes stage ID    Dukes stage
1                 A
2                 B
3                 C1
4                 C2
5                 D
99                Unknown
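A lookup table of this kind, and its link to a main data table, might be sketched as follows (illustrative MySQL-style syntax; the names simply follow this chapter's conventions):

    CREATE TABLE LkpDukesStage (
        DukesStageID INT NOT NULL,
        DukesStage   VARCHAR(10),
        PRIMARY KEY (DukesStageID)
    );

    INSERT INTO LkpDukesStage VALUES
        (1, 'A'), (2, 'B'), (3, 'C1'), (4, 'C2'), (5, 'D'), (99, 'Unknown');

    -- The main data table stores only the compact numeric code
    ALTER TABLE TblOperations
        ADD COLUMN DukesStageID INT,
        ADD FOREIGN KEY (DukesStageID) REFERENCES LkpDukesStage (DukesStageID);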
25.4.5 Graphical User Interface (GUI)

All database systems are primarily designed to store data in fields, usually grouped as a series of tables. Navigating between tables to enter data can become a complex task, particularly when relationships exist between tables and data integrity needs to be maintained. A good GUI, such as that shown in Fig. 25.9, can greatly simplify the data entry process. As can be seen in this example, the fields are laid out in a clear fashion, with fields containing similar data grouped together. The use of large buttons can simplify record navigation and can be used to open other forms in the database. Note that the database shown utilises tabs (at the top of the figure) to navigate between different data entry areas. These allow for clear visualisation of the different parts of the database.

Fig. 25.9 Example of a GUI database form for a bariatric database (courtesy of Richard Lovegrove, 2007)

Another advantage of using forms for data entry is that calculations can be embedded in the form and run as fields are updated. In the example shown, the BMI, BSA and excess weight fields are all automatically calculated upon entering or updating the height and weight fields. The drop-down lists included in the illustration (denoted by the downward-pointing triangle to the right of the field) represent the inclusion of lookup tables in this database. Upon selecting the field, the list of values stored in the lookup table is shown and can be selected, with the field storing the numerical value assigned to that outcome.

When using SQL databases, it may be preferable to use a web-based graphical interface. This requires knowledge of several web programming languages such as HTML, JavaScript, ASP or PHP. Programs such as Microsoft Visual Studio (of which Express editions are available free) can aid in the development process. A good example of a web-based interface is the Intercollegiate Surgical Curriculum Project's surgical logbook (https://surgeonslog.iscp.ac.uk), shown in Fig. 25.10. As with the example of the bariatric database illustrated in Fig. 25.9, the fields are clearly laid out and buttons are utilised to navigate between fields and add or remove records.

Fig. 25.10 The Intercollegiate Surgical Curriculum Project surgical logbook (with permission, ISCP©)

The use of SQL databases, together with web-based interfaces such as this, is particularly useful when collecting data across several sites. This simplifies database management by enabling changes to the database to be introduced without having to update multiple locally held copies.
25.4.6 Exporting Data for Analysis

In most clinical research settings, investigators will want to export all of the collected data for statistical analysis. While the method for exporting data depends on the software package, in most cases it is straightforward. Sometimes only selected data will need to be exported for analysis. This can be done using queries within the database package. The query language used is entirely dependent on the software package and is beyond the scope of this chapter. In the blood pressure example above, we might wish to identify only those patients with hypertension, or to export only the data from each patient's last clinic attendance. When writing queries that combine data from multiple tables, it is possible to produce every conceivable combination of rows, typically when the join between the tables is not specified. This is known as a Cartesian product, and can be avoided by careful construction of queries and appropriate limiting of the data.
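The difference is easiest to see in a small sketch (the field names and the 140 mmHg cut-off are purely illustrative): stating the join condition below returns each patient's own attendances only, whereas omitting it would pair every demographic row with every clinic row, i.e. a Cartesian product:

    SELECT d.HospNo, c.ClinicDate, c.Systolic, c.Diastolic
    FROM   TblDemographics d
           INNER JOIN TblClinics c
                   ON d.HospNo = c.HospNo  -- omit this and a Cartesian product results
    WHERE  c.Systolic >= 140;              -- export only hypertensive readings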
25.4.7 Quality Control and Data Integrity

Within any database, it is easy for data to become corrupt. This may be the result of external factors such as hardware failure or computer viruses, which highlights the need for regular data back-ups. Data may also be corrupted through typographical errors or the entry of data that are not appropriate for a field; databases can help improve the quality of a study by detecting and preventing such errors during data entry. The use of lookup tables, as described above, can help prevent typographical errors when the responses to a given field are limited; however, these may not always be appropriate. In order to prevent inappropriate data from being entered into a field, rules may need to be created that limit the allowable entries. In Microsoft Access, this can be done in the Table Design window using validation rules; for SQL databases, this needs to be applied within the front-end application. For example, a rule could be created to prevent a date of birth being entered that is in the future (in Microsoft Access this would be ≤ Now()). Alternative strategies include enforcing double data entry and advanced database analysis techniques to identify potential discrepancies within or between tables.
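In SQL, the same rule can be expressed as a CHECK constraint; a minimal sketch (the field name DateOfBirth is assumed; enforcement and the exact date function vary between systems, and some older MySQL versions parse but do not enforce CHECK):

    ALTER TABLE TblDemographics
        ADD CONSTRAINT chk_dob_not_future
        CHECK (DateOfBirth <= CURRENT_DATE);  -- reject dates of birth in the future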
Maintaining data integrity can be a time-consuming, and sometimes tedious, process. However, it can minimise the effort required when the time comes to export data for analysis.
25.4.8 Data Security

All clinical researchers storing electronic data relating to individuals need to comply with the Data Protection Act 1998 [18], full details of which can be found at the Information Commissioner's Office website (http://www.ico.gov.uk). As a minimum requirement, all databases should be password protected. Passwords can be applied at a generic level (e.g. an Access database file-level password) or on a user-specific basis (e.g. SQL applications). Ideally, all users should have their own log-on and password to access the database, and the database should store the user ID of individuals with rights to make changes to data. It is worth noting that software exists that is capable of retrieving file-level passwords from databases such as those created in Microsoft Access. While this may prove useful if you have misplaced or forgotten a password, it provides a potential access point for the unscrupulous. File encryption is a further method of protecting sensitive data and can be applied at the file, folder or drive level; it should be applied to all data held on removable storage devices. Several recent high-profile losses of government data highlight the need for this security measure [19]. Computers linked to networks or the internet should be protected by both firewalls and continually updated anti-virus software. As discussed above, back-ups of data should be made on a regular basis. In order to be fully compliant with the Data Protection Act, data on removable storage should be encrypted and, ideally, back-ups should be stored both onsite and offsite in fireproof safes.
25.5 Conclusions

In summary, data collection and storage are fundamental to any clinical research study. The use of a well-conceived protocol, and planning of key steps of the data collection process such as the MOP, will help to ensure accurate and reliable data collection. Databases assist the storage and retrieval of clinical research data, and careful planning is required to determine the type of database software and model (flat, relational, etc.) that best suits the study design and outcome measures of interest. Fundamental considerations for database design are the type of data being stored, whether and how to link data, and the relationships between fields of data. Finally, whenever researchers are storing clinical data regarding individuals, attention should be paid to ensuring that data security is not compromised.
References

1. Altman DG, Schulz KF, Moher D et al (2001) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663–694
2. Holbrook JT, Shade DM, Wise RA (2005) Data collection and quality control. In: Translational and experimental clinical research: principles of translational and experimental medicine. Lippincott Williams & Wilkins, Philadelphia, pp 136–151
3. Boissel JP (2004) Planning of clinical trials. J Intern Med 255:427–438
4. Cohen J (1989) Statistical power analysis for the behavioural sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
5. Hollis S, Campbell F (1999) What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 319:670–674
6. CONSORT (2008) CONSORT flow diagram. Available from http://www.consort-statement.org/index.aspx?o=1077
7. Anonymous (1991) North American Symptomatic Carotid Endarterectomy Trial. Methods, patient characteristics, and progress. Stroke 22:711–720
8. Anonymous (1991) MRC European Carotid Surgery Trial: interim results for symptomatic patients with severe (70–99%) or with mild (0–29%) carotid stenosis. European Carotid Surgery Trialists' Collaborative Group. Lancet 337:1235–1243
9. Altman DG, Dore CJ (1990) Randomisation and baseline comparisons in clinical trials. Lancet 335:149–153
10. Beller EM, Gebski V, Keech AC (2002) Randomisation in clinical trials. Med J Aust 177:565–567
11. Viera AJ, Bangdiwala SI (2007) Eliminating bias in randomized controlled trials: importance of allocation concealment and masking. Fam Med 39:132–137
12. Kunz R, Vist G, Oxman AD (2007) Randomisation to protect against selection bias in healthcare trials. Cochrane Database Syst Rev MR000012
13. Department of Health (2005) Research governance framework for health and social care, 2nd edn. DoH, London. Available from http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4108962
14. Holzenbein J, Kretschmer G, Glanzl R et al (1997) Endovascular AAA treatment: expensive prestige or economic alternative? Eur J Vasc Endovasc Surg 14:265–272
15. Mulvhill DA, Gibson DW, Cole TG (2005) Translational and experimental clinical research: principles of translational and experimental medicine. Lippincott Williams & Wilkins, Philadelphia
16. Bachman C (1973) The programmer as navigator (ACM Turing Award lecture). Commun ACM 16:653–658
17. Codd E (1982) Relational database: a practical foundation for productivity. Commun ACM 25:109–117
18. United Kingdom Act of Parliament (1998) Data Protection Act. Available from http://www.opsi.gov.uk/Acts/Acts1998/ukpga_19980029_en_1
19. O'Neill S, Ford R (2008) Thousands of criminal files lost in data fiasco. In: The Times, 22 August 2008. Available from http://www.timesonline.co.uk/tol/news/uk/crime/article4583747.ece
Further Reading

A wealth of information on database design and development is available both online and from good book retailers. Readers may wish to look at the following information on Microsoft Access, Microsoft SQL Server and MySQL databases.

Ben-Gan I (2006) Microsoft SQL Server 2005: applied techniques step by step. Microsoft Press
Ben-Gan I (2006) Microsoft SQL Server 2005: database essentials step by step. Microsoft Press
Ben-Gan I, Kollar L, Sarka D (2006) Inside Microsoft SQL Server 2005: T-SQL querying. Microsoft Press
Brust AJ, Forte S (2006) Programming Microsoft SQL Server 2005. Microsoft Press
DeBetta P (2004) Introducing Microsoft SQL Server 2005 for developers. Microsoft Press
Dow Lambert M, Lambert JPS (2007) Microsoft Office Access 2007 step by step. Microsoft Press
Microsoft TechNet. From http://technet.microsoft.com/en-gb/sqlserver/default.aspx
MySQL AB (2006) MySQL administrator's guide and language reference, 2nd edn. MySQL Press
MySQL online documentation. From http://dev.mysql.com/doc/
Schneider RD (2005) MySQL database design and tuning. MySQL Press
Ulrich Fuller L, Cook K, Kaufeld J (2006) Access 2007 for dummies. Wiley, New York
Viescas J, Conrad J (2008) Microsoft Office Access 2007 inside out. Microsoft Press
Welling L, Thomson L (2003) MySQL tutorial. MySQL Press
The Role of Computers and the Type of Computing Skills Required in Surgery
26
Julian J. H. Leong
Contents

26.1 Introduction
26.2 Hardware
26.2.1 Desktop Computer
26.2.2 Laptop Computers
26.2.3 Moore's Law
26.2.4 Smart Phones and Personal Digital Assistants
26.3 Software
26.4 Computing Skills
26.4.1 Basic Computing Skills
26.4.2 Needs-Based Computing Skills
26.4.3 The Internet Resources
26.4.4 Role of Computers in Specific Environment
26.5 Clinical Environment
26.5.1 Picture Archiving and Communications System (PACS)
26.5.2 Logbook
26.5.3 Web-Based Clinical Resources
26.5.4 Hospital Information System
26.5.5 Medical Software for Hand-Held Devices
26.6 Research Environment
26.6.1 Electronic Journals
26.6.2 Bibliographic Database
26.6.3 Citation Report
26.6.4 Publish or Perish
26.6.5 Statistical Packages
26.6.6 Ethical Approval
26.7 The Future
26.7.1 Web 2.0
26.7.2 Virtual Reality
26.7.3 Robotics
26.8 Conclusions
J. J. H. Leong The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract In this chapter, there is a concise description of the computing skills required in surgery. The role of computing is important, as there are many applications in the clinical, educational, research, and personal settings that can facilitate the surgeon's daily activity. Several technical aspects are explained and practical web resources are presented.
26.1 Introduction

It is difficult to imagine the days before computers became a part of our daily lives: typewriters instead of printers, letters instead of emails, and a crowded high street instead of online shopping. Surgeons tend not to be technophobic; indeed, it is often the ever-changing technology in surgery that attracts them to the field in the first place. Computers started as highly specialised equipment specifically designed for computer scientists; as technology advanced, they have become more usable and intuitive. Computers are now essential not only in the research and teaching environments in surgery, but also increasingly important in the clinical peri-operative and operative settings.
26.2 Hardware

The main components of the computer are the Central Processing Unit (CPU), Random Access Memory (RAM), hard disc, and the Graphics Processing Unit (GPU), which are all mounted on the motherboard. The CPU is best thought of as the brain of the computer, where almost all of the processing takes place. The two main manufacturers are Intel® and AMD®, who have been competing to produce the fastest CPUs at the lowest cost. It would be redundant to describe the different CPU options available now; it suffices to say that in 2 years' time, all will be different again. Generically, however, there are two types of processors: desktop and mobile. Desktop CPUs are designed for desktop computers, and are generally faster and cheaper. Mobile CPUs are used in most laptop computers, where the focus is on low heat production, lower power consumption, and smaller size.

RAM is perhaps like the short-term memory of the system. In essence, processes and programmes that are essential for the computer to function are loaded into the RAM every time the computer starts, and cleared when the computer shuts down. The size of the RAM greatly influences the computer's performance, as any excess information that cannot be stored in the RAM will be accessed from the hard disc, which is much slower.

Hard discs are the long-term storage of the computer, and personalise the computer to the individual. It is the information stored on the hard disc that determines how the computer starts up and functions; it is also where all the personal files are stored. Videos and pictures tend to take up more space, and users should consider future-proofing their computers by buying a bigger hard disc.

The GPU is a separate processor that handles the graphical output of the computer. The more basic GPUs can handle desktop applications; however, if 3D graphics are used, a dedicated high-powered GPU is needed.
26.2.1 Desktop Computer

Desktop computers can be divided into three categories: business, sub-£1,000, and £1,000+. Business desktops are low-specification computers optimised for office applications and internet browsing; they tend to have fewer capabilities for 3D graphics. Sub-£1,000 desktops are entry-level computers, with enough processing power for all day-to-day computing, and probably capable of handling most games. These computers are likely to be outdated in 2–3 years, depending on the individual's computing needs. £1,000+ desktops are usually those that use the most up-to-date processors and GPUs, and large amounts of RAM and hard disc space. Increasingly, a new genre of computer, the lifestyle desktop, is becoming popular. These are computers that compromise between real processing power and external appearance. They are usually coupled with software to increase their usability, and concentrate on multimedia entertainment at home. Recently, both Microsoft® and Apple® have released operating systems (OSs) that can handle multimedia entertainment using a simple remote control, turning the system into a television and video player when needed.
26.2.1.1 Microsoft-Based

Microsoft is the largest computer software company in the world. Its Windows® operating system is the most commonly pre-installed OS, and most computer hardware and software are designed to work in the Windows environment. Computers that run Microsoft Windows are colloquially referred to as PCs, though the abbreviation literally means "Personal Computers". The advantages of PCs are the large range of software available and better compatibility between different computer components.
26.2.1.2 Macintosh-Based

Macintosh™, or Mac for newer models, is a line of computers designed, developed, and marketed by Apple® Inc. Apple® computers are renowned for their user-friendliness, powerful video and image editing software, rarer virus attacks, and clean exterior design. Apple® computers run the Mac OS, and there are Mac versions of most desktop-based applications, for example, Microsoft Office, Adobe® Photoshop®, and others.
Recently, newer Apple® computers have used Intel CPUs, which means that they can also run Microsoft Windows. Boot Camp® is the name of the Mac software that enables Macs to boot natively into Microsoft Windows. There are also other solutions, like Parallels, which can virtualise a Microsoft environment within the Mac OS; however, this significantly decreases the computer's performance.
26.2.1.3 UNIX®-Like Operating Systems

Linux and Solaris (SunOS) are probably the two most commonly used open-source Unix-like OSs. These are largely free OSs, and generally require fewer resources to run. However, they are generally less user-friendly, require a sound knowledge of computing languages to use, and are beyond the scope of this chapter.

26.2.2 Laptop Computers

Laptop computers are generally divided into ultraportable, desktop replacement, and others. Ultraportable computers tend to use mobile processors (or ultra-low-voltage processors), which use less energy and hence improve the battery life of the computer. Furthermore, the GPUs are generally of low performance, which renders these computers more useful for office-based applications and web browsing, but not for processing-intensive applications (e.g. video editing or games). The development of these computers is focused on long battery life and light weight (1–2 kg). Desktop replacement laptops, as the name suggests, often use the same CPUs as desktop computers, and generally have a separate GPU to handle 3D graphics. They tend to have near-desktop performance, but are much heavier and have a shorter battery life than ultraportable laptops. The rest of the laptops fall between the two categories.

26.2.3 Moore's Law

Gordon E. Moore, co-founder of Intel, described how the number of transistors that can be incorporated in an integrated circuit increases exponentially, doubling every 24 months. This is known as Moore's law, and the trend is likely to continue, as shown in Fig. 26.1. This law is important to remember, as each computer is most likely to last between 2 and 5 years, depending on the individual's computing needs and the initial investment made.

26.2.4 Smart Phones and Personal Digital Assistants

Currently, these two entities are merging into one, as it seems redundant to carry both. The most popular OSs for these devices are Symbian OS, Linux, Palm® OS, and Windows Mobile®. More recently, Research in Motion have developed the BlackBerry® OS, which is geared towards "push email", where emails are delivered automatically to the device, as opposed to on demand ("pull email"). Symbian and Palm® OS have the advantage of a larger variety of third-party software, and Windows Mobile is more compatible for synchronising with Microsoft desktop products. The iPhone™ from Apple runs a streamlined version of Mac OS X®. Medical applications for these devices are discussed later in the chapter.
26.3 Software

Software is a collection of programmes that allows the computer to perform certain functions. It is generally divided into three types: operating, application, and programming software. The OS, as described earlier, manages the basic computing resources while providing an interface for interaction with them. The most commonly used application software is Microsoft Office®, a suite including word processor, spreadsheet, presentation, and database programmes. Other examples of applications are Internet Explorer® for browsing the internet, Adobe Photoshop® for editing images, and iTunes® for playing media. Programming software can be used to develop new application software; programming languages like C++, BASIC, and Perl are commonly used.
Fig. 26.1 Moore's law: Intel® microprocessor transistor counts by year of introduction (note: the vertical scale of the chart is not proportional to actual transistor count)
26.4 Computing Skills

26.4.1 Basic Computing Skills

The European Computer Driving Licence Foundation Ltd (ECDL Foundation, www.ecdl.com) is a non-profit organisation that provides training and certification of competency in personal computing skills. It offers a number of products ranging from basic desktop applications, for example, word processing, internet, spreadsheet, and presentation, to more advanced courses. The ECDL, though often not required, is a desirable certification to have for job applications.
Our aim is not to write a complete manual for using computer programs, though some of the more useful features that are perhaps less well known will be briefly described in the following sections.
26.4.1.1 Word Processing

Track changes (Tools > Track Changes) is a good way to circulate manuscript drafts amongst co-authors. It allows each author to contribute to the final draft and to add their own comments to it.

Inserting pictures (Insert > Picture > From File…) usually causes a lot of formatting problems in the manuscript, and can increase the size of the document and hence result in slower loading. One way round this is to insert a link to the picture file (Insert > Picture > From File… > highlight picture file > click the drop-down menu next to Insert > Link to File); this can reduce the size of the actual document to just the text. Beware that the linked picture file has to stay in exactly the same directory to be retrieved, for example, C:\my pictures\picture.jpg. Therefore, if the document is sent to a colleague, they will also need the linked picture file at C:\my pictures\picture.jpg on their local computer. After a picture is inserted, double-click on the picture > Layout tab > Advanced > Text Wrapping tab > select Top and Bottom; then, under horizontal alignment, select Center > OK. These options keep the text above and below the inserted picture, but Word® will still move the picture to the next page if more text is added before it.

Once the document is prepared, the formatting can still change if it is opened on another computer or if a different printer is selected. This is due to slight differences in the print area between printers, which can move some of the text and illustrations. One way to preserve the formatting is to print the document to a .pdf file. There are various programs that can achieve this, including Adobe Acrobat (the Adobe Reader is free, but will not convert documents into .pdf files). Pdf995 (http://www.pdf995.com/) is one of the free pdf converters available, but it cannot edit the document after conversion.

There are other, less commonly used programs for word processing, for example, OpenOffice or LaTeX (a document preparation system built on the TeX typesetting program), which may have the advantages of lower cost and easier formatting of equations, but lack the usability and cross-compatibility of Microsoft Word®.
26.4.1.2 Spreadsheet

Microsoft Excel® is an example of a spreadsheet program, which is useful for basic data entry, graph plotting, and statistical analysis. When handling large data sets where the analysis is repetitive, it is worth knowing the Macro function (Tools > Macro), although some prior knowledge of programming in BASIC helps in understanding this.
Actions performing the analyses can be recorded and translated into a macro script automatically; this can then be replayed to repeat the analysis. Choose Tools > Macro > Record New Macro…, perform the analysis, and then click the stop (square) button. Tools > Macro > Visual Basic Editor will open a new window; expand Modules > Module 1 to see the macro script of the recorded actions. The same macro can be run on other data sets, provided the data are arranged in exactly the same way.
26.4.1.3 Presentation

Gone are the days of upside-down slides, jammed projectors, and missing slides; PowerPoint® presentations have gained widespread use over the last 10 years. There are other software options for presentations, like Apple® iWork Keynote® (http://www.apple.com/iwork/keynote/) or OpenOffice.org Impress (http://www.openoffice.org/product/impress.html), but one cannot expect the conference host to have the software to view them. The basic principles of presentation remain the same: avoid crowded slides, use more graphical illustrations, and keep colour coding consistent. Perhaps the most useful, but also the most feared, aspect of PowerPoint® is the incorporation of videos and animations. These are discussed in detail later in the chapter.
26.4.1.4 Mind Map

One way to organise thoughts is the use of mind maps. These are diagrammatic presentations of ideas and words, interconnected according to the flow of information; mind maps can help generate and clarify the thought processes for creating presentations or manuscripts. Software like MindManager® (http://www.mindjet.com/) or OpenMind 2 (http://www.matchware.com/en/products/openmind/) can facilitate this, and a draft mind map of this chapter is presented in Fig. 26.2. These mind maps can then be exported into headings in Word® or PowerPoint®.

Fig. 26.2 Mind map (a draft mind map of this chapter)

26.4.1.5 Email

There are four commonly used protocols to receive and send emails: Messaging Application Programming Interface (MAPI), Post Office Protocol 3 (POP3), Internet Message Access Protocol 4 (IMAP4), and Simple Mail Transfer Protocol (SMTP).
POP3

POP3 is a commonly used protocol where email messages are downloaded directly from the server and stored on the local computer (unless the option to save a copy on the server is selected). In other words, the message no longer exists on the server and can only be retrieved on the local computer. It is mainly designed for offline mail processing, and was most popular in the days when internet access was relatively slow and charged by the amount of time used.
IMAP4

IMAP4 can be accessed both online and offline. The email messages are stored on the central server, and the local computer(s) can download new messages in the Inbox whenever connected. Its main advantage is that multiple computers can be synchronised with the main server, and each can store a local copy of all the messages.
SMTP

SMTP is a protocol for sending messages from a mail client to the server, from where they are relayed to the internet. The SMTP server performs two basic functions: first, it verifies the sender's identity within the private network (university or company) before sending the outgoing emails through the server; this can be done by physically connecting to the private network, through a Virtual Private Network (VPN), or through a login and password. Second, it returns undeliverable emails to the sender. Most email programs support all three protocols, for example, Mozilla® Thunderbird® (free), Windows® Live Mail® (the new version of the free Outlook Express), Eudora®, Mac Mail®, Outlook®, etc.

MAPI

Like the IMAP4 protocol, multiple local computers can synchronise email with the Microsoft Exchange Server® through the MAPI protocol. The Microsoft Exchange Server® holds not only the emails, but also contact lists, calendar schedules, address books, tasks, and other functions. Programs like Microsoft Outlook® (Entourage® in the Mac version) support MAPI. In fact, Microsoft Exchange also supports Outlook® Web Access, where no email client is needed, only a web browser. Note that, using Mobile Access, contacts and calendars (as well as emails) can be synchronised with mobile phones, which greatly reduces the hassle of losing a mobile phone and having to re-enter all the phone numbers.

26.4.1.6 Internet Browsing

Most people know how to use Internet Explorer®, which is pre-installed on all Microsoft Windows® computers. It is also worth knowing about other free browsers that may have improved security and use fewer computing resources; some examples are Mozilla® Firefox® (http://www.mozilla.org/), Opera (http://www.opera.com/), and Safari® (http://www.apple.com/safari/).
26.4.1.7 Backup

This is a very important topic. Backup programs should ideally be automatic, easy to set up, fast, and able to back up to more than one location. The safest option is to have more than one copy of the backup in more than one location.

Local Copy

External hard drives are now relatively cheap and often come with a free backup program. Otherwise, a good software product can be purchased from Acronis® (www.acronis.com), which can back up to a network drive or a local hard disc. Another advantage is the use of incremental backups, which means that after the initial backup, subsequent operations only look for changes made, shortening the time taken.

Network Drive

Network-attached storage (NAS) is an alternative solution with a number of added security options. NAS can be connected to the home wireless/wired network, and can be accessed by all the computers connected to the home network. More expensive NAS devices include 2–4 hard discs arranged in Redundant Arrays of Inexpensive Disks (RAID), which, depending on the settings, can tolerate one complete hard-disc failure and still recover all the data.

Internet Backup Solutions

Probably the most convenient and safest solution is where the data are backed up on a remote server and the server itself is also backed up periodically. It can be an expensive option, normally priced by the space required, with this space "rented" on a monthly/yearly basis. The other advantage is the ease of access anywhere in the world, as long as there is an internet connection. Examples are Carbonite® (http://www.carbonite.com/) and Mozy® (http://mozy.com/).
26.4.1.8 Communications

Exchange of information between surgeons in different countries has traditionally relied on conferences and journal communications. It is now possible to have free internet-based video conferences, virtual meetings, and electronic publications. Two examples of innovative use of free internet-based tools are discussed below.
Voice over Internet Protocol (VOIP)

There are many free and reliable software packages for video conferencing. Skype® (www.skype.com) and iChat® (pre-installed on Apple® computers) have been used to manipulate a surgical robot in Seattle from London (Fig. 26.3).
Virtual Worlds

Conferences and meetings usually involve travelling for most attendees, which can incur considerable cost and time. Virtual worlds are computer-simulated environments in which individuals may interact with each other using avatars, through text or voice. Some of the earlier virtual worlds, like chat rooms, were poorly regulated and resulted in social abuse. Second Life® (http://secondlife.com/) is a newer-generation virtual world, where inhabitants can create businesses and communities within the world. Its rapid acceptance, coupled with improved technology and internet connection speeds, has made it feasible to organise a virtual conference within these virtual worlds. Figure 26.4 shows the first virtual surgical conference in Second Life® (http://ivas.wordpress.com/).
Fig. 26.3 VOIP

Fig. 26.4 Second Life®

26.4.1.9 File Management

Apart from using the basic folder > subfolder filing system integral to the OS, there are other solutions for finding files hidden in dark corners of the computer. Copernic™ (http://www.copernic.com/) is a free desktop search program capable of searching keywords in a document, email, music, picture, video, or even contact. After an initial indexing scan, it is a fast program that uses relatively few computer resources. Its other advantage is the ability to search network drives. Picasa® (http://picasa.google.com/) and ACDSee® (www.acdsee.com) are programs that index and organise picture files quickly and efficiently.

26.4.2 Needs-Based Computing Skills

26.4.2.1 Manuscript Preparation and Reference Management

Apart from basic word processing skills, referencing can be time-consuming, especially if there are significant changes to the manuscript after the first draft. Reference Manager® (http://www.refman.com/) and EndNote® (http://www.endnote.com/) are two referencing software products which can automatically update the citations according to changes in the manuscript, and can also search directly from PubMed (http://pubmed.gov/). Both have pre-defined output styles as required by different journals, and custom options for adjustments.

26.4.2.2 Databases

This topic has been presented in Chap. 25.

26.4.2.3 Programming

Computer programming or software development is not generally necessary in everyday surgical computing, although some basic knowledge of computer programming can often improve productivity in research and data analysis. As briefly described earlier, Visual Basic for Applications (VBA) is a relatively simple language that can increase the efficiency of handling relatively large data sets using Macros in Excel®. An internet search for "VBA tutorial" should return links with free tutorials and examples of writing VBA scripts, which are beyond the scope of this chapter. However, there is a maximum number of data rows in Microsoft Excel® (65,536 in Excel® 2003, and 1,048,576 in Excel® 2007), which may not be adequate.
Matlab® (http://www.mathworks.com/) is a numerical analysis software package and high-level programming language, and can handle larger data sets than Excel®. Many additional analysis packages can be bought from MathWorks™. Its other advantage is the large library of code contributed by the user community; for example, the Camera Calibration Toolbox for Matlab® (http://www.vision.caltech.edu/bouguetj/calib_doc/) is probably one of the most referenced Matlab® toolboxes. There is a much steeper learning curve for using Matlab®, and some prior knowledge of vectors and matrices (linear algebra) is highly recommended.
26.4.3 The Internet Resources

There are numerous internet-based search engines, forums, and databases. This section will concentrate on generic online resources; later in the chapter, we will concentrate on the use of the internet in specific situations.

26.4.3.1 Google™

Google™ uses a rating system based on an algorithm developed by its founders, called PageRank™. In essence, a particular website is rated according to the number of links it receives. This rating is then used as a weighting scale: when a keyword is searched for, the results are displayed in this order. Recently, a study showed that by using 3–5 symptoms from the case records of the New England Journal of Medicine as search terms, Google™ returned a correct diagnosis in over half of the cases.

26.4.3.2 Google™ Image

Using terms like "Chest XRay" or "Achilles Tendon Rupture" in Google™ Image, medical images can easily be found on the internet. It should be emphasised that copyright issues still apply to web-based images.

26.4.3.3 Doctors.net.uk

This is a UK-based company providing free internet resources for doctors. It provides a web-based email and also a POP3 server, with 500 Mb of capacity. The online textbooks include a full version of the Oxford Textbook of Medicine. Relatively recently, a very useful feature called Journal Watch has been added, where the editor reports interesting articles from high-impact journals (e.g. The Lancet, BMJ) in a short, digested format. It also has a very active forum, recently popularised further by the introduction of, and controversy surrounding, Modernising Medical Careers (MMC). Other features such as Continuing Medical Education (CME) modules and job advertisements are also very useful.

26.4.3.4 Medscape®

Medscape® is a similar concept to Doctors.net.uk mentioned earlier, with the capacity to customise the information sent to the user by specialist interest. Registration also allows access to eMedicine®, which has a large database of medical pathology.

26.4.3.5 Sermo™

This is a US-based online community for doctors (http://www.sermo.com/), which first started as an adverse drug-effect reporting system and has grown into a discussion board or forum for doctors. Individuals can post clinical questions or observations in the forum, and other members can contribute by commenting on them. It is worth noting that the site is user-moderated, and cannot be treated as peer-reviewed scientific evidence. Recently, it has received challenges from pharmaceutical companies.

26.4.3.6 Wikisurgery

This is a surgical site (http://wikisurgery.com/) based on the Wikipedia® format, where users can contribute by posting information about surgical pathology. Probably most useful for surgical trainees are the posted operation scripts, which are step-by-step guides to performing particular operations. To quote from the website, "no information in this script should be used without the approval of a fully trained practising surgeon".
26.4.4 Role of Computers in Specific Environment

26.4.4.1 Education Environment

In surgical education, there is a combination of didactic and surgical skills teaching, which usually encompasses bedside teaching, lecture/conference-based learning and apprenticeship.

26.4.4.2 Bedside Teaching

Bedside teaching can be enhanced by using a remote presence robot, like the RP-7® robot from InTouch Health® (http://www.intouchhealth.com/products_rp7robot.html), with which live interaction between the teaching physician and the patient can be transmitted directly to a lecture theatre of trainee doctors. This reduces the traditionally overcrowded teaching ward round, while the students can still interact live with the teacher.
26.4.4.3 Creating Lecture Presentations

Pictures

A well-chosen picture can convey the most complex ideas, and there are many internet-based resources that can be used to search for clinical photos. These are listed below, but one must be aware of copyright issues when using them:

Casimage: http://pubimage.hcuge.ch/ (X-rays)
XRay2000: http://www.e-radiography.net/ (X-rays)
Google Image: http://images.google.com/ (miscellaneous)
Flickr®: http://www.flickr.com/ (miscellaneous)
List from Karolinska Institutet: http://www.mic.ki.se/MEDIMAGES.html (miscellaneous)
Primal Pictures®: http://www.primalpictures.com/ (anatomy)
cgCharacter: http://www.cgcharacter.com/ (anatomy)

Tips for using pictures in PowerPoint®:

1. When inserting pictures, make sure that the resolution is more than 300–400 pixels per side, especially when presenting on a large screen.
2. Too many high-resolution pictures can increase the size of the file. Double-click on a picture in the presentation > Picture tab > Compress… > apply to All pictures in document > Change resolution to Web/Screen (or Print if higher resolution is desired) > select Compress pictures and Delete cropped areas of pictures > OK.
3. Insert pictures in the slide master (View > Master > Slide Master). The same picture, for example the institution's logo, can then appear on every slide.
4. Use photo editing software to enhance and emphasise a picture. Photoshop® from Adobe® (www.adobe.com) is a powerful tool for this, but can be expensive to buy. The GNU Image Manipulation Program (GIMP, www.gimp.org) is a free alternative.

Movies

A well-chosen movie illustration is perhaps even more effective than a photo; however, movies are probably the cause of most interruptions during presentations. Most of these issues are due to incompatible codecs, and there are ways to minimise problems with videos in PowerPoint®.

Codec
This is software for encoding and decoding a video signal; the word "codec" is a combination of "compressor–decompressor" or "coder–decoder". There are many codecs available, and choosing one is usually a balance of compatibility, speed, quality, and file size. The following is a summary of commonly used codecs and the author's experience with them; a more detailed description can be found at http://people.csail.mit.edu/tbuehler/video/codecs/avi.html#msvid:
Codec                    Compatibility   Compression   Quality   Ease of use
Microsoft® MPEG-4 v2     ++              +++           ++        +++
DivX®                    +               ++            +++       ++
Xvid                     +               ++            ++        +
Indeo®                   ++              +             ++        ++
Microsoft® DV            +++             +             ++        ++
Huffyuv                  +               +             +++       ++
Microsoft® Video 1       +++             +             +         +++
Windows® Media Video     +++             ++            ++        +
Cinepak®                 +++             +             ++        +++
Capture and Editing

Simple video editing can be done using free software. Windows Movie Maker® can capture videos from a digital camcorder or camera, and some basic editing and codec conversion can be achieved. This program favours exporting in the Windows Media Video® format, rather than the other codecs described earlier. Virtualdub (http://www.virtualdub.org/) is a free downloadable software which offers video capturing with some advanced editing functions. Virtualdub does not take up many system resources and does not need to be installed. Video codec conversion is easy: after editing the video, choose Video > Compression… > select codec > OK, then File > Save as AVI…. Video capture can also be done using AMCap (http://amcap.en.softonic.com/), a free program that can capture video from any video device attached to the computer in real time. Adobe® Premiere Pro® (http://www.adobe.com/products/premiere/) is a professional video editing and capture software which has been used to produce commercial movies. It is obviously a more expensive option, but Adobe® Premiere Elements® is a stripped-down version of it. A more comprehensive comparison of video editing software can be found at http://en.wikipedia.org/wiki/Comparison_of_video_editing_software.

Inserting Movies and Package for CD

Inserting movies in PowerPoint® is reasonably straightforward. Choose Insert > Movies and Sounds > Movie from File… > select the file > OK. Then choose for the movie to start automatically, which means the movie starts when the slide is displayed. Right-click on the movie and choose Edit Movie Object; the pop-up menu shown in Fig. 26.5 should appear.

Fig. 26.5 Saving a movie file
Near the bottom of the pop-up menu, in the Information section, notice that the file path is E:\…\ABOS 2008\L3N1comp.mpg. This is important, as the video will not play if the file is moved to another directory that is not exactly the same as the displayed path, for example, to a memory stick. The way round this is to use the function File > Package for CD… > Copy to Folder… > choose a folder location > OK. This will copy the presentation and all the linked movies to this directory, together with a PowerPoint® viewer, such that it can be viewed even on a computer without PowerPoint® installed. The difference can be seen in Fig. 26.6. Notice that the movie file path is now listed simply as L3N1comp.mpg; that is, the movie will play as long as the file is in the same directory as the presentation. Tips for using movies in PowerPoint®:
Converting Whole Presentations into Flash

A program like Articulate® Presenter (http://www.articulate.com/products/presenter.php) can convert a whole PowerPoint® file into a Flash-based presentation. This enables the incorporation of narration into the presentation, while making it more interactive and allowing users to navigate through it. It is best used to disseminate lectures and presentations in a web-based environment, and the result can be viewed using the Adobe Flash Player® already installed in most internet browsers.
1. Edit movies in Virtualdub, and then set compression to Microsoft® MPEG-4 VKI Codec V2
2. Change the file name extension from example.avi to example.mpg
3. Insert the movie as described earlier, and select it to start automatically
4. Package for CD as described earlier, and copy to a temporary folder
5. Copy the temporary folder to a memory stick
6. Test the presentation before use

Fig. 26.6 Saving a movie file (Package for CD)

26.5 Clinical Environment

26.5.1 Picture Archiving and Communications System (PACS)
PACS has undoubtedly changed the way our clinical practice is run; as part of the National Programme for IT by NHS Connecting for Health, it has enabled the electronic storage of X-rays and scans, which can be retrieved on computers across and between NHS trusts. This has greatly reduced the incidence of X-rays lost in transit between clinics, wards, and operating theatres. Some systems also allow access from a remote site, making it more convenient to seek specialist opinions about radiological images. It does, however, have its disadvantages in clinical practice. Pre-operative templating in orthopaedics has relied on standard X-ray magnifications and the ability to draw on a hard copy of the radiograph, though there is now electronic templating software, such as OrthoView™ (http://www.orthoview.com/), to substitute. Each NHS trust also uses slightly different software for PACS, creating a small learning curve when working in a different hospital. Digital transmission of PACS images between systems is not always possible, and these images have to be burnt to a CD-ROM with viewing software (which can be difficult to use and lacks confidentiality) for transfer. Lastly, a system crash means that none of the images can be viewed throughout the hospital.
26.5.2 Logbook

Most surgical trainees will be familiar with the electronic logbooks (www.elogbook.org and www.iscp.ac.uk) that can be accessed using an internet browser, or downloaded onto a PDA or a local computer and synchronised with a central server over the internet.
26.5.3 Web-Based Clinical Resources

The following is a short list of commonly used clinical resources:

Orthoteers: http://www.orthoteers.org/ (orthopaedics)
Wheeless': http://www.wheelessonline.com/ (orthopaedics)
Clinical Evidence: http://clinicalevidence.bmj.com/ (general)
National Institute for Health and Clinical Excellence (NICE): http://www.nice.org.uk/ (clinical guidelines, technological appraisals, and interventional procedures)
Cochrane: http://www.cochrane.org/ (evidence-based reviews)
Internet Viewings in the Annals of the Royal College of Surgeons of England: http://www.ingentaconnect.com/content/rcse (general)
Trip database: http://www.tripdatabase.com/ (evidence-based reviews)
26.5.4 Hospital Information System

Hospital information system (HIS) is a broad term for a management system covering the administrative, financial, and clinical aspects of a hospital. Many software systems run in the background of the day-to-day operation of the hospital. The Patient Administration System (PAS) is an administrative system that can range from keeping basic demographic details of the patients to managing waiting lists for operations and clinics; for example, Oasis is a product by Capula Health (http://www.capulahealthcare.com/). The laboratory information system (LIS) is probably the system most commonly used by doctors in the hospital; there are many different LISs that interface with the HIS, providing biochemistry, haematology, and pathology laboratory results.
26.5.5 Medical Software for Hand-Held Devices There are many programs written for PDAs to aid dayto-day clinical practice; pdaMD (www.pdamd.com) has a selection of these that one may find useful. ePOCRATES® (www.epocrates.com) is a mobile medical textbook with drug information and interactions, and information about diseases and diagnostic tests. Skyscape (www.skyscape.com) has a comprehensive collection of medical textbooks for PDA, sorted by specialities. These include Cochrane reviews of specific topics, and eBooks that can easily be referenced during clinical practice.
Orthoteers
http://www.orthoteers.org/
Orthopaedics
Wheeless’
http://www.wheelessonline.com/
Orthopaedics
Clinical Evidence
http://clinicalevidence.bmj.com/
General
National Institute for Health and Clinical Excellence (NICE)
http://www.nice.org.uk/
Clinical guidelines, technological appraisals, and interventional procedures
Cochrane
http://www.cochrane.org/
Evidence-based reviews
Internet Viewings in the Annals of the Royal College of Surgeons of England
http://www.ingentaconnect.com/ content/rcse
General
Trip database
http://www.tripdatabase.com/
Evidence-based reviews
An Electronic Patient Record on PDAs is perhaps the feature most wanted by clinicians; however, this concept is not easily achievable. First, the data from each PDA have to be synchronised with a central server, and the information must be accessible to other clinicians in the institution; this has to be part of a larger institutional switch to electronic medical records. Second, data need to be sufficiently encrypted to protect patient confidentiality. Patient Tracker (http://www.patienttracker.com/) and Inchware (http://www.inchware.co.uk) are off-the-shelf electronic medical record systems. ExtraMed (http://www.extramed.co.uk/) and Kelvin Connect (http://www.kelvinconnect.com/) provide more bespoke systems, which can be tailored to individual needs.
26.6 Research Environment

26.6.1 Electronic Journals

Almost all current issues of medical journals are now available online, and most of them provide free abstracts; the British Medical Journal (www.bmj.com) offers selected full-text articles free for general access. The Royal College of Surgeons of England provides an Athens password for accessing its collection of electronic journals (http://www.rcseng.ac.uk/library/collections/ejournals.html).
26.6.2 Bibliographic Database

MEDLINE® was developed by the National Center for Biotechnology Information at the National Library of Medicine, located at the National Institutes of Health. It includes over 17 million citations dating back to the 1950s. EMBASE, from Excerpta Medica, contains over 11 million records from 1974 to the present, and is updated within 2 weeks of receipt of the original journal. PubMed and OVID are systems that search the MEDLINE® database; OVID can also include EMBASE in its searches. The following is a comparison of PubMed and OVID:
- Access: OVID is restricted by license to university personnel; PubMed is free.
- Other database coverage: OVID also covers CINAHL, PsycInfo, CancerLit, ERIC, Current Contents, HAPI, Cochrane, DARE, Best Evidence, HealthSTAR, and most NLM databases; PubMed covers MEDLINE® In-Process and HealthSTAR.
- Full-text availability: OVID offers over 200 full-text journals; PubMed provides links that are accessible only if the journal is already owned.
The Cochrane Collaboration (www.cochrane.org) publishes the Cochrane Library on a quarterly basis, containing all up-to-date Cochrane Reviews, which draw on the best available information about healthcare interventions through systematic review. Google™ Scholar (http://scholar.google.co.uk/) is a relative newcomer, launched in 2004 and still in beta version; it is a web search engine that indexes most peer-reviewed online journals.
26.6.3 Citation Report

Thomson Scientific provides a set of online databases called the Institute for Scientific Information (ISI) Web of Knowledge (http://isiwebofknowledge.com); the three databases most used by academic surgeons are Web of Science, a bibliographic database like MEDLINE®; ISI Proceedings, which indexes the proceedings of international conferences; and Journal Citation Reports®, described below. Journal Citation Reports® returns, among other parameters, the Impact Factor and the Immediacy Index. The Impact Factor for 2006 is calculated from:

A = the number of citations in 2006 to the journal's articles published in 2004–2005
B = the total number of articles published in 2004–2005

Impact Factor 2006 = A/B

The Immediacy Index for 2006 uses:

C = the number of citations in 2006 to articles published in 2006
D = the total number of articles published in 2006

Immediacy Index 2006 = C/D

Although there is much criticism of using the Impact Factor to rank the quality of published articles, it remains a reasonably objective measure.
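To make the arithmetic concrete, the following minimal Python sketch computes both indices for a hypothetical journal; all citation and article counts here are invented purely for illustration.

```python
def impact_factor(citations_to_prev_two_years, articles_in_prev_two_years):
    """Impact Factor for year Y: citations in Y to articles from Y-1 and Y-2,
    divided by the number of articles published in Y-1 and Y-2."""
    return citations_to_prev_two_years / articles_in_prev_two_years

def immediacy_index(citations_to_current_year, articles_in_current_year):
    """Immediacy Index for year Y: citations in Y to articles published in Y,
    divided by the number of articles published in Y."""
    return citations_to_current_year / articles_in_current_year

# Hypothetical journal: 500 citations in 2006 to its 200 articles from 2004-2005,
# and 30 citations in 2006 to the 100 articles it published in 2006.
print(impact_factor(500, 200))    # 2.5
print(immediacy_index(30, 100))   # 0.3
```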
26.6.4 Publish or Perish

Journals that are not ISI-listed will not have a calculated Impact Factor; this does not necessarily mean that the journal has no impact in its field. Publish or Perish (http://www.harzing.com/resources.htm#/pop.htm) is an interesting program that uses Google Scholar™ to calculate the raw citations to an author's published articles. It reports various output parameters that attempt to measure the author's academic achievement.
26.6.5 Statistical Packages

The Statistical Package for the Social Sciences (SPSS®, http://www.spss.com/) is one of the statistical analysis packages most widely used by scientists. Most universities provide basic SPSS tutorials, and some even have a statistical advisory service to help with data analysis. A simple manual is also useful to guide novices through SPSS; Discovering Statistics Using SPSS (Second Edition) by A. P. Field is a very good book for this. STATA® (http://www.stata.com/) has a slightly less user-friendly interface, but is more useful if complex modelling is required.
26.6.6 Ethical Approval

All ethical applications must be made through the National Research Ethics Service (NRES, formerly COREC) online form, which is available at http://www.nres.npsa.nhs.uk/. If the research involves patients from the NHS, the completed application is then passed on to the local NHS Research Ethics Committee (REC). Certain types of applications need to be booked using the Central Allocation System (CAS), which then directs the applications to the appropriate REC.
Approval from the Medicines and Healthcare products Regulatory Agency (MHRA) will be needed if a novel medical device or medication is to be used in a patient trial. Their website is http://www.mhra.gov.uk.
26.7 The Future

26.7.1 Web 2.0

Web 2.0 has steadily become a part of the everyday internet experience. The term describes a change in the way the internet is used, toward more interactive media and information sharing; the appearance of blogs and social-networking sites marks the rise of Web 2.0. There is still disagreement over the strict definition; however, it appears that while Web 1.0 centred on users passively absorbing information on the internet, Web 2.0 emphasises the active contribution of web communities by their users. It is uncertain how this will influence medicine, though projects like Wikipedia® (http://en.wikipedia.org), a web-based encyclopaedia written collaboratively by its own users, can certainly inspire novel media for information sharing and validation (see Wikisurgery earlier). Human-powered search engines are now available (http://www.mahalo.com/); perhaps this could influence how we access the 17 million records on MEDLINE®.
26.7.2 Virtual Reality

Virtual reality is neither a new concept nor a novel technology; however, it has been shown to be effective both as an assessment tool and as a training tool. The Minimally Invasive Surgical Trainer – Virtual Reality (MIST-VR™, http://www.mentice.com/), although no longer technologically advanced, has been shown to be a valid assessment tool in minimally invasive surgery, and the skills acquired on the simulator appear to transfer to the operating theatre. More recent virtual reality surgical simulators (http://www.simbionix.com/LAP_Mentor.html) have become more realistic by including force feedback and by using patient-specific anatomy reconstructed from CT images. It is not unfeasible that future operations will be rehearsed by the trainee surgeon the day before the actual operation. Though these simulators are laparoscopy-based, which limits the degrees of freedom that need to be modelled, future virtual surgical trainers could be used to assess open surgery, perhaps using a pair of glove sensors.
Though the uptake of computer navigation for orthopaedic surgical planning and execution has been slow, there are now academic societies for surgeons interested in computer-assisted orthopaedic surgery (http://www.caos-international.org/). Future applications of guided surgery may prove useful in soft-tissue surgery, especially in Natural Orifice Transluminal Endoscopic Surgery (NOTES).
26.7.3 Robotics

Perhaps the most exciting future development is the use of surgical robotics to decrease the technical demands on the surgeon; the da Vinci® Surgical System from Intuitive Surgical (http://www.intuitivesurgical.com/) is probably the best-selling surgical robot. It provides 3D vision, motion scaling, and extra degrees of freedom of movement that are not available with traditional laparoscopic surgical instruments.

26.8 Conclusions

Although this is not an exhaustive list of the realities and possibilities of surgical computing, it is very likely that the role of computers in surgery is only going to expand. Observing others using computers, and persevering with trial and error, are the best ways to become familiar with the technology. If a difficult problem arises, it is quite likely that someone else has encountered it too; hence, search engines can often find the solution.
Computational and Statistical Methodologies for Data Mining in Bioinformatics
27
Lee Lancashire and Graham Ball
Contents

Abbreviations ............................................................ 337
27.1 Introduction ........................................................ 338
27.1.1 Advents and Early Approaches in Development of Medical Diagnostics ... 338
27.2 Experimental Methods ................................................ 338
27.2.1 Mass Spectrometry ................................................. 338
27.2.2 Microarrays ....................................................... 338
27.3 Challenges in Biomarker Discovery ................................... 339
27.3.1 Quality Control ................................................... 339
27.3.2 Dimensionality and Complexity of the Data ......................... 339
27.3.3 Reproducibility ................................................... 340
27.3.4 Multiple Testing and Control of Error Rates ....................... 340
27.4 Computational Methods for Data Analysis ............................. 340
27.4.1 Conventional Parametric Statistics ................................ 340
27.4.2 Pattern Classification and Modelling Using Unsupervised Methods ... 341
27.4.3 Pattern Classification Using Predictive Supervised Methods ........ 343
27.5 Model Evaluation Using Cross-Validation ............................. 346
27.5.1 Measuring Performance with ROC Curves ............................. 347
27.6 Summary and Conclusions ............................................. 347
References ............................................................... 348
L. Lancashire, Paterson Institute for Cancer Research, University of Manchester, Manchester, M20 4BX, UK; e-mail: [email protected]
Abbreviations

AUC   Area under the curve
ANN   Artificial neural network
CART  Classification and regression trees
FDR   False discovery rate
GA    Genetic algorithm
KNN   K-nearest neighbours
LDA   Linear discriminant analysis
MLP   Multi-layer perceptron
MS    Mass spectrometry
PCA   Principal components analysis
ROC   Receiver operating characteristic
RSCV  Random sample cross-validation
SELDI Surface-enhanced laser desorption/ionisation
SVM   Support vector machine
Abstract The aims of this chapter are to provide an overview of the high-throughput technologies currently available, with particular focus on genomic and proteomic analyses of biological samples. Further, a nonmathematical overview of the statistical and computational methods that are available for the analysis and subsequent data mining of these data will be discussed, together with an outline of the careful considerations that have to be made prior to these analyses. Given that the literature is vast within the area of computational biology, we seek to present an overview of some of the most commonly used methods. Additionally, apart from describing the methods, we will illustrate selected examples with results from our own studies using real data sets.
27.1 Introduction

27.1.1 Advents and Early Approaches in Development of Medical Diagnostics

The advent of post-genomic technologies and their application to biomedical problems has resulted in a massive increase in the complexity of data being generated. Consequently, there is a requirement for more refined statistical and computational methods for the analysis of these data sets. These data sets may be viewed as a series of vectors whose components are filled with numbers from an experiment. Through comprehensive analysis and modelling approaches, structures or classes in the data space can be defined, de novo predictive biomarkers identified, and clinical decision-support systems developed. The aims of this chapter are to provide an overview of the high-throughput technologies currently available, with particular focus on genomic and proteomic analyses of biological samples. Further, a non-mathematical overview of the statistical and computational methods that are available for the analysis and subsequent data mining of these data will be discussed, together with an outline of the careful considerations that have to be made prior to these analyses. Given that the literature is vast within the area of computational biology, we seek to present an overview of some of the most commonly used methods. Additionally, apart from describing the methods, we will illustrate selected examples with results from our own studies using real data sets.
27.2 Experimental Methods

27.2.1 Mass Spectrometry

With the advent of techniques such as high-throughput proteomics and genomics, the potential for identification of new biomarkers has increased massively. These methods facilitated the comprehensive profiling of samples representing disease states. The hurdle to overcome with these technologies was the sheer complexity of the data generated. This complexity is necessary to represent coverage (or even partial coverage, given the current technological limitations) of the genome or proteome. Two of the methods that are commonly used for high-throughput sample profiling are gene microarrays and mass spectrometry (MS). These technologies are complementary to one another in describing biological systems, and their basic principles will be briefly outlined. MS approaches, more specifically matrix-assisted laser desorption/ionisation and a modification of this named surface-enhanced laser desorption/ionisation (SELDI) time-of-flight MS, have been used to generate proteomic profiles of biological samples. Simply, a mass spectrometer consists of an ion source; a mass analyser to measure the mass/charge ratio (m/z) of the analytes that have been ionised (mass spectrometers do not measure mass directly, but rather the mass-to-charge ratio of ions formed); and finally a detector that records the number of ions at each m/z value, generating a spectrum or "fingerprint" for the sample being analysed. For an overview of the method, see [62]. These analyses generate profiles consisting of tens of thousands to hundreds of thousands of points, each point representing a protein mass, a peptide mass or a fragment of the above. This high dimensionality provides a barrier to analysis and limits many methods. There has been much debate regarding the suitability of various methods for proteomic profiling, especially with respect to biomarker identification. This debate is presented in [5, 26, 62] and will not be dwelt upon further here.
27.2.2 Microarrays

A DNA microarray consists of a solid surface onto which DNA molecules have been chemically bonded. The purpose of microarrays is to detect the presence and the abundance of labelled nucleic acids in a given biological sample, which will then hybridise to the DNA on the array and become detectable via the label. The source of the labelled nucleic acids is the mRNA of the sample of interest and, therefore, the purpose of a microarray is to measure the gene expression. As there may be thousands of different DNA molecules bonded to an array, it is possible to measure the expression of many thousands of genes simultaneously, leading to the potential for extremely high-throughput
27 Computational and Statistical Methodologies for Data Mining in Bioinformatics
analysis. There are essentially two types of microarray technology in use today: cDNA and oligonucleotide arrays, such as those marketed by Affymetrix. For a more detailed explanation of these technologies, the reader is referred to [67] for cDNA and [48] for oligonucleotide microarrays, respectively.
27.3 Challenges in Biomarker Discovery

27.3.1 Quality Control

Data profiling using complex proteomic and genomic/transcriptomic technologies is complicated by many factors, such as technical and biological variability, producing "noise" in the data. These may influence the data and show differences between the sample groups that have no true biological meaning. Therefore, data pre-processing is essential in order to minimise the effect that this experimental variation may have upon sample analysis, and to remove questionable measurements. Many data pre-treatment steps have been proposed, and a selection of these will be briefly discussed in this section. In gene expression and microarray analysis, it is essential that data be normalised. This is to overcome any experimental bias, such as quantity of RNA, labelling efficiency and bias in the measurement of expression when using microarrays [65]. Several approaches have been proposed for the normalisation of gene expression intensity values, such as total intensity normalisation and mean, median and log centring of data. However, the most popular method is locally weighted scatterplot smoothing (lowess) normalisation [23], because of its ability to remove intensity-dependent effects in the log2 ratio intensity values. In addition to adjusting the mean intensity ratios, one may also normalise by adjusting the intensity values across all arrays so that the variance between the arrays is equal [66], or use quantile normalisation to ensure that all samples have identical distributions [14]. In MS data, it is common to normalise the spectra by creating a profile of relative intensities, to overcome discrepancies in signal intensity from sample to sample. Another common problem in microarray data is that the variability increases dramatically as the measured signal intensity decreases and approaches background.
Assuming that this background is constant over the array, it is possible to subtract this value from each spot [87]. If the background is not constant, one approach would be to use only those elements whose signal intensities are statistically significantly different from their respective background intensities. As in genomic analysis, data generated from mass spectrometers do not come without problems. Owing to chemical and electronic noise, baseline effects are common in MS data and often need subtracting. Many algorithms are available for this, although some are far from perfect, and one needs to ensure that important information is not removed with the background. Whilst mass accuracy in MS is generally very good, it is sometimes prone to shifts in m/z between corresponding molecules associated with alignment problems, which is also a major issue when comparing multiple profiles over many samples. Software such as SpecAlign [12] provides a robust solution to this problem using fast Fourier transform. A recent and more detailed discussion of the pre-treatment of mass spectral profiles can be found in [3]. As with microarray data, it may be necessary to transform and normalise mass spectral profiles in order to make the experiments comparable, in addition to the consideration of data smoothing and alignment before any disease-associated factors may be extracted by computational analysis.
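As an illustrative aside, the short Python sketch below implements the quantile normalisation mentioned above for a toy samples-in-columns intensity matrix; the data are invented, and a real pipeline would typically use an established implementation from a microarray analysis package rather than this minimal version.

```python
import numpy as np

def quantile_normalise(X):
    """Quantile-normalise a probes-by-arrays matrix so that every column
    (array) ends up with an identical intensity distribution.
    Ties are broken arbitrarily in this simple version."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each value within its column
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)   # reference distribution: mean of each quantile
    return mean_quantiles[ranks]                       # replace each value by its quantile's mean

# Three toy "arrays" (columns), five probes (rows)
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.5, 6.0],
              [4.0, 2.0, 5.0],
              [1.0, 3.0, 2.0]])
Xn = quantile_normalise(X)
print(np.sort(Xn, axis=0))  # every column now shares the same distribution
```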
27.3.2 Dimensionality and Complexity of the Data

One of the major hurdles to the analysis of the data types described earlier is the high dimensionality and complexity of the data. Conventional statistical theory would indicate that, for a valid representation of the population, one should have at least twice as many replicates as the number of dimensions in the data. Clearly, a data set requiring hundreds of thousands of samples is not feasible given realistic sample availability. In practice, sample-set sizes should be determined by appropriate power analysis, and replication should be sufficient to allow validation sets of the same number of samples. Another problem with the analysis of this data type is that the high dimensionality of the data masks the importance of markers. As the dimensionality of the input data space (i.e. the number of parameters) increases, it becomes exponentially more difficult to find global optima for the parameter space, due to the scarcity of the data. This has been termed "the curse of dimensionality" [10, 13], and often leads to an input space with many irrelevant or noisy inputs, subsequently causing predictive algorithms to behave badly as a result of modelling extraneous portions of the space.
27.3.3 Reproducibility

Superimposed on the dimensionality issues are issues of data quality. In order to identify biomarkers, the data should be reproducible within samples, between sample runs and across multiple instruments (at least instruments of the same model) [26]. This can be optimised through the use of technical and experimental replicates, where filtering and averaging of samples are methods commonly used to assess reproducibility and increase confidence in the profiles being compared. Technical replicates provide information on the variability that occurs when performing a particular assay, whilst experimental (or biological) replicates give a measure of the natural sample-to-sample variation. Lack of reproducibility decreases the validity of markers and makes validation, and ultimately clinical use, difficult [53]. Poor reproducibility adds to the issues of dimensionality by making the relevant data sparser with respect to the optimal solution. Low replication and poor data quality can introduce features representative not of disease but of sample run, sample collection, storage and preparation, introducing random features within the data.
27.3.4 Multiple Testing and Control of Error Rates

When statistical hypothesis testing procedures are applied in order to identify features that differ between groups of interest in a data set, a number of statistical problems may arise, all the more so in data of high dimensionality such as those generated by mass spectrometers and microarrays. The multiple testing problem is a major concern when a large number of statistical tests are to be performed. As the feature size of the data set becomes larger, so does the number of features labelled as "differentially expressed" when they in fact are not. These are known as false positives, and it is important to address this issue by testing correctly. The false discovery rate (FDR) introduced by Benjamini and Hochberg [11] is a measure of the number of features incorrectly identified as "differential", and various approaches have been suggested to control the FDR accurately. In addition to controlling and estimating the FDR, other approaches such as controlling the family-wise error rate [20] and empirical Bayes methods [30] have been discussed. A more detailed discussion of the FDR and the multiple testing problem in general can be found in [64].
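For illustration, a minimal Python implementation of the Benjamini–Hochberg step-up procedure is sketched below; the P values are invented for demonstration, and dedicated statistical packages provide tested implementations of this and related corrections.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of the
    hypotheses rejected while controlling the FDR at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # BH critical values: alpha * i / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank i with p_(i) <= alpha * i / m
        rejected[order[:k + 1]] = True             # reject all hypotheses up to rank k
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))       # only the two smallest P values survive
```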
27.4 Computational Methods for Data Analysis

27.4.1 Conventional Parametric Statistics

Conventional statistical approaches to data mining for biomarkers, based on probability measures, have been commonly used. Fold changes are commonly used to screen gene microarray data sets to determine and rank influential markers, with P values being calculated using parametric (e.g. Student's t test) or non-parametric (e.g. Wilcoxon statistic) tests. One example of the use of fold changes and P values in this way is presented in [36], where they were successfully used to derive a diagnostic panel for mesothelioma. Additionally, one can refer to [71] for the use of P values to pre-screen for genes correlated with clinical characteristics in prostate cancer; these were then followed up with other modelling algorithms such as K-nearest neighbours (KNN) and Kaplan–Meier survival curves. An earlier study [34] used similar methods for the classification of acute myeloid leukaemia. The significance analysis of microarrays, proposed by Tusher et al. [78], is a popular extension of conventional parametric statistical tests. This analysis identifies gene markers by assigning a score to a given gene based on differential gene expression between classes (ultimately based on the variance of expression through a population) compared with the standard deviation of multiple measurements for the same population. Multiple permutations of this approach are then used to determine those genes identified as important by random chance (false discovery). The FDR is then calculated and the significant genes determined.
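The kind of screen described above can be sketched in a few lines of Python; the expression matrix here is simulated noise with a handful of artificially "spiked" genes, purely to show the mechanics of ranking markers by fold change and P value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Toy log2 expression matrices: 1,000 genes x 10 samples per class
healthy = rng.normal(0.0, 1.0, size=(1000, 10))
disease = rng.normal(0.0, 1.0, size=(1000, 10))
disease[:25] += 1.5                                  # spike in 25 "differential" genes

log2_fold_change = disease.mean(axis=1) - healthy.mean(axis=1)
t_stat, p_val = stats.ttest_ind(disease, healthy, axis=1)  # Student's t test per gene

# Rank candidate markers by P value, reporting fold change alongside
for gene in np.argsort(p_val)[:10]:
    print(f"gene {gene}: log2 FC = {log2_fold_change[gene]:+.2f}, P = {p_val[gene]:.2e}")
```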
27.4.2 Pattern Classification and Modelling Using Unsupervised Methods

27.4.2.1 Principal Components Analysis

Principal components analysis (PCA) is one of the most widely used multivariate techniques for input dimensionality reduction in data sets where the number of inputs far exceeds the number of cases. PCA transforms the input space into a new space described by what are known as principal components, which are expressed as linear combinations of the original variables. These principal components lie orthogonal to one another and are ranked according to an eigenvalue. By selecting the vectors with the largest eigenvalues, the vectors that map the largest variations in the input space are determined. Therefore, the ultimate aim of PCA is to capture those vectors, or principal components, which explain the most variation in the data, thus reducing the dimensionality of the data space [40]. The main limitation of using PCA for proteomic and gene expression data is the inability to verify the association of a principal component vector with the known experimental variables. This often makes it difficult to accurately identify the importance of the proteins or genes in the system. Marengo et al. [52] applied PCA to proteomic data generated from neuroblastoma tumour samples and identified two groups of samples in the data set. By analysing the loadings of the principal components, they could identify the discriminatory variables, and by following this up with MS, they identified proteins responsible for the differences occurring between healthy and diseased samples. Liu et al. [50] analysed gene expression data using PCA as a dimensionality reduction tool, followed by logistic regression for classification purposes. This approach was applied to five publicly available tumour-based data sets, and was able to distinguish different classes with high accuracy. Figure 27.1 shows the principle of separation of subgroups in data by PCA.

Fig. 27.1 Principal components analysis separates subgroups in a data set (samples plotted against components PC1 and PC2)
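As a hedged illustration of the approach (not a reproduction of any of the cited studies), the following Python sketch applies PCA to a simulated data set with two offset subgroups, and inspects the component loadings in the spirit of Marengo et al. [52].

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 40 samples x 500 variables, with a second subgroup offset along a few variables
X = rng.normal(size=(40, 500))
X[20:, :10] += 2.0                       # subgroup shifted in 10 variables

pca = PCA(n_components=2)
scores = pca.fit_transform(X)            # project samples onto the first two PCs
print("variance explained:", pca.explained_variance_ratio_)

# The loadings indicate which original variables drive each component,
# which is how a component can be traced back to candidate proteins or genes
top_loadings = np.argsort(np.abs(pca.components_[0]))[::-1][:10]
print("most influential variables on PC1:", top_loadings)
```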
27.4.2.2 Clustering

Hierarchical clustering is routinely used as the method of choice when analysing gene expression data [2, 63, 80, 85]. It functions by arranging the profiles of samples into a tree-like structure so that the most similar profiles lie close together and very different profiles lie farther apart, allowing for the rapid visual assessment of patterns within the data. The methodology is based on the construction of a distance matrix that enables the two samples with the most similar profiles to be determined. These are then placed together in the tree to form a cluster, and the distance between this newly defined cluster and the remaining samples is calculated. A new cluster is then determined, and this process is repeated until all of the samples have been placed in a cluster. There are various linkage methods used for calculating distance, such as single linkage, complete linkage and average linkage. Single linkage computes the distance as that between the two nearest points in the clusters being compared. Complete linkage computes the distance between the two farthest points, whilst average linkage averages all distances across all the points in the clusters being compared. Similarly, there are also several distance metrics which can be used to compute this value, such as Pearson correlation and Euclidean distance. Different linkage methods and distance measures often lead to very different dendrograms, and hence it is recommended that several methods be applied before drawing conclusions regarding the relationships in the data [74]. Clustering has been used in the medical field mainly for the analysis of gene expression data from cancer patients, in order to correlate individuals having similar expression profiles with gene transcripts having similar expression through the population. van 't Veer et al. [80] used hierarchical clustering to analyse primary breast tumours and identified a gene expression signature predictive of poor prognosis, and Scherf et al. [68] used clustering methods based on linkage distance to derive a molecular profile for 60 cancer lines relating to drug responsiveness. Welsh et al. [85] used oligonucleotide microarrays of approximately 6,000 genes to identify candidate markers of epithelial ovarian cancers. They found that normal tissues were easily separated from tumour tissues, and that the tumours could be further divided into groups correlating with known histological and clinical observations. Further examples of clustering can be found in [39, 73, 76]. One major problem concerning clustering is that it suffers from the curse of dimensionality when analysing complex data sets. In a high-dimensional space, it is likely that for any given pair of points within a cluster, there exist dimensions on which these points are far apart from one another. Therefore, distance functions using all input features equally may not be truly effective [28]. Furthermore, clustering methods will often fail to identify coherent clusters due to the presence of many irrelevant and redundant features [37].
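A minimal Python sketch of the procedure, using simulated "normal" and "tumour" profiles, is given below; as noted above, swapping the linkage method or distance metric can change the resulting tree appreciably.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (10, 50)),    # ten simulated "normal" profiles
               rng.normal(1.5, 1.0, (10, 50))])   # ten simulated "tumour" profiles

# Euclidean distance with average linkage; 'single' or 'complete' linkage,
# or a correlation-based metric, could be substituted here
distances = pdist(X, metric="euclidean")
tree = linkage(distances, method="average")

labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into two clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(tree) would draw the tree for visual assessment
```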
27.4.2.3 Self-Organising Maps

Kohonen self-organising maps [46] consist of just two layers, an input layer and an output layer. The output layer of these networks may be two-dimensional [79], so that the networks may be used to map a three-dimensional surface onto a two-dimensional map [7]. The training patterns are presented to the input layer, then propagated to the output layer and evaluated, with one output neuron being labelled as the "winner". The network weights are adjusted during training, and this process is repeated for all patterns for a pre-determined number of epochs, forming clusters within the data. These networks are unique in that they autonomously self-organise and converge into a stable structure representing the information that has been learnt [59]. This enables the discovery of patterns within the data, so that cases with similar expression profiles are grouped together, allowing for visualisation by way of a topographical map.
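The following compact Python sketch implements the essential SOM training loop on simulated profiles; the grid size, learning rate and neighbourhood schedule are arbitrary choices for illustration, and a production analysis would normally rely on a dedicated SOM package.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))                 # 200 expression profiles, 10 features
grid = np.array([(i, j) for i in range(6) for j in range(6)])  # 6x6 output map
W = rng.normal(size=(36, 10))                  # one weight vector per output neuron

for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)                # decaying learning rate
    radius = 3.0 * (1 - epoch / 50) + 0.5      # shrinking neighbourhood
    for x in X[rng.permutation(len(X))]:
        winner = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching ("winning") unit
        d = ((grid - grid[winner]) ** 2).sum(axis=1)     # grid distance to the winner
        h = np.exp(-d / (2 * radius ** 2))               # neighbourhood function
        W += lr * h[:, None] * (x - W)                   # pull neighbours toward the pattern

# Map each sample to its winning neuron to visualise clusters on the 6x6 grid
assignments = np.argmin(((W[None, :, :] - X[:, None, :]) ** 2).sum(axis=2), axis=1)
print(np.bincount(assignments, minlength=36).reshape(6, 6))
```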
27.4.2.4 Decision Trees, Classification and Regression Trees (CART), Boosted Decision Trees and Random Forest Methods

Another extension of clustering and classification-based methods is decision trees. In this instance, a decision is made based on a feature that separates classes (one branch of the cluster dendrogram from another) within the population. This decision is based on a logical or numerical rule. Decision trees have been used in the analysis of SELDI data derived from serum for the diagnosis of prostate cancer [1]. Classification and regression trees (CART) were proposed by Breiman et al. [16]. This approach is used for both classification and regression problems. Through the process of analysis (as in decision trees), multiple if–then logical splits are produced, which allow classification or regression rules to be derived. Wadsworth et al. [81] used CART-based methods to classify head and neck cancers. Regression trees may be further enhanced by boosting, a process in which classifiers are derived that allow prediction of the cases not correctly predicted by earlier steps. Here, the error in earlier steps is predicted and forms the basis for the boosted decision tree algorithm. These approaches have been applied to the analysis of SELDI MS data by Adam et al. [1], who used them for the diagnosis of prostate cancer. Another extension of tree-based classification methods is the random forest classifier, proposed by Breiman [17]. Here, multiple trees are combined by taking the mode of the output class predictions. This approach has been shown to be very good at making generalised classifications [17]. The approach essentially derives each tree from a random vector with equivalent distribution from within the data set, essentially an extensive form of cross-validation. Examples of the use of random forests in the classification of cancer include [44], in which this approach was used in the analysis of SELDI data within a cancer prevention trial. Diaz-Uriarte and Alvarez de Andres [27] used the approach to identify genes from a microarray data set that classified a range of cancers, and Munro et al. [56] used random forests in the profiling of urinary biomarkers of transitional cell carcinoma.
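As an illustration, the Python sketch below trains a random forest on a simulated marker matrix and reads off feature importances as a crude route to marker selection; the data and parameter choices are invented for demonstration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 200))                 # 60 samples x 200 candidate markers
y = np.repeat([0, 1], 30)                      # two diagnostic classes
X[y == 1, :5] += 1.2                           # five genuinely informative features

# Each tree is grown on a bootstrap sample; out-of-bag cases give a
# built-in estimate of generalisation, akin to cross-validation
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)

# Feature importances offer one route to marker selection
print("top features:", np.argsort(forest.feature_importances_)[::-1][:5])
```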
27.4.3 Pattern Classification Using Predictive Supervised Methods

There are also numerous methods available that allow for the classification of samples and are subsequently able to place new samples of unknown class membership into a particular group. Here, the ultimate aim is to identify genes or proteins (or subsets of these) through the development of models of the relationship between these biomarkers and clinical questions of interest, for example, the prediction of response to therapy in a patient-specific manner. In this instance, model performance for a given set of biomarkers provides an indication of the importance of a biomarker in a given system. Thus, a correlation exists between a marker and (1) an outcome class, for example, healthy versus diseased, or (2) a continuous variable, for example, post-operative survival time. This is achieved by generating predictive models using the measurements across a number of variables (e.g. ion mass intensities or gene expression ratios) for samples whose class is known a priori. This is known as supervised learning, and some of the most popular methods will now be discussed.

27.4.3.1 Logistic Regression

One commonly employed method for the identification of biomarkers is logistic regression. This works on the principle of fitting a logistic (sigmoid) function to the relationship between a given biomarker and the classification outcome of a clinical question. Logistic functions are superior to linear functions here because they allow the representation of class membership and of binomial distributions. The approach is an extension of the generalised linear model [58] using a logit link function. Logistic regression has been used to develop multiple models using a range of biomarkers, including plasma insulin-like growth factor to predict prostate cancer risk [19], CA125 and other clinical markers for the diagnosis of malignancy in women with adnexal masses [4], and breast cancer-specific diagnostic biomarkers identified in serum from SELDI MS data [47].

27.4.3.2 K-Nearest Neighbours

KNN is one of the simplest methods for deciding the class to which an unknown sample belongs. This method compares the profile of an unknown sample with those of samples with known group membership. The class of the unknown sample is then determined to be the same as that of the known sample to which it is most similar. There are two parameters to consider when using this method, k and l, where k is the number of nearest samples to look at and l is the margin of victory required for a class decision to be made. If this margin is not met, then the sample is unclassified. Thus, for example, if k = 3 and l = 3, then the unknown sample would be assigned to the same class as the nearest three samples, but only if all three of them belong to the same class; otherwise the sample would be unclassified [74]. Shen et al. [69] showed that using a KNN approach for predicting membrane protein types resulted in high success rates. KNN was also the method of choice in Barrier et al.'s study [8] for creating a classifier from gene expression measurements in colon cancer patients. The main disadvantage of this method is that in highly dimensional biological systems with many samples, outliers are likely to be present; as the approach is not particularly robust to outliers, this may lead to incorrect classifications. Additionally, this approach takes into consideration every attribute of every element when classifying a new sample. Hence, if the target concept depends on only a few features in a highly dimensional data space, the samples that are truly most similar may well be placed a large distance apart [55, 57].
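To make the two methods concrete, the following Python sketch fits both a logistic regression model and a k = 3 nearest-neighbour classifier to the same simulated marker data; everything here is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 30))
y = np.repeat([0, 1], 40)
X[y == 1, :3] += 1.0                           # a few class-separating markers

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

# Logistic regression: fits a sigmoid to the marker/outcome relationship
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# KNN with k = 3: an unknown sample takes the class of its nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)

print("logistic accuracy:", logit.score(X_te, y_te))
print("KNN accuracy:     ", knn.score(X_te, y_te))
```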
27.4.3.3 Linear Discriminant Analysis

Linear discriminant analysis (LDA) calculates the straight line (or hyperplane) between two classes that best separates them. It does so by taking into consideration sample-to-sample variation within the classes, so as to minimise variance within classes and maximise it between classes. The class of any unknown sample is then determined simply by the side of the hyperplane on which it lies. As LDA takes into account the variation within the sample population, this method sometimes performs better than other linear approaches at classifying unknown samples; however, it does not extend naturally to data that are not linearly separable and hence should be avoided with such data sets [74]. This approach was used successfully in [88], where the aim was to detect serum proteomic patterns by applying SELDI MS technology to the staging of colorectal cancer patients. Gao et al. [33] also utilised LDA, where sera from patients with lung cancer and healthy controls were subjected to antibody microarray analysis.
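A minimal illustration in Python, on simulated and roughly linearly separable data, is sketched below; the class of a new sample is decided by the side of the fitted hyperplane on which it falls.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))
y = np.repeat([0, 1], 50)
X[y == 1, :4] += 1.0                           # linear separation along four variables

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", lda.score(X, y))

# An unknown sample is classified by which side of the hyperplane it lies on
print("predicted class of a new sample:", lda.predict(rng.normal(size=(1, 20))))
```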
27.4.3.4 Artificial Neural Networks

An artificial neural network (ANN) is an adaptive, nonlinear form of artificial intelligence inspired by the way the human brain learns and processes information. The most important facet of this paradigm is the fact that it is built from a potentially large number of interconnected processing elements, which work together to solve specific problems. A popular form of ANN is the multi-layer perceptron (MLP), which is used to solve many types of problems such as pattern recognition and classification, function approximation and prediction. ANNs learn in a fashion analogous to the way learning is carried out in the human brain, that is, by example. In humans, learning involves minor adjustments being made to the synaptic connections between neurons; in ANNs, learning is achieved by updating the weights that exist between the processing elements that constitute the network topology (Fig. 27.2). The algorithm fits multiple logistic functions to the data to define a given class in an iterative fashion, essentially an extension of logistic regression.
Fig. 27.2 Multi-layer perceptron neural network architecture and the principle of the back-propagation training algorithm: inputs pass through weighted links to a hidden layer and an output layer with logistic transfer functions; the training algorithm compares the network output with the actual output and changes the weights until the error yields an acceptable model
Once trained, ANNs can be used to predict the class of an unknown sample of interest. Additionally, the variables of the trained ANN model may be extracted to assess their importance in the system of interest. The major disadvantage usually associated with ANNs is that the ability to interpret how they reach an optimal solution is often perceived as difficult; as such, they have been referred to as "black boxes" [29, 72, 77, 83]. A review of their use in a clinical setting is presented in [49]. Here, we will present some of the key examples. BP-MLP ANNs were first proposed for use in the identification of biomarkers from SELDI MS data by Ball et al. [6]. This work has been developed further for melanoma, where biomarkers of late-stage disease have been derived by MS [53]. They have been widely used in the identification of markers of biomedical systems. ANNs have also been proposed for the analysis of gene microarray data [45]. Other early uses of ANNs in the analysis of gene microarray data were the prediction of oestrogen receptor status in breast cancer [38] and the development of prognostic and diagnostic systems in lymphoma [60].
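As a hedged sketch (not the cited authors' own implementation), the Python fragment below trains a small MLP on simulated data and uses the magnitude of the first-layer weights as a crude proxy for input importance; interrogating trained weights in this way is one simple response to the "black box" criticism.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 50))
# Class depends on the first five inputs plus noise (invented for illustration)
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=120) > 0).astype(int)

# One hidden layer of 10 units; weights are updated iteratively to reduce error
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
mlp.fit(X[:90], y[:90])
print("held-out accuracy:", mlp.score(X[90:], y[90:]))

# Crude importance proxy: summed magnitude of each input's first-layer weights
importance = np.abs(mlp.coefs_[0]).sum(axis=1)
print("most influential inputs:", np.argsort(importance)[::-1][:5])
```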
27.4.3.5 Genetic Algorithms

Genetic algorithms (GAs) were first proposed in [42]. They are not predictive per se, but provide the means by which an optimum solution can be defined. Where a potential pool of solutions exists, for example a range of conditions in a logistic or ANN model [41], a GA may be used to define the optimal solution for maximal predictive performance. Using a GA, an optimal solution is determined by evolution: each set of conditions is given a fitness measure, iterative changes are made through multiple generations, and the fittest solutions survive in a directed manner. This approach was used to define markers of ovarian cancer in [62].
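The Python sketch below shows a GA of the kind described, used here to evolve feature subsets that maximise the cross-validated accuracy of a logistic model; the population size, mutation rate and fitness function are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(60, 40))
y = np.repeat([0, 1], 30)
X[y == 1, :3] += 1.5                           # three truly informative features

def fitness(mask):
    """Cross-validated accuracy of a logistic model on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()

pop = rng.random((20, 40)) < 0.2               # population of random feature masks
for generation in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]         # select the fittest half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, 40)
        child = np.concatenate([a[:cut], b[cut:]])       # single-point crossover
        child ^= rng.random(40) < 0.02                   # occasional mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```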
27.4.3.6 Support Vector Machines

Support vector machines (SVMs) are a relatively new development in the machine learning community, and function in a manner similar to LDA, in that they work by separating the data into two regions by constructing a straight line or hyperplane (Fig. 27.3). The advantage that SVMs have over other linear separators is that the data are first projected into a higher dimensional space (by a kernel function, for example, polynomial or radial basis functions) before being separated by a linear method, which allows for discrimination of nonlinear regions of space, and therefore separation of nonlinear data. The class of an unknown sample is then determined by the side of the "maximal marginal hyperplane" on which it lies [24]. This is in contrast to ANNs, where a function that separates the classes is defined rather than the data being transformed to achieve separation. SVMs are a popular classification tool in the biological sciences, and their uses are widely documented [22, 32, 82, 84, 89]. They have been successfully used in the classification of a wide range of biological data, for example, the identification of diagnostic peaks from MS data in prostate cancer patients [82]. Meanwhile, Warnat et al. [84] analysed sets of gene expression data from leukaemia patients and found that SVM-based classifiers could predict with high sensitivity and specificity. Further examples can be found in [32]. The major disadvantages associated with SVMs are that they are constrained by speed and size, both in training and testing, and can be extremely slow in the test phase [18]. Furthermore, from a practical point of view, for large-scale tasks, extensive memory is required due to the high complexity of the data [18, 61].

Fig. 27.3 A linear hyperplane, derived using a support vector machine, separates one class from another. The plane is rotated in all dimensions until maximum separation is achieved

27.4.3.7 Bayesian Approaches

Bayesian methods are based on the probabilistic approach to model induction. The theory is founded upon that of Bayes [9], whose theorem is used to assign a certain probability to a hypothesis. This probability is commonly known as the prior probability and allows for the definition of class membership of a given sample based on the prior probability determined from a large number of cases. For example, if from a previously measured group we have 100 cases and 20 are metastatic cancer, we have a 20% prior probability of metastasis. The next step is to examine a subset of cases in the vicinity (within the data space, usually coupled with a predictor) of a new case. This defines a likelihood of the new point being part of the metastatic or non-metastatic group. Suppose that in a vicinity of five cases around the new point, three are metastatic and two are normal, effectively placing the new point on the border; the likelihood of the new point being metastatic is then 3/20 and non-metastatic 2/80. Thus, we define a global probability and a local probability for a defined region of the data surrounding a given point. The prior probability and likelihood for a given point are then combined by multiplication to give a posterior probability of the point belonging to a given class. The highest posterior probability then defines the class:

Probability of the new point being metastatic = 20/100 × 3/20 = 0.03
Probability of the new point being non-metastatic = 80/100 × 2/80 = 0.02

Therefore, the new point is classified as metastatic. Bayesian methods were employed by West et al. [86] in the analysis of breast cancer microarray data, where they were coupled with regression methods to define prognostic profiles associated with ER and lymph-node status. Furthermore, they have been used in the modelling of tumour-associated antigens for the detection of ovarian cancer [31] and in the integration of microarray data and MS data [25].
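The kernel idea can be demonstrated in a few lines of Python: on a simulated, radially separable problem, a linear SVM struggles where an RBF-kernel SVM succeeds. The data and kernel choices are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(9)
# A nonlinearly separable problem: class depends on distance from the origin
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.2).astype(int)

# A linear hyperplane cannot separate these classes, but an RBF kernel
# implicitly projects the data into a higher dimensional space where one can
linear_svm = SVC(kernel="linear").fit(X[:150], y[:150])
rbf_svm = SVC(kernel="rbf").fit(X[:150], y[:150])
print("linear kernel accuracy:", linear_svm.score(X[150:], y[150:]))
print("RBF kernel accuracy:   ", rbf_svm.score(X[150:], y[150:]))
```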
27.5 Model Evaluation Using Cross-Validation

A common problem when using pattern classifiers is to ensure that they are capable of generalising to a wider population of future cases (i.e. that they are markers of the feature of interest within the global population), by providing an estimate of their likely performance on new data. Classifiers that are capable of classifying a given training set of samples correctly but fail to predict to a similar level for new cases are said to overfit the data, which is essentially a memorisation of the training data set. Therefore, it is essential to estimate the performance of these models on new data, in order to be confident that overfitting has been avoided. The most universal approach to this problem is resampling. Typically, in resampling approaches, the data set is split into different subsets, training and test. The classifier is trained and optimised using the training set, whilst being assessed with respect to the test subset. This helps to avoid over-training and therefore improves the ability of the model to generalise well to new data. This can be enhanced further by splitting the data into three subsets as opposed to just two, known as training, test and validation. Training performance is monitored as in the previous method, but here the classifier is further validated, once the model has been trained, using the validation split, which gives an unbiased estimate of the likely performance on future cases. A number of approaches to validation are commonly used, such as Monte Carlo resampling, bootstrapping, k-fold validation and random sample cross-validation (RSCV). Monte Carlo resampling is perhaps the simplest method, where training, test and validation sets are selected at random, with an equal number of cases in each subset. Alternatively, the validation subset may be kept constant, with the training and test sets drawn at random, to enable comparison between models on the validation data [13]. Bootstrapping has been shown to be an effective measure of estimating the error of predictive values in neural network models, and is therefore a reliable approach for determining the generalisation of the network [75]. In bootstrapping, subsamples of the data are analysed, where many "pseudo-replicates" are created by resampling the original data: cases are drawn at random from the data set, with equal probability, in order to replicate the process of sampling multiple data sets. k-fold validation is an effective approach when the number of samples is not sufficient to split the data into three subsets. A widely used version of this is called leave-one-out cross-validation [15, 43], where N divisions are made (N being the total number of cases in the data set) and in each division the network is trained on all of the samples except one, which is set aside for test purposes; this process is repeated so that every sample is used once for testing. Finally, in RSCV, the training, test and validation data splits are randomised a number of times, so that each sample is represented in the validation split on numerous occasions, enabling confidence to be determined for the predictions on blind (validation) data.
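Several of the schemes described above can be sketched briefly in Python; the data are simulated, and the split sizes and repeat counts are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

rng = np.random.default_rng(10)
X = rng.normal(size=(50, 20))
y = np.repeat([0, 1], 25)
X[y == 1, :3] += 1.0

clf = LogisticRegression(max_iter=1000)

# k-fold (here 5-fold) cross-validation
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Leave-one-out: train on N-1 samples, test on the one held out, N times
print("LOOCV accuracy: ", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())

# Repeated random splits in the spirit of RSCV: each sample appears in the
# held-out split on numerous occasions across the repeats
scores = []
for seed in range(20):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                                random_state=seed, stratify=y)
    scores.append(clf.fit(X_tr, y_tr).score(X_val, y_val))
print("repeated-split accuracy:", np.mean(scores))
```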
The final and ultimate form of validation is through the identity of the biomarkers discovered and through biological confirmation of the role of the identified markers in the system being studied. There has been much debate, following an earlier study [62], as to whether the biological relevance of a discovered biomarker should determine its acceptance. If an identified biomarker has a known function in relation to the feature of interest, then it is likely to be valid. If, however, the function is not known, or the marker is epiphenomenal, then doubt is cast on its validity. Requiring known biological relevance, however, assumes that we already know all there is to know about the feature of interest, and it limits the discovery process and scientific pragmatism.
27.5.1 Measuring Performance with ROC Curves

Receiver operating characteristic (ROC) curves are a common approach used to assess the performance of a classifier [35, 51, 54]. An ROC analysis determines the numbers of true positives, true negatives, false positives and false negatives, and produces a summary statistic for performance, detailing model accuracy, sensitivity and specificity. It achieves this by plotting the true positive rate against the false positive rate at different possible cutpoints (in this case, prediction error thresholds). The area under the curve (AUC) value measures the discrimination, that is, the ability of the model to correctly classify the true positives and true negatives. A perfect ROC curve (and therefore a perfect test) would have an AUC value of 1; thus, the closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
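As a minimal illustration, the Python sketch below derives an ROC curve and AUC for a logistic classifier on simulated data; in a real study the scores would come from whichever model is being evaluated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
X = rng.normal(size=(200, 10))
y = np.repeat([0, 1], 100)
X[y == 1, :2] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0, stratify=y)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, probs)  # true/false positive rates at each cutpoint
print("AUC:", roc_auc_score(y_te, probs))      # 1.0 = perfect discrimination, 0.5 = chance
```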
27.6 Summary and Conclusions

A wide range of tools is available for the analysis of complex biological data, and there has been much debate over which is the superior method. It is, however, the application that really defines the most appropriate algorithm to use. Cross-validation and biological confirmation are the most important criteria for truly assessing the power of the method applied. This chapter has discussed a range of computational approaches that are available to biologists in order to maximise the information gained from advanced experimental technologies. Ultimately, the methods employed should seek to reduce the number of dimensions in the data, explaining the variation in the data with respect to a given class or question in a parsimonious fashion.
References 1. Adam BL, Qu Y, Davis JW et al (2002) Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 62:3609–3614 2. Alon U, Barkai N, Notterman DA et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750 3. Arneberg R, Rajalahti T, Flikka K et al (2007) Pretreatment of mass spectral profiles: application to proteomic data. Anal Chem 79(18):7014–7026 4. Aslam N, Banerjee S, Carr JV et al (2000) Prospective evaluation of logistic regression models for the diagnosis of ovarian cancer. Obstet Gynecol 96:75–80 5. Baggerly KA, Morris JS, Coombes KR (2004) Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 20: 777–785 6. Ball G, Mian S, Holding F et al (2002) An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics 18:395–404 7. Barlow TW (1995) Self-organizing maps and molecular similarity. J Mol Graph 13:24–27, 53–25 8. Barrier A, Lemoine A, Boelle PY et al (2005) Colon cancer prognosis prediction by gene expression profiling. Oncogene 24:6155–6164 9. Bayes T (1991) An essay towards solving a problem in the doctrine of chances. 1763. MD Comput 8:157–171 10. Bellman RE (1961) Adaptive control processes. Princeton University Press, Princeton 11. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300 12. Bhattacharjee A, Richards WG, Staunton J et al (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98:13790–13795 13. Bishop C (1995) Neural networks for pattern recognition. Oxford University Press 14. Bolstad BM, Irizarry RA, Astrand M et al (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185–193
15. Braga-Neto U, Dougherty E (2005) Exact performance of error estimators for discrete classifiers. Pattern Recognit 38: 1799–1814 16. Breiman L, Friedman JH, Olshen RA et al (1984) Classification and regression trees. Chapman & Hall/CRC Monterey, CA 17. Breiman L (2001) Random forests. Machine Learning 45: 5–32 18. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167 19. Chan JM, Stampfer MJ, Giovannucci E et al (1998) Plasma insulin-like growth factor-I and prostate cancer risk: a prospective study. Science 279:563–566 20. Cheng C, Pounds S (2007) False discovery rate paradigms for statistical analyses of microarray gene expression data. Bioinformation 1:436–446 21. Chu F, Wang L (2005) Applications of support vector machines to cancer classification with microarray data. Int J Neural Syst 15:475–484 22. Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Amer Stat Assoc 74: 829–836 23. Crisianini N, Shawe-Taylor J (2000) An introduction to support vector machines (and other kernel-based learning methods) Cambridge University Press, Cambridge 24. Deng X, Geng H, Ali HH (2007) Cross-platform analysis of cancer biomarkers: a Bayesian network approach to incorporating mass spectrometry and microarray data. Cancer Inform 2:183–202 25. Diamandis EP (2004) Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol Cell Proteomics 3:367–378 26. Diaz-Uriarte R, Alvarez de Andres S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7:3 27. Domeniconi C, Papadopoulos D, Gunopulos D et al (2004) Subspace clustering of high dimensional. In: SDM '04: Proceedings of the Fourth SIAM International Conference on Data Mining, University City Science Center, Philadelphia, pp 517–521 28. Duh MS, Walker AM, Ayanian JZ (1998) Epidemiologic interpretation of artificial neural networks. Am J Epidemiol 147:1112–1122 29. Efron B, Tibshirani R (2002) Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol 23: 70–86 30. Erkanli A, Taylor DD, Dean D et al (2006) Application of Bayesian modeling of autologous antibody responses against ovarian tumor-associated antigens to cancer detection. Cancer Res 66:1792–1798 31. Eszlinger M, Wiench M, Jarzab B et al (2006) Meta- and reanalysis of gene expression profiles of hot and cold thyroid nodules and papillary thyroid carcinoma for gene groups. J Clin Endocrinol Metab 91(5):1934–1942 32. Gao WM, Kuick R, Orchekowski RP et al (2005) Distinctive serum protein profiles involving abundant proteins in lung cancer patients based upon antibody microarray analysis. BMC Cancer 5:110 33. Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
27
Computational and Statistical Methodologies for Data Mining in Bioinformatics
34. Goodenough DJ, Rossmann K, Lusted LB (1974) Radiographic applications of receiver operating characteristic (ROC) curves. Radiology 110:89–95 35. Gordon GJ, Jensen RV, Hsiao LL et al (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62:4963–4967 36. Greene D, Cunningham P (2005) Producing accurate interpretable clusters from high-dimensional data. In: Producing accurate interpretable clusters from high-dimensional data. In 9th European conference on principles and practice of knowledge discovery in databases, University of Dublin, Trinity College, Dublin 37. Gruvberger S, Ringner M, Chen Y et al (2001) Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns. Cancer Res 61: 5979–5984 38. Hastie T, Tibshirani R, Botstein D et al (2001) Supervised harvesting of expression trees. Genome Biol 2:research0003 39. Haykin S (1999) Neural networks: a comprehensive foundation. Prentice-Hall 40. Heckerling PS, Gerber BS, Tape TG et al (2004) Use of genetic algorithms for neural networks to predict community-acquired pneumonia. Artif Intell Med 30:71–84 41. Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence The MIT Press, Cambridge 42. Hu Y, Zhang S, Yu J et al (2005) SELDI-TOF-MS: the proteomics and bioinformatics approaches in the diagnosis of breast cancer. Breast 14:250–255 43. Izmirlian G (2004) Application of the random forest classification algorithm to a SELDI-TOF proteomics study in the setting of a cancer prevention trial. Ann NY Acad Sci 1020:154–174 44. Khan J, Wei JS, Ringner M et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673–679 45. Kohonen T (1989) Self-organization and associative memory. Springer, Berlin 46. Li J, Zhang Z, Rosenzweig J et al (2002) Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 48:1296–1304 47. Lipshutz RJ, Fodor SP, Gingeras TR et al (1999) High density synthetic oligonucleotide arrays. Nat Genet 21:20–24 48. Lisboa PJ, Taktak AF (2006) The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw 19(4):408–415 49. Liu Z, Chen D, Bensmail H (2005) Gene expression data classification with Kernel principal component analysis. J Biomed Biotechnol 2005:155–159 50. Lusted LB (1971) Decision-making studies in patient management. N Engl J Med 284:416–424 51. Marengo E, Robotti E, Righetti PG et al (2004) Study of proteomic changes associated with healthy and tumoral murine samples in neuroblastoma by principal component analysis and classification methods. Clin Chim Acta 345: 55–67 52. Matharoo-Ball B, Ratcliffe L, Lancashire L et al (2007) Diagnostic biomarkers differentiating metastatic melanoma patients from healthy controls identified by an integrated
349
MALDI-TOF mass spectrometry/bioinformatic approach. Proteomics Clin Appl 1:605–620 53. Metz CE (1978) Basic principles of ROC analysis. Semin Nucl Med 8:283–298 54. Mitchell TM (1997) Machine learning. McGraw-Hill 55. Munro NP, Cairns DA, Clarke P et al (2006) Urinary biomarker profiling in transitional cell carcinoma. Int J Cancer 119:2642–2650 56. Mylonas P, Wallace M, Kollias S (2004) Using k-nearest neighbor and feature selection as an improvement to hierarchical clustering. Springer, Berlin 57. Nelder JA, Wedderburn RWM (1972) Generalized Linear Models. J R Stat Society Ser A 135:370–384 58. Nour MA, Madey GR (1996) Heuristic and optimization approaches to extending the Kohonenself organizing algorithm. Eur J Oper Res 93:428–448 59. O’Neill MC, Song L (2003) Neural network analysis of lymphoma microarray data: prognosis and diagnosis nearperfect. BMC Bioinformatics 4:13 60. Osuna E, Girosi F (1999) Reducing run-time complexity in support vector machines. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods: support vector learning. The MIT Press, p 392 61. Petricoin EF, Ardekani AM, Hitt BA et al (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359:572–577 62. Pomeroy SL, Tamayo P, Gaasenbeek M et al (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415:436–442 63. Pounds SB (2006) Estimation and control of multiple testing error rates for microarray studies. Brief Bioinformatics 7:25–36 64. Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32 Suppl:496–501 65. Rosenwald A, Wright G, Chan WC et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346: 1937–1947 66. Schena M, Shalon D, Davis RW et al (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470 67. Scherf U, Ross DT, Waltham M et al (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet 24:236–244 68. Shen HB, Yang J, Chou KC (2005) Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. J Theor Biol 240(1):9–13 69. Singh D, Febbo PG, Ross K et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1:203–209 70. Smith AE, Nugent CD, McClean SI (2003) Evaluation of inherent performance of intelligent medical decision support systems: utilising neural networks as an example. Artif Intell Med 27:1–27 71. Sorlie T, Tibshirani R, Parker J et al (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100:8418–8423 72. Stekel D (2003) Microarray bioinformatics. Cambridge University Press 73. Tibshirani R (1996) A comparison of some error estimates for neural network models. Neural Comput 8:152–163
350 74. Tibshirani R, Hastie T, Narasimhan B et al (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 99:6567–6572 75. Tung WL, Quek C, Cheng P (2004) GenSo-EWS: a novel neural-fuzzy based early warning system for predicting bank failures. Neural Netw 17:567–587 76. Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98:5116–5121 77. Ultsch A, Roske F (2002) Self-organizing feature maps predicting sea levels Inf Sci 144:91–125 78. van ‘t Veer LJ, Dai H, van de Vijver MJ et al (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536 79. Wadsworth JT, Somers KD, Cazares LH et al (2004) Serum protein profiles to identify head and neck cancer. Clin Cancer Res 10:1625–1632 80. Wagner M, Naik DN, Pothen A et al (2004) Computational protein biomarker prediction: a case study for prostate cancer. BMC Bioinformatics 5:26 81. Wall R, Cunningham P, Walsh P et al (2003) Explaining the output of ensembles in medical decision support on a case by case basis. Artif Intell Med 28:191–206
L. Lancashire and G. Ball 82. Warnat P, Eils R, Brors B (2005) Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinformatics 6:265 83. Welsh JB, Zarrinkar PP, Sapinoso LM et al (2001) Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci USA 98: 1176–1181 84. West M, Blanchette C, Dressman H et al (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98: 11462–11467 85. Wolkenhauer O, Möller-Levet C, Sanchez-Cabo F (2002) The curse of normalization. Comp Funct Genom 3: 375–379 86. Xu WH, Chen YD, Hu Y et al (2006) Preoperatively molecular staging with CM10 ProteinChip and SELDI-TOF-MS for colorectal cancer patients. J Zhejiang Univ Sci B 7: 235–240 87. Yu JS, Ongarello S, Fiedler R et al (2005) Ovarian cancer identification based on dimensionality reduction for highthroughput mass spectrometry data. Bioinformatics 21: 2200–2209
28 The Use of Bayesian Networks in Decision-Making
Zhifang Ni, Lawrence D. Phillips, and George B. Hanna
Contents
28.1 Introduction
28.2 Bayes' Theorem
28.3 Bayesian Networks
28.4 How to Use Belief Networks
28.4.1 Making Predictions
28.4.2 Exposing the Underlying Logic and Assumptions
28.4.3 Handling Evidence
28.4.4 Incorporating Expert Judgments
28.5 Measuring Model Performance
28.6 Dynamic Bayesian Networks
28.7 Influence Diagrams
28.8 Conclusions
References
Further Reading

Abbreviations
AUC Area under the ROC curve
BN Bayesian network
NPV Negative predictive value
PPV Positive predictive value
ROC Receiver operating characteristic

Abstract Bayesian networks (BNs) are graphical tools of reasoning with uncertainties. In recent years, BNs have been increasingly recognized for their capacity to represent probabilistic dependencies explicitly and intuitively, handle incomplete information, and capture expert judgments along with hard data. In this chapter, we examine the underlying logic of BNs and discuss their applications.
28.1 Introduction
Z. Ni, The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary's Hospital Campus, Praed Street, London W2 1NY, UK, e-mail: [email protected]
Bayesian networks (BNs) [7, 11], also called Bayesian belief networks, belief networks, and probabilistic causal networks, are tools of reasoning with uncertainties. At the centre of BNs is Bayes' theorem [1], a result of probability theory that prescribes the revision of opinions in the light of new information. Ledley and Lusted [8] were the first to note how medical decision-making could benefit from this form of rational reasoning – the belief updating process embodied in Bayes' theorem is essentially the logic inherent in differential diagnoses. Their publication marked the beginning of an era in which a large number of computerized decision-making tools were developed, with Bayesian applications playing a key
role. Psychologists at the University of Michigan were the first to develop the fundamental idea of using human expertise to provide probabilistic inputs, as embodied in the notion of beliefs [2]. Studies comparing actual human inferences with the prescriptions of Bayes' theorem, however, led to the surprising conclusion that, in general, people are conservative [2, 12]. That is, given the same information, they do not revise their judgments as much as Bayes' theorem prescribes. Conservatism is exacerbated when more than one piece of information has to be taken into account. Computerized Bayesian tools provide a natural solution to conservatism. BNs were developed in the 1980s to solve problems encountered in early Bayesian applications, including computational complexity and independence assumptions. Compared with their predecessors, BNs have the capacity to handle problems of much greater complexity, while representing probabilistic dependencies explicitly and intuitively. In recent years, these advantages have allowed BNs to become increasingly popular in a variety of disciplines including medicine [3], environmental protection [5], and financial risk management [9], to name just a few. In this chapter, we introduce BNs, focusing on their underlying logic and possible usage in medicine. Our goal is to familiarize users with these tools and demonstrate their potential to improve medical decision-making. The sections are organized as follows. First, we present Bayes' theorem, which underlies all Bayesian applications. Second, we demonstrate how one can construct, interpret, and validate a BN. Third, we propose four potential uses of BNs in medicine. These are making predictions, examining the hidden logic and assumptions of a decision problem, handling incomplete information, and capturing expert judgments. We then briefly introduce the notion of influence diagrams, which are the generalized form of BNs in the context of decision analysis. We conclude by discussing some limitations of BNs.
28.2 Bayes' Theorem
Bayes' theorem bears the name of Thomas Bayes, an eighteenth-century English clergyman who first uncovered the underlying logic [1]. The theorem shows how
uncertainties should be updated in the light of new information. The idea is to capture one’s uncertainty about an event or uncertain quantity in the form of prior probabilities; then gather data, observe the results, summarize them as a special probability known as conditional probability or likelihood, and then apply Bayes’ theorem by multiplying the prior by the likelihood, giving a posterior probability. Let Hi denote one of n mutually exclusive events, and D be some diagnostic data. Bayes’ theorem can be written as:
P(Hi | D) = P(Hi) P(D | Hi) / Σi=1..n P(Hi) P(D | Hi)    (28.1)
In Eq. 28.1, P(Hi) is the prior probability of Hi; P(Hi|D), read as the probability of Hi given D, is the posterior probability of Hi once the data have been observed; and P(D|Hi) is the conditional probability of D when Hi is true. The product of P(Hi) and P(D|Hi) gives the joint probability P(Hi, D), which indicates how often Hi and D occur simultaneously. When all the joint probabilities of Hi and D are summed, as in the denominator of Eq. 28.1, the result is P(D), i.e., the probability of the data. Bayes' theorem can be written in many forms, including a very useful odds/likelihood-ratio formulation when only two competing events are under consideration, e.g., H0 vs. H1. More detailed information can be found in introductory statistics texts.
An example illustrates how Bayes' theorem works. Suppose you suspect a patient has a 30% (i.e., prior probability) chance of having tuberculosis. To find out more, you call for an X-ray, which gives either a normal or an abnormal result. From past experience, you know abnormal results are observed not only in 90% of all patients with tuberculosis but also in 10% of those without the disease. This information is summarized in a conditional probability matrix, with the rows and the columns indicating the states of the test and the states of the disease, respectively (Table 28.1).

Table 28.1 The conditional probability matrix for the diagnosis problem
X-ray    | Tuberculosis present (%) | Tuberculosis absent (%)
Normal   | 10                       | 90
Abnormal | 90                       | 10

Suppose you learn that the X-ray result is normal. How does this change your belief in this patient's risk of having tuberculosis? Intuitively, we know this risk is decreased. Bayes' theorem, however, answers the key question: how much more or less likely? The posterior probability of tuberculosis given a normal X-ray result is 4.55% ( = 0.30 × 0.10/(0.30 × 0.10 + 0.70 × 0.90) ). In other words, the patient's risk of having tuberculosis has decreased substantially, from 3 in 10 to about 5 in 100.
The belief updating process embodied in Bayes' theorem is essentially the logic inherent in differential diagnoses. Not surprisingly, Bayesian applications enjoyed immediate success following Ledley and Lusted's paper [8]. However, their development was slow. One common objection to the use of Bayes' rule is the amount of data involved in computation. As we have seen, conditional probabilities are computed from joint probabilities. For our simple example with two binary events, any probability can be computed from 4 ( = 2 × 2) joint probabilities, i.e., P(tuberculosis present, X-ray normal), P(tuberculosis present, X-ray abnormal), P(tuberculosis absent, X-ray normal), and P(tuberculosis absent, X-ray abnormal). As the number of events increases, the number of joint probabilities soon becomes absurdly large due to combinatorial explosion. Therefore, Bayesian applications were not feasible for many problems with a reasonable level of complexity. Another obstacle to the widespread use of Bayesian applications was the exclusivity assumption underlying Bayes' theorem. To apply Eq. 28.1, the events Hi have to be mutually exclusive. That is, if the events are diseases, then we have to assume that a patient can have one and only one disease at a time. Such assumptions are difficult to meet. Modeling the diseases separately provides a solution, but at the cost of computational efficiency. For years, these problems reinforced each other and significantly restricted the penetration of Bayesian applications. The real solution arrived when BNs were developed in the late 1980s [7, 11].
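The arithmetic of this example is compact enough to script directly. The following is a minimal sketch in Python of Eq. 28.1 applied to the figures quoted above; the function is written for this chapter's example rather than taken from any library.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem (Eq. 28.1) for n mutually exclusive hypotheses."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joints)                 # P(D), the denominator
    return [j / evidence for j in joints]

# Hypotheses: tuberculosis present, tuberculosis absent
priors = [0.30, 0.70]
# P(X-ray normal | hypothesis): 10% if present, 90% if absent
likelihood_normal = [0.10, 0.90]

post = posterior(priors, likelihood_normal)
print(f"P(TB | normal X-ray) = {post[0]:.4f}")   # 0.0455, i.e. 4.55%
```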
28.3 Bayesian Networks
BNs are directed acyclic graphs that contain nodes and links (or arcs) that connect nodes. The nodes represent uncertain events, and the links represent their probabilistic dependencies or the lack of them; the strength of these dependencies is captured by conditional probability matrices. For example, Fig. 28.1 shows a BN that represents the tuberculosis example, incorporating the conditional probability matrix of Table 28.1.¹ Compared with early Bayesian applications, BNs have several distinctive advantages. First, their graphical representation captures probabilistic relationships in a way that is intuitive, succinct, and explicit. The BN shown in Fig. 28.1 represents the dependent relationship between the disease and the test by a link, with the direction of the link consistent with their (believed) causality. Second, BNs improve computational efficiency. Missing links signal conditional independence. Therefore, we do not need the entire set of joint probabilities to specify the complete probability distributions. To see what this means, consider the network displayed in Fig. 28.2. This BN contains four binary events: bronchitis (B), dyspnea (D), tuberculosis (T), and X-ray results (X). Without the independence assumption, we can compute their joint probabilities by Eq. 28.2:
Fig. 28.1 A Bayesian network representation of the diagnosis problem (two nodes: Tuberculosis → X-ray Result)

Fig. 28.2 A four-variable Bayesian network (nodes: Tuberculosis (T) → X-ray Result (X); Tuberculosis (T) and Bronchitis (B) → Dyspnea (D)). The lack of links between tuberculosis and bronchitis, and between bronchitis and X-ray result, indicates the conditional independence of each pair of events

Footnote 1: We constructed all the Bayesian networks in this chapter using Netica® (Norsys Software Corp., http://www.norsys.com/).
Fig. 28.3 A Bayesian network that takes into account uncertainties in evidence (nodes: Tuberculosis → X-ray Result → Interpreted X-ray Result)
P(X, D, T, B) = P(T)P(B|T)P(D|T, B)P(X|D, T, B)    (28.2)

The computation of Eq. 28.2 is complex; but this is unnecessary. The reason is that there are no arcs between bronchitis (B) and tuberculosis (T), or between bronchitis and X-ray result (X). In other words, bronchitis and tuberculosis are assumed to be conditionally independent; so are bronchitis and X-rays. In probability terms, this means P(B|T) = P(B) and P(X|D, T, B) = P(X|T). Accordingly, Eq. 28.2 becomes Eq. 28.3:

P(X, D, T, B) = P(T)P(B)P(D|T, B)P(X|T)    (28.3)

The savings in computation can be enormous when a problem involves hundreds or thousands of events. The third advantage of BNs is their flexibility. While early Bayesian applications mainly dealt with a single level of inference, i.e., from a reliable datum to the hypothesis of interest, BNs can be easily expanded to incorporate many layers of uncertainties. For instance, suppose that in the tuberculosis example the interpretation of the X-ray result is imperfect. To account for this, we simply graft a child node "interpreted X-ray result" onto "X-ray result" in the existing network (Fig. 28.1). The result is an expanded network (Fig. 28.3) that allows us to make inferences based on the interpretations.²

Footnote 2: If the interpretation has a fixed relationship with the actual test, e.g., correct 80% of the time regardless of the test results, then we can keep the original network (Fig. 28.1) and simply provide evidence in probability terms, e.g., P(X-ray result = negative) = 0.80. The expanded network (Fig. 28.3) can handle more complex relationships, such as when the accuracy of the interpretation differs for a positive and a negative test result.

28.4 How to Use Belief Networks
The advantages of BNs in representation and probability propagation make them ideal tools of reasoning with uncertainties. In this section, we will discuss several usages of BNs, including making predictions, examining the hidden logic and assumptions of a decision problem, handling incomplete information, and capturing expert judgments. These are by no means exhaustive. Our goal is to show the potential of BNs in dealing with a diversity of decision problems in the medical context.

28.4.1 Making Predictions
Once a BN is constructed, it can be used to provide quantitative information on how events interact with each other. As the tuberculosis example demonstrates, BNs are commonly used to predict how observations influence hypotheses. Computer software that performs belief updating automatically is available, thus relieving users of the burden of computation. For instance, Fig. 28.4 displays two networks that capture the process of belief updating in the diagnosis of tuberculosis. Note that we choose to represent the uncertain events by "boxes" rather than "nodes", such that the probabilities are shown in numbers as well as "belief bars."
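To make the economy of Eq. 28.3 concrete, the sketch below enumerates the factorized joint distribution of the four-variable network in Fig. 28.2 and updates the belief in tuberculosis, as a tool such as Netica would do automatically. Only P(T) and P(X|T) come from the chapter's example; P(B) and P(D|T, B) are hypothetical values chosen purely for illustration. Note that the factorization needs only 1 + 1 + 4 + 2 = 8 free parameters instead of the 15 a full four-variable joint distribution would require.

```python
from itertools import product

P_T = {True: 0.30, False: 0.70}          # P(tuberculosis), from the example
P_B = {True: 0.20, False: 0.80}          # P(bronchitis), hypothetical
P_X_ABN = {True: 0.90, False: 0.10}      # P(X abnormal | T), from the example
P_D = {(True, True): 0.95, (True, False): 0.80,   # P(dyspnea | T, B),
       (False, True): 0.70, (False, False): 0.05}  # hypothetical

def joint(t, b, d, x_abnormal):
    """Eq. 28.3: P(X, D, T, B) = P(T) P(B) P(D | T, B) P(X | T)."""
    px = P_X_ABN[t] if x_abnormal else 1 - P_X_ABN[t]
    pd = P_D[(t, b)] if d else 1 - P_D[(t, b)]
    return P_T[t] * P_B[b] * pd * px

# Belief updating by enumeration: P(T = true | X = normal)
num = sum(joint(True, b, d, False) for b, d in product([True, False], repeat=2))
den = sum(joint(t, b, d, False) for t, b, d in product([True, False], repeat=3))
print(f"P(TB | normal X-ray) = {num / den:.4f}")   # 0.0455, as before
```

Because X depends only on T in this network, bronchitis and dyspnea marginalize out and the query reproduces the 4.55% obtained earlier by Bayes' theorem alone.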
28.4.2 Exposing the Underlying Logic and Assumptions
BNs contain qualitative as well as quantitative information about the dependencies between the uncertain events. A major usage of BNs is to examine the logic and assumptions that are often hidden in the decision-making process and, in so doing, render the process available for analysis and communication. For instance, Fig. 28.5 shows the network "Asia" [7], which is a small part of a much larger network (called "chest clinic") that can actually be used for making diagnoses. Note that the eight events in "Asia" are organized in a hierarchy, with events at the top influencing events at the bottom.
Fig. 28.4 The belief updating process for the tuberculosis example. The top and bottom panels capture the belief status before and after the X-ray result enters the network, respectively (before: tuberculosis present 30.0%/absent 70.0%, X-ray abnormal 34.0%/normal 66.0%; after a normal X-ray: present 4.55%/absent 95.5%). The dotted line in the bottom panel indicates the direction of probabilistic reasoning, i.e., from the test to the disease. From Fig. 28.4, we can gather information including the prior probability of tuberculosis (30%), the initial chance for the X-ray result to be normal (66%), and the posterior probability of tuberculosis given a normal X-ray result (4.55%)
Starting from the top, the nodes represent the risk factors ("visit to Asia", "smoking"), the hypothesized diseases ("tuberculosis," "lung cancer," and "bronchitis"), the two diseases when combined ("tuberculosis or cancer"), and the observable evidence ("dyspnea" and "X-ray result"), respectively. The underlying assumptions are as follows. The risk factors influence the prevalence of the diseases, which in turn influence the results of the X-ray and the presentation of the symptom, i.e., via the intermediate node "tuberculosis or cancer." This node might seem redundant, as its relationships with its two parent nodes are self-evident. However, its presence reveals the important assumption that tuberculosis and
lung cancer exert exactly the same influences on X-ray results and dyspnea. In other words, neither the test nor the symptom would be useful in discriminating between the two diseases. With events explicitly linked, "Asia" guides us through decisions including what differential diagnoses to assume, what history to take, what symptoms to look for, and which test to request. Every step of these decisions can be explored, and its consequences for the diagnoses tested and analyzed. Interested readers can download the network "Asia" from Norsys Software Corp's website at http://www.norsys.com.
Fig. 28.5 The Bayesian network "Asia" (nodes: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer, X-ray Result, Dyspnea)
28.4.3 Handling Evidence
Owing to limitations in cognitive capacity, people tend to underestimate the influence of new data [2, 12]. The problem is exacerbated as evidence accrues. In contrast, BNs combine information with ease and make mathematically sound predictions even when the information is incomplete. In particular, BNs can handle one special form of probabilistic inference, known as explaining away, i.e., when the presence of one causal factor decreases the probabilities of other competing causal factors of the same effect. For example, the BN "Asia" shows how the risk of lung cancer depends on one's smoking habit, a recent trip to Asia, dyspnea, and the X-ray result, as some or all of this information becomes available. A useful way of presenting the results of this investigation is to construct a cumulative belief curve (Fig. 28.6) that presents the predicted risk as a function of the evidence. As shown, the risk of lung cancer increases steadily with smoking, the presence of dyspnea, and an abnormal X-ray result. The finding of an abnormal X-ray result leads to the most dramatic increase in the risk, from 58% to 72.4%.

Fig. 28.6 A cumulative belief curve of lung cancer based on the Bayesian network "Asia." The probability of lung cancer (y-axis) depends on the cumulative findings of symptoms and risk factors (x-axis). Smoking, presence of dyspnea, and abnormal chest X-rays together increase the risk of lung cancer from the original 5.5% to 72.4%. Visiting Asia, however, decreases this risk to 57.9%. This happens because tuberculosis, a disease associated with visiting Asia, explains both the presence of dyspnea and the abnormal chest X-rays. Predictions are hypothetical, based on the Bayesian network "Asia." For more information, visit http://www.norsys.com/
In comparison, the findings of smoking and the presence of dyspnea increase this risk from 5.5% to 10%, and further to 14.8%. In updating probabilities, BNs do so sequentially, taking into account evidence one piece at a time. When evidence is incomplete, the model simply predicts based on whatever is available. It is worth noting that a recent visit to Asia decreases the risk of lung cancer by nearly 15 percentage points (from 72.4% to 57.9%). In the network "Asia", there are, however, no links between visiting Asia and having lung cancer. So how does this happen? The reason is that a visit to Asia increases one's chance of tuberculosis, which provides an alternative account for the apparent evidence of lung cancer, i.e., the abnormal X-ray and the presence of dyspnea. In other words, lung cancer has been explained away.
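A stripped-down sketch of explaining away is given below, using a hypothetical two-cause fragment (tuberculosis and lung cancer as competing explanations of an abnormal X-ray) rather than the full "Asia" network; all probabilities, including the noisy disease-to-X-ray relationship, are invented for illustration.

```python
from itertools import product

P_CANCER = 0.10                          # P(lung cancer), hypothetical
P_VISIT = 0.05                           # P(recent Asia visit), hypothetical
P_TB = {True: 0.30, False: 0.01}         # P(tuberculosis | visit), hypothetical

def p_abnormal(cancer, tb):
    # Either disease tends to produce an abnormal X-ray.
    return 0.90 if (cancer or tb) else 0.05

def p_cancer_given_abnormal(visit=None):
    """P(cancer | abnormal X-ray [, visit]) by enumeration."""
    num = den = 0.0
    visits = [True, False] if visit is None else [visit]
    for c, t, v in product([True, False], [True, False], visits):
        pv = 1.0 if visit is not None else (P_VISIT if v else 1 - P_VISIT)
        p = (pv
             * (P_CANCER if c else 1 - P_CANCER)
             * (P_TB[v] if t else 1 - P_TB[v])
             * p_abnormal(c, t))
        den += p
        if c:
            num += p
    return num / den

# The second probability is lower: tuberculosis "explains away" the X-ray.
print(f"P(cancer | abnormal X-ray)             = {p_cancer_given_abnormal():.2f}")
print(f"P(cancer | abnormal X-ray, Asia visit) = {p_cancer_given_abnormal(True):.2f}")
```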
28.4.4 Incorporating Expert Judgments
In real life, published literature rarely contains all the information one needs to solve a problem. This poses a challenge to making "evidence-based" decisions. The unique capacity of BNs to capture expert judgments along with hard data makes them ideal for dealing with such problems. Expert judgments can take many forms: quantitative, e.g., prior and conditional probabilities, as well as qualitative, e.g., the dependent relationships between events. In recent years, "expert priors" have been found to be useful even when BNs can be derived from databases. For instance, Gevaert et al. [3] constructed BNs from a clinical database to predict the outcome of pregnancies of unknown location. They showed that expert judgments were useful in two ways. First, provided with expert priors, the model learned the relationships between events more efficiently from scarce data. Second, expert priors also improved the predictive accuracy of the model even when data were abundant. Pang et al. [10] compared the performance of a range of different methods in making prognostications for head injury, including logistic regressions, BNs and artificial neural networks (see Chap. 38), among others. The performance of BNs stood out in two respects: an explicit representation of the complex interactions between events and a capacity for incorporating expert opinions.
28.5 Measuring Model Performance
We may evaluate the predictive performance of BNs by sensitivity, specificity, and positive and negative predictive values. These values indicate how discriminative a test is with respect to a hypothesis, when both the test and the hypothesis have binary outcomes. The idea is as follows. An effective test has to distinguish between cases for which the hypothesis is true and those for which it is false. We may measure this from two different perspectives, that of the hypothesis or that of the test. Suppose the hypothesis is that a disease is either present or absent and the test is either positive or negative. Then, from the perspective of the hypothesis, we can ask: of all those patients with (without) the disease, what proportion tests positive (negative)? The answers are the sensitivity, i.e., the proportion of patients correctly identified as having the disease, and the specificity of the test, i.e., the proportion of patients correctly identified as disease-free. Despite being useful, sensitivity and specificity are clinically inadequate. The reason is that they do not tell us how a positive or negative result changes the likelihood of the disease. This information is contained in the positive and negative predictive values (PPVs and NPVs), which describe the discriminatory power of a test from the perspective of the test. Table 28.2 presents the computation details. To convert sensitivity and specificity to the clinically more useful PPVs and NPVs, we simply apply Bayes' rule. This is because sensitivity and specificity are in fact conditional probabilities, P(test positive given disease present) and P(test negative given disease absent), whereas PPV and NPV are in fact posterior probabilities, P(disease present given test positive) and P(disease absent given test negative).
Several things are worth noting. First, the definitions of these concepts imply the knowledge of a "gold standard" that reveals the true nature of the hypothesis. Commonly used ones include the actual disease states that transpire at follow-up, pathological tests, or X-rays. Gold standards may not be perfect. X-rays, for instance, are known to be vulnerable to interpretation errors. It follows that clinicians sometimes have to settle for the second-best option, e.g., expert judgments. Second, a test may have continuous rather than binary outcomes. This is typical with BNs, which predict a range of probabilities, such as P(lung cancer given smoking) = 10%, rather than "lung cancer absent." In such cases, we have to select cut-offs that convert a continuous prediction into a binary one. Suppose we choose 50% as the cut-off; this means that any predicted risk of lung cancer greater (less) than 50% is taken as indicating the presence (absence) of lung cancer. The selection of cut-offs has an impact on sensitivities and specificities. This is best understood by examining two extreme cases. First, imagine a cut-off of 0%. This means all patients presented to the clinic will be diagnosed as cancer patients. Thus, no cancer patients will be missed (B = 0 in Table 28.2, sensitivity = 100%), but a maximum number of false alarms will be raised (D = 0, specificity = 0). At the other end of the spectrum, imagine a cut-off of 100%. Now, no patients will be diagnosed as having cancer. Therefore, all cancer patients will be missed (A = 0, sensitivity = 0), but no healthy patients will be misdiagnosed (C = 0, specificity = 100%). It is easy to see that the higher the cut-off, the less sensitive but the more specific a test becomes in making diagnoses.
Table 28.2 Possible outcomes of a diagnostic test

         | Gold standard (+) | Gold standard (−) | Sum
Test (+) | A                 | C                 | A + C
Test (−) | B                 | D                 | B + D
Sum      | A + B             | C + D             | A + B + C + D

A + B + C + D = the total number of patients in the target population
A = the number of patients with the disease who test positive
B = the number of patients with the disease who test negative (false negatives)
C = the number of patients without the disease who test positive (false positives or false alarms)
D = the number of patients without the disease who test negative
A + C = all patients who test positive; B + D = all patients who test negative
A + B = all patients with the disease; C + D = all patients without the disease
Sensitivity = A/(A + B); specificity = D/(C + D); PPV = A/(A + C); NPV = D/(B + D)
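As a worked illustration of Table 28.2 and the Bayes' rule conversion just described, the sketch below computes the four measures from hypothetical counts for a 1,000-patient cohort; the counts are invented, not taken from any study in this chapter.

```python
# Invented counts for a 1,000-patient cohort, following Table 28.2.
A, B, C, D = 90, 10, 95, 805

sensitivity = A / (A + B)          # 0.900
specificity = D / (C + D)          # ~0.894
ppv = A / (A + C)                  # ~0.486
npv = D / (B + D)                  # ~0.988

# Equivalently, PPV follows from Bayes' theorem using sensitivity,
# specificity and prevalence (here prevalence = (A + B)/total = 0.10).
prev = (A + B) / (A + B + C + D)
ppv_bayes = (sensitivity * prev) / (
    sensitivity * prev + (1 - specificity) * (1 - prev))
assert abs(ppv - ppv_bayes) < 1e-9

print(f"sens={sensitivity:.3f} spec={specificity:.3f} "
      f"PPV={ppv:.3f} NPV={npv:.3f}")
```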
Such tradeoffs are best captured by a receiver operating characteristic (ROC) curve (Fig. 28.7).

Fig. 28.7 A receiver operating characteristic (ROC) curve, plotting sensitivity (y-axis) against 1 − specificity (x-axis) for different cut-offs. The area under the ROC curve (AUC) of a perfect test (the gold-standard point) is 1. The more discriminatory a test is, the closer its AUC is to 1.
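A sketch of how such a curve is traced in practice: sweep the cut-off across a model's predicted probabilities, record (1 − specificity, sensitivity) at each cut-off, and integrate by the trapezoidal rule to approximate the AUC. The predictions and labels below are invented.

```python
# Invented predicted probabilities and true labels (1 = disease present).
preds  = [0.95, 0.90, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    1,    0,    1,    0,    0,    1,    0,    0]

def roc_points(preds, labels):
    pos, neg = labels.count(1), labels.count(0)
    pts = []
    # Sweep cut-offs from above the maximum (no positives called)
    # down to zero (all called positive).
    for cut in sorted(set(preds) | {0.0, 1.1}, reverse=True):
        tp = sum(1 for p, y in zip(preds, labels) if p >= cut and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p >= cut and y == 0)
        pts.append((fp / neg, tp / pos))   # (1 - specificity, sensitivity)
    return sorted(pts)

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(f"AUC = {auc(roc_points(preds, labels)):.3f}")   # 1.0 = perfect test
```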
28.6 Dynamic Bayesian Networks
Events with cyclic relationships are typical in biological, environmental, and commercial systems. Figure 28.8 shows an example. In this case, the number of existing users of a social network determines the likelihood that an outsider will join the network, because more popular networks are perceived to be more beneficial. However, the joining decision per se influences the size of the membership and, via the perceived benefit, the joining decisions made by other outsiders. BNs cannot represent such problems because the predictions are not static, i.e., for a given time, but dynamic, i.e., changing with time. Researchers have introduced the concept of dynamic Bayesian networks (DBNs) [4], which allow events to have probability distributions dependent on time. Work on DBNs is ongoing.

Fig. 28.8 Uncertain events that form a loop (number of users → perceived benefits of becoming a member → probability of joining → number of users) and therefore cannot be handled by BNs
28.7 Influence Diagrams
In the context of decision analysis, BNs are a special case of influence diagrams [6] with only probability nodes (circles or ovals). Influence diagrams also contain consequence nodes and decision nodes, thus allowing decisions to be selected based on the (predicted) probabilities as well as the utility of the outcomes. For instance, suppose the predictions of the network "Asia" are used to select treatment plans. The ideal case is to treat patients based on their actual disease states. Treating a cancer patient for tuberculosis has different consequences from treating a tuberculosis patient for cancer. To take this into account, we draw an influence diagram in which the consequences of selecting treatment plans depend on both the risks of the diseases and the decision (Fig. 28.9). The links indicate dependencies (as in BNs), but they may also indicate relevance. For instance, the link pointing from the probability node ("predicted risks of tuberculosis, lung cancer, and bronchitis") into the decision node ("selecting treatment plans") implies that the decision is made with knowledge of the predictions made by the BN model "Asia".

Fig. 28.9 Decision-making using predictions of the Bayesian network "Asia". Influence diagrams have three different types of nodes: decisions (rectangles), probabilities (ovals or circles), and consequences (rounded rectangles). The links indicate dependence or relevance
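The following is a minimal sketch of the expected-utility calculation that an influence diagram such as Fig. 28.9 encodes; the predicted risks and the utility table are invented for illustration (the chapter does not supply them).

```python
# Hypothetical predicted risks (as a BN such as "Asia" might output) and an
# invented utility table scoring each (true disease, treatment plan) pair.
risks = {"tuberculosis": 0.10, "lung cancer": 0.25, "bronchitis": 0.65}
utility = {
    ("tuberculosis", "treat TB"):     100,
    ("tuberculosis", "treat cancer"):  10,
    ("lung cancer",  "treat TB"):       5,
    ("lung cancer",  "treat cancer"):  90,
    ("bronchitis",   "treat TB"):      40,
    ("bronchitis",   "treat cancer"):  20,
}

def expected_utility(plan):
    # Weight each consequence by the predicted probability of the disease.
    return sum(p * utility[(disease, plan)] for disease, p in risks.items())

plans = ("treat TB", "treat cancer")
best = max(plans, key=expected_utility)
print({plan: round(expected_utility(plan), 1) for plan in plans}, "->", best)
```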
28.8 Conclusions
Uncertainties permeate all forms of medical decision-making. The existence of computerized BN tools (e.g., Netica) means sophisticated knowledge of probability theory is not a pre-requisite for engaging in Bayesian
reasoning. Clinicians can employ BNs to explore complex relationships between uncertain events, to combine multiple pieces of evidence, to capture expert judgments along with hard data, as well as to communicate their logic and findings in a way that is both intuitive and explicit.
References
1. Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Phil Trans 53:370–418
2. Edwards W (1962) Dynamic decision theory and probabilistic information processing. Hum Factors 4:59–73
3. Gevaert O, De Smet F, Kirk E et al (2006) Predicting the outcome of pregnancies of unknown location: Bayesian networks with expert prior information compared to logistic regression. Hum Reprod 21:1824–1831
4. Ghahramani Z (1998) Learning dynamic Bayesian networks. In: Giles CL, Gori M (eds) Adaptive processing of sequences and data structures. Lecture notes in artificial intelligence, LNAI, vol 1387. Springer-Verlag, pp 168–197
5. Henriksen HJ, Barlebo HC (2007) Reflections on the use of Bayesian belief networks for adaptive management. J Environ Manage 88(4):1025–1036
6. Howard RA, Matheson JE (1984) Influence diagrams. In: Howard RA, Matheson JE (eds) Readings on the principles and applications of decision analysis II. Strategic Decisions Group, Menlo Park
7. Lauritzen SL, Spiegelhalter DJ (1988) Local computations with probabilities on graphical structures and their application to expert systems. J Roy Stat Soc Ser B 50:157–224
8. Ledley RS, Lusted LB (1959) Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science 130:9–21
9. Neil M, Fenton N, Tailor M (2005) Using Bayesian networks to model expected and unexpected operational losses. Risk Anal 25:963–972
10. Pang BC, Kuralmani V, Joshi R et al (2007) Hybrid outcome prediction model for severe traumatic brain injury. J Neurotrauma 24:136–146
11. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo
12. Phillips LD, Hays WL, Edwards W (1966) Conservatism in complex probabilistic inference. IEEE Trans Hum Factors Electron HFE-7:7–18
Further Reading
Norman Fenton's website, http://www.dcs.qmw.ac.uk/~norman/BBNs/BBNs.htm, contains much useful information on probability theory and Bayesian networks
Norsys Software Corp's website, http://www.norsys.com/, provides useful information about belief networks and a free trial version of Netica for download
29 A Bayesian Framework for Assessing New Surgical Health Technologies
Elisabeth Fenwick
Contents
29.1 Introduction
29.2 A Bayesian Framework for Health Technology Assessment
29.3 Applying the Framework: Pre-Operative Optimisation
29.4 Background to the Clinical Example
29.5 Trial Analysis
29.6 Pre-Trial Analysis: Modelling the Available Information Set
29.6.1 Probabilities
29.6.2 Survival
29.6.3 Costs
29.6.4 Results
29.7 Post-Trial Analysis: Combining Information Sets
29.7.1 Priors
29.8 Discussion on the Framework
29.8.1 Impact of the Iterative Framework
29.8.2 Implications of the Framework for Health Technology Assessment
29.8.3 Employing Bayesian Methods
29.9 Conclusions
References

Abstract The primary purpose of health technology assessment (HTA) is to provide information that can facilitate decision-making regarding alternative health technologies. No single randomised controlled trial can hope to provide definitive conclusions about all of the endpoints of interest to decision-makers when considering a new intervention for adoption by the clinical community and/or reimbursement by health care decision-makers. Therefore, HTA involves the synthesis and incorporation of all available evidence regarding the intervention. This chapter presents a Bayesian framework for managing the iterative process of HTA over the lifetime of the technology from the time when the technology is first identified for use in patients, to ensure consistency between reimbursement decisions and decisions regarding the need for and direction of further research.
E. Fenwick Community Based Sciences, Faculty of Medicine, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK e-mail: [email protected]
29.1 Introduction The primary purpose of health technology assessment (HTA) is to provide information that can facilitate decision-making regarding alternative health technologies. No single randomised controlled trial can hope to provide definitive conclusions about all of the endpoints of interest to decision-makers when considering a new intervention for adoption by the clinical community and/or reimbursement by health care decisionmakers. Therefore, HTA involves the synthesis and incorporation of all available evidence regarding the intervention. The Bayesian approach provides a framework which not only allows for, but encourages, the systematic inclusion of existing evidence from other (external) sources through the formal and explicit use of prior information.
Decision-making is inevitably undertaken in the context of uncertainty concerning the costs and effectiveness of health technologies. Reducing this uncertainty, and thus reducing the chance of making the wrong decision, through the acquisition of information is valuable to the society. However, information acquisition is not costless. In a budget-constrained system, the allocation of funds to acquire information will reduce the budget available for service provision. As a result, HTA must address two related issues: (1) the identification of cost-effective technologies for reimbursement, based upon the available evidence and (2) establishing whether it is worthwhile to invest in research to reduce decision uncertainty. Given that health technologies diffuse and develop over time, with new information emerging throughout the life cycle of the technology, either through deliberate effort or coincidentally, these decisions should not be taken just once, but regularly over the lifetime of the technology [1, 2]. An initial assessment should be undertaken early when the intervention first emerges as feasible for use in patients and utilised iteratively throughout the life-cycle of the intervention as new information is accrued. At each stage, the available information should be processed to determine whether the intervention provides good value for money and an efficient use of resources (on the basis of the information available), the extent of the uncertainty surrounding the cost-effectiveness of the intervention, the value of actively acquiring additional information from primary and secondary sources and the most efficient means of attaining that information. A fully Bayesian decision theoretic framework has been suggested for managing this iterative process of HTA, to ensure consistency between reimbursement decisions and decisions regarding further research over the lifetime of the technology [3, 4, 5, 6, 7]. Within this framework, the choice between interventions, given the information available, should be made according to the expected cost-effectiveness of the technologies. Uncertainty is irrelevant to this decision. Nonetheless, uncertainty is crucially important when it comes to deciding whether to fund further research to reduce uncertainty in the future, and value of information (VOI) analyses are employed to address this issue. Once new information is available, Bayesian methods are employed to incorporate it within the existing information set, and decisions are subsequently reassessed. Therefore, the framework provides a vehicle
to manage the process of HTA, identifying cost-effective interventions based on available information and directing future research effort over the life of the intervention. This chapter presents the Bayesian framework for managing the process of HTA over time. The chapter starts with an overview of the methods employed. The approach is then illustrated through an application involving the assessment of a new method of management for surgical patients. Finally, the chapter considers some of the benefits and implications of employing this framework for HTA. It should be noted that it is outside the scope of this chapter to provide the reader with a comprehensive description of Bayesian methods and methodology for HTA. For a wider methodological discussion, including the implications of using Bayesian methods for clinical trials and evidence synthesis, the reader is referred to the work of Spiegelhalter et al. [8]. For a general discussion on Bayesian methods, the reader is referred to Berry's "Bayesian Statistics" and others [9, 10, 11, 12]. Please note that throughout this chapter, the perspective taken will be that of a health care decision-maker assessing the cost-effectiveness of interventions in order to determine whether to recommend and reimburse the technologies.
29.2 A Bayesian Framework for Health Technology Assessment
The framework employs three main elements in order to manage the process of HTA over the life cycle of the intervention – probabilistic decision modelling, Bayesian VOI analysis and Bayesian updating. The process should begin as soon as the intervention is identified for use in patients and continue until there is no further demand for additional research. The five main stages in operationalising the Bayesian framework are outlined in the following paragraphs.
The initial step within the framework is to construct a decision analytic model to represent the policy issue of interest. Decision analysis provides a framework through which healthcare policy decisions can be explicitly examined in a logical and rational manner. The technique involves identifying all relevant and realistic clinical strategies, as well as their subsequent
clinical events and associated probabilities. The process of providing this information explicitly often clarifies the problem for the decision-maker and enables others to understand and question the decision process. More details regarding decision analysis can be found in later sections of the textbook. The next stage is to identify, characterise and incorporate the existing available information within the model. This information is embodied within the model through the specification of (prior) probability distributions for each model parameter. These distributions represent the uncertainty surrounding the mean estimate of each parameter, presenting both the range of values that the parameter can take and the associated probability. The next stage employs Monte Carlo simulation to propagate the information through the model and generate a distribution of the expected costs and effects (e.g. life-years, QALYs) for the respective interventions to be compared. These distributions are then used to determine the cost-effectiveness of the interventions. Where the objective is to maximise health subject to a budget constraint, the choice between cost-effective interventions should be made on the basis of the expected values – the intervention with the highest expected incremental cost-effectiveness ratio (ICER) falling below some externally determined threshold for cost-effectiveness (λ) is identified as optimal for reimbursement. There will be some uncertainty surrounding this decision, which can be expressed through cost-effectiveness acceptability curves (CEACs). More details about the process of economic evaluation and CEACs can be found in the section on economic analysis. The third stage of the process involves a formal assessment of the uncertainty surrounding the decision, through the use of Bayesian VOI analysis, to determine the value to society associated with the collection of further information concerning the costs, effectiveness and cost-effectiveness of the interventions. These methods provide an explicit measure of the cost associated with the uncertainty surrounding a decision through consideration of the extent of and consequences associated with decision uncertainty [13, 14, 15, 16] and have been widely used in environmental economics [17], food safety [18], physics and engineering [19]. The methods determine the costs of the uncertainty in terms of the difference in the expected payoffs (costs and effects) when the decision
is based on the currently available information and when the decision is based on better information. This difference in payoffs can then be compared with the costs of undertaking research to reduce uncertainty, in order to value research. The expected value of perfect information (EVPI) is a special case which values the elimination of all uncertainty, thus providing a maximum return from investment in research. This maximum value can be compared with the cost of gathering further information to provide a necessary (but not sufficient) condition for determining whether further research is potentially worthwhile [3, 4, 20]. In addition, the EVPI can be calculated for individual, or various combinations of, parameters to assess the potential worth of research concerning particular elements of the decision. This process can usefully direct and focus the research effort towards the elements of the decision where the elimination of uncertainty is of most value. The EVPI for parameters (EVPPI) involves determining the difference between the expected decision payoffs with and without perfect information on this set of parameters. However, these methods can only determine the potential worth of further research, as perfect information is not achievable. Determining the actual worth of a specific research project requires a method to identify efficient research design and a means to value the reduction in uncertainty actually achievable. This involves determining the expected value of sample information (EVSI). This value is a function of the decision consequences of a reduction in uncertainty and the specifics of the research design itself (e.g. sample size and allocation; length of follow-up; endpoints of interest) [21]. Further details about the EVPI and VOI analyses can be found within the section on EVPI. The final stage of the framework is entered once additional information is available, either as a result of deliberate effort guided by the VOI analyses or coincidentally from another policy area or jurisdiction. This stage involves incorporating the newly available data within the information set and using the new information set to make a revised assessment of the cost-effectiveness of the interventions. The information set is "updated" in the light of the new information using Bayesian updating. This represents a learning process. The process starts with an initial (prior) belief. As new information becomes available,
it is combined with the prior information, using Bayes’ theorem, to provide an “updated” (posterior) belief that incorporates all available information. When the prior and likelihood belong to the same class of distributions, the determination of the posterior distribution is simplified. In this case (termed “conjugate”), the posterior is restricted to a subset of possible distributions and can be determined analytically. For example, a normal prior and a normal likelihood function would form a normal posterior. When the prior and likelihood are non-conjugate, determination of the posterior involves integration over multiple complex distributional forms. In this case, the posterior can be approximated through the use of numerical methods, e.g., Markov Chain Monte Carlo methods. To summarise, the Bayesian decision analytic framework is used as a vehicle for the HTA process, identifying interventions that are cost-effective and directing future research effort on an iterative basis over the lifetime of the technology. Within the remainder of this chapter, an application of this fully Bayesian decision theoretic framework will be presented. This will be followed by a discussion on the implications and the potential for the development of the framework.
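For the conjugate normal–normal case mentioned above, the posterior can be written in closed form as a precision-weighted average of the prior and the data, assuming the sampling variance is known. The sketch below illustrates this; the prior values are invented, and the data value simply echoes the trial's incremental cost reported later in this chapter (£7,261 − £10,297 = −£3,036) as a plausible stand-in.

```python
# Conjugate normal-normal updating: a normal prior combined with a normal
# likelihood yields a normal posterior. In precision (1/variance) form the
# posterior mean is a precision-weighted average. All numbers illustrative.
prior_mean, prior_sd = -1000.0, 800.0    # invented prior incremental cost (£)
data_mean,  data_se  = -3036.0, 1200.0   # stand-in trial estimate (£)

prior_prec = 1 / prior_sd**2
data_prec  = 1 / data_se**2

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd   = post_prec ** -0.5

print(f"posterior mean = {post_mean:.0f}, posterior sd = {post_sd:.0f}")
```

The posterior mean lies between the prior mean and the data, pulled towards whichever source carries more precision; this is the "learning process" the text describes.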
29.3 Applying the Framework: Pre-Operative Optimisation
This section reports upon an application of the Bayesian framework in the context of pre-operative optimisation for high-risk patients undergoing major elective surgery. This research was funded as a part of the Bayesian Initiative in Health Economics and Outcomes Research and undertaken during a PhD study. The example has been adapted, for the purposes of presentation in this chapter, from previous analyses undertaken by Fenwick and others [22, 23, 6]. The section starts with a basic introduction to the clinical area of interest; this is then followed by the application of the framework. In this case, the framework is demonstrated out of order, with the trial analysis reported first, followed by the pre-trial analysis and finally the post-trial analysis. This reflects the nature of this particular research project, where the pre-trial element was actually undertaken retrospectively once the trial had completed, for expediency. However, this should not be seen as affecting either the results presented here or the prospective nature of the proposed framework, the principles of which remain the same – to assess the cost-effectiveness and the value of collecting further information at every stage in the development of the intervention.
29.4 Background to the Clinical Example
Tissue hypoxia is the fundamental physiological event which leads to organ failure and death in critically ill patients. Hence, optimisation of tissue oxygen delivery and consumption is essential for reducing mortality and morbidity amongst these patients. In 1959, cardiac output pre-surgery was identified as the critical determinant of patient survival [24], and in 1960, it was reported that the pre-surgical levels of cardiac output and oxygen delivery associated with survivors of major surgery were considerably higher than for patients who died. As a result, it was suggested that the higher values of cardiovascular flow observed in survivors should become additional goals for peri-operative treatment of surgical patients [25]. Pre-operative optimisation for high-risk patients undergoing major elective surgery is a goal-oriented policy that involves admitting surgical patients ahead of surgery, inserting a pulmonary artery catheter to monitor cardiac index and administering inotropes. The aim is to enhance pre-surgical cardiovascular flow and oxygen delivery and thus improve the chances of survival post-surgery. As such, the policy involves increased resource use before surgery (including a stay in an intensive or high-dependency care unit), with the prospect of reduced resource use post-surgery (due to reductions in complications). Clinical trials have shown pre-operative optimisation to reduce the risk of complications and death in high-risk patients undergoing major elective surgery [24, 26]. There is also evidence that the use of pre-operative optimisation results in reduced hospital costs [24, 26] and that it constitutes a cost-effective method of managing high-risk patients undergoing major elective surgery [27].
29.5 Trial Analysis
In 1999, a further trial compared standard pre-operative patient management with pre-operative optimisation (pre-op) in high-risk patients undergoing major elective surgery [28]. The study randomised 138 patients to receive standard management (n = 46) or pre-operative optimisation (n = 92) with either adrenaline or dopexamine. The results showed a significant reduction in hospital mortality associated with pre-op (3%) when compared with standard patient management (17%), and a reduction in morbidity associated with pre-op employing dopexamine (30%) when compared with pre-op employing adrenaline (53%) and standard patient management (61%) [28]. In addition to the mortality and morbidity benefits, the trial also found some important differences in resource consumption. In particular, the use of dopexamine was associated with a significantly lower length of hospital stay [28]. A retrospective economic analysis of the trial data showed that these differences in resource use translated into cost savings, with the mean cost associated with patients receiving pre-op estimated as £7,261 when compared with £10,297 for patients receiving standard management. The economic analysis also extended the in-hospital mortality reduction to a mortality reduction (26% when compared with 33%) and a survival benefit (1.68 years when compared with 1.47 years) at 2 years post-surgery. Thus, the economic analysis of the 1999 trial found that a policy of pre-operative optimisation dominated standard patient management for high-risk patients undergoing major elective surgery (i.e. it was both cheaper and more effective).
Figure 29.1 illustrates the incremental cost-effectiveness plane for pre-op when compared with standard patient management. Each point in the figure represents one estimate of expected incremental costs and expected incremental effects generated by the Monte Carlo simulation. The majority of the points are located below the horizontal axis (negative incremental cost), indicating that the probability that pre-op is cost-saving is high (98%). A considerable proportion of the points are located within the south-east quadrant, where pre-op involves both reduced costs and higher survival duration than standard care, indicating a reasonably high probability that pre-op dominates standard patient management (94%). In addition, taking a cost-effectiveness threshold of £20,000 per life-year gained (indicated in Fig. 29.1 by the line labelled λ), almost all of the points (99.3%) would be deemed cost-effective, as they fall to the south-east of the line, indicating an ICER below the specified threshold. Figure 29.2 illustrates the cost-effectiveness acceptability curve for a policy of pre-op compared with standard patient management. The figure shows that if the decision-maker is willing to pay £20,000 per life-year gained, the probability that pre-op is optimal is 99.3% (as identified from the cost-effectiveness plane above). This probability falls slightly to 98.8% if the cost-effectiveness threshold increases to £30,000 per life-year gained. This is because some points within the south-west quadrant (lower cost and lower effectiveness) no longer represent value for money at this increased willingness to pay for life-years gained, i.e., now that life-years are worth more, these points involve too great a sacrifice in effectiveness for the cost-savings they accrue.
Fig. 29.1 Cost-effectiveness plane for pre-operative optimisation vs. standard patient management (incremental costs, £, against incremental survival, days)
Fig. 29.2 Cost-effectiveness acceptability curves (CEAC) for pre-operative optimisation and standard patient management (probability pre-op is cost-effective against willingness to pay for health outcome, λ)
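The quantities read off the cost-effectiveness plane and the CEAC can be computed directly from the simulated incremental costs and effects via the net monetary benefit; the following is a minimal sketch using illustrative simulated draws, not the actual Monte Carlo output behind Figs. 29.1 and 29.2:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative draws of incremental cost (pounds) and incremental survival
# (days); a real analysis would take these from the probabilistic model.
d_cost = rng.normal(-3000, 2500, 10_000)
d_days = rng.normal(80, 60, 10_000)
d_years = d_days / 365.25

p_cost_saving = (d_cost < 0).mean()
p_dominant = ((d_cost < 0) & (d_years > 0)).mean()

def prob_cost_effective(lam):
    """One CEAC point: P(incremental net monetary benefit > 0) at threshold lam."""
    return (lam * d_years - d_cost > 0).mean()

print(f"P(cost-saving) = {p_cost_saving:.1%}, P(pre-op dominates) = {p_dominant:.1%}")
for lam in (20_000, 30_000):
    print(f"P(cost-effective at £{lam:,} per life-year) = {prob_cost_effective(lam):.1%}")
```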
Turning to the issue of whether further research is required, a VOI analysis for the decision between standard patient management and a policy of pre-operative optimisation shows that, if decision-makers were willing to pay £20,000 per life-year, perfect information would be worth £7.66 per surgical procedure (translated into £0.78 million for an eligible population over 10 years), or £16.51 per surgical procedure (£1.7 million) if decision-makers were willing to pay £30,000 per life-year. Thus, the economic analysis of the trial data suggested that pre-operative optimisation was cost-effective with high probability, but that there was potentially some value in collecting further information. However, despite these results, the 1999 trial did not have a major influence on surgical management.
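Per-procedure EVPI figures such as these are the difference between the expected net benefit of deciding with perfect information and that of the best decision under current information; below is a minimal sketch with hypothetical simulated net benefits and an assumed eligible population size (neither taken from the chapter):

```python
import numpy as np

def evpi_per_patient(nmb):
    """EVPI from an (n_simulations, n_strategies) array of net monetary benefits."""
    with_perfect_info = nmb.max(axis=1).mean()   # pick the best strategy per draw
    with_current_info = nmb.mean(axis=0).max()   # pick the best strategy on average
    return with_perfect_info - with_current_info

rng = np.random.default_rng(2)
lam = 20_000
# Hypothetical (cost, life-year) distributions for the two strategies.
nmb = np.column_stack([
    lam * rng.normal(1.47, 0.15, 10_000) - rng.normal(10_297, 2_000, 10_000),  # standard
    lam * rng.normal(1.68, 0.15, 10_000) - rng.normal(7_261, 2_000, 10_000),   # pre-op
])
per_patient = evpi_per_patient(nmb)
population = per_patient * 10_000 * 10   # assumed 10,000 procedures/year for 10 years
print(f"EVPI: £{per_patient:.2f} per procedure, £{population:,.0f} for the population")
```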
29.6 Pre-Trial Analysis: Modelling the Available Information Set
As highlighted earlier, the 1999 trial was not the first trial of pre-operative optimisation in this patient group. Therefore, following the proposed framework, an analysis should have been undertaken on the basis of the information available before the 1999 trial. This would have demonstrated whether pre-operative optimisation was cost-effective based on the available information; the uncertainty surrounding the decision; whether it was potentially worthwhile collecting more information and, if so, which information and how to collect it. In order to undertake this stage for our example, it was necessary to take a step back from the 1999 trial results, to assume a position before the trial and to examine the information that was available to decision-makers (a “retrospectively prospective” view). The following illustrates the necessary steps and consequent results. The initial stage of the process involved the construction and population of a model to represent the information position available to decision-makers before the 1999 trial. The model was constructed within the Excel™ computer package, incorporating the add-in programme Crystal Ball™. The initial node in the decision model represents the surgical team’s choice of management strategy when undertaking major elective surgery with a high-risk patient.
Each patient management strategy is then incorporated as a separate branch following this decision node, with each subsequent branch representing the sequence of events that a patient might experience under that strategy, according to the assumptions of the model (see Fig. 29.3). For each treatment group, a proportion of those who undergo surgery will develop a complication as a result. Each patient management strategy is therefore modelled by dividing the patient population according to the emergence of complications within 28 days of surgery. Irrespective of the patient’s complication status, a proportion of patients will die following surgery. Within the model, mortality is separated into “surgical mortality”, occurring within 28 days of surgery (the usual end-point employed within intensive care trials), and “other mortality”, occurring after 28 days post-surgery. The “other mortality” is further split into three specific end-points: mortality within 6 months, mortality within 1 year and mortality within 2 years post-surgery. These end-points were chosen to fit with the data from the 1999 trial, thus simplifying the post-trial informed analysis. The next stage involved populating the model with the information that existed and was available to decision-makers before the 1999 trial commenced. A search of the literature and consultation with clinical colleagues led to the identification of three articles considered relevant [24, 26, 27]. Shoemaker et al. [26] detailed the results of a randomised trial of pre-op employing dopexamine vs. standard patient management in high-risk patients in the US, and provided a basic cost analysis. Boyd et al. [24] detailed the results of a randomised trial of pre-op employing dopexamine vs. standard patient management in high-risk patients in the UK. Guest et al. [27] provided a detailed analysis of the cost of resources associated with the management strategies encountered within the UK trial undertaken by Boyd et al. Owing to the similarity of the setting (UK) and the pre-operative procedure employed, more weight was given to the data from the UK studies when populating the model [24, 27]. In addition, patient-level data were used to populate the model when available.
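As a sketch of how such a tree is rolled back to an expected outcome, the fragment below evaluates one strategy with a simplified two-level tree and purely hypothetical probabilities and pathway survival times (none of these values come from the model):

```python
# Hypothetical inputs for one management strategy -- illustrative only.
p_complication = 0.35
p_death_28d = {True: 0.25, False: 0.05}   # keyed by complication status

def pathways(survival_if_dead=14, survival_if_alive=730):
    """Yield (pathway probability, survival in days) for a two-level tree."""
    for complication in (True, False):
        p_branch = p_complication if complication else 1 - p_complication
        p_die = p_death_28d[complication]
        yield p_branch * p_die, survival_if_dead          # dies within 28 days
        yield p_branch * (1 - p_die), survival_if_alive   # survives the 2-year horizon

expected_survival = sum(p * days for p, days in pathways())
print(f"expected survival: {expected_survival:.0f} days")
```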
29.6.1 Probabilities
Decision analysis requires a probability to be associated with every possible event that patients might experience, conditional on the events that occur before it.
Fig. 29.3 Decision tree for each management strategy – pre-trial model. Originally published in [6]. Reproduced by permission of Sage Publications
As a result, the model required the following probabilities:
• Probability of developing a complication, given management strategy
• Probability of 28-day mortality, given complication status and management strategy
• Probability of 6-month mortality, given complication status and management strategy
• Probability of 1-year mortality, given complication status and management strategy
• Probability of 2-year mortality, given complication status and management strategy
The probability of complication and the probability of death within 28 days were taken from the literature. The literature did not contain data beyond 28 days (the standard length of follow-up for intensive care trials), implying that after 28 days post-surgery, patients were expected to experience the same outcomes irrespective of complication status and treatment group.
The 1999 trial had a follow-up period of 2 years post-surgery. In order to incorporate this longer time horizon into the model, standard mortality rates [29] were used to determine the probability of mortality in the remaining time periods. The application of standard mortality rates within the model implies that, after 28 days post-surgery: (1) the probability of mortality returns to the standard rate for a population of the same age and (2) the probability of mortality is independent of complication status. Beta distributions [9, 30] were used to represent the uncertainty concerning each of the probabilities within the model. These can be populated directly from trial data (where available) because they are specified by two parameters, a (representing the number of successes in n trials) and b (representing the number of failures in n trials).
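For instance, populating one such probability from hypothetical trial counts (illustrative numbers, not the model’s actual inputs) is direct:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical counts: 12 complications observed among 46 patients.
events, n = 12, 46
a, b = events, n - events        # Beta(a, b): successes and failures
draws = rng.beta(a, b, 10_000)   # uncertainty about the true probability
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"P(complication): mean {draws.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```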
29.6.2 Survival
In order to determine the expected survival duration associated with each management strategy, it was necessary to specify a survival duration for every possible pathway. This was then used, in combination with the probability associated with each pathway, to determine the expected survival. The process requires the derivation of a distribution of the survival duration for each combination of treatment group, complication status and mortality status/period. For those dying within 28 days (with or without complications), an estimate of the mean length of survival and a measure of uncertainty were taken from the literature. For those dying beyond 28 days, data were not available and, as such, it was assumed that there was an equal probability of dying (from general mortality) on any day. Thus, a uniform distribution was applied to the survival duration, with the first and last days of the period as the minimum and maximum values, respectively, and the mid-point of the period as the mean. For example, for pathways involving mortality within 6 months, the survival duration was represented by a distribution with a minimum value of 28 days, a maximum value of 183 days and a mean of 105.5 days. Finally, the timeframe of the analysis was restricted to 2 years, which requires the assumption that all surviving patients die on day 731 (the day after the second anniversary of surgery). Hence, the survival duration for all of these patients is 730 days.

29.6.3 Costs
A distribution of costs (split into pre-operative, intra-operative, post-operative and complication-related) was applied to each possible pathway in the model in order to determine the expected cost associated with each management strategy. Log-normal distributions for post-operative and complication-related costs were generated from the data contained in Guest et al. [27]. Pre-operative and intra-operative costs were not considered to be uncertain (owing to strict protocols); hence, a fixed amount was specified for these costs at the prices applicable in 1999. Further details of all of the distributions used within the pre-trial model are available from Fenwick et al. [6].

29.6.4 Results
Monte Carlo simulation was used to propagate the uncertainty and generate distributions of expected costs and life-years associated with each patient management strategy. The analysis of the information available before the 1999 trial, undertaken via the pre-trial model, showed cost savings associated with a policy of pre-operative optimisation (the mean expected cost associated with patients receiving pre-op was £9,412, while for patients receiving standard management, the mean cost was £11,885). In addition, the results from the model suggested a mortality reduction (16% compared with 29%) and a survival benefit at 2 years post-surgery (1.74 years compared with 1.48 years) associated with pre-op. Thus, the pre-trial model suggested that a policy of pre-operative optimisation for high-risk patients undergoing major elective surgery dominated standard patient management. Figure 29.4 illustrates the incremental cost-effectiveness plane for pre-op compared with standard patient management. The majority of the points were located below the horizontal axis (negative incremental cost), illustrating that the probability that pre-op was cost-saving was reasonably high (75%). In addition, a considerable proportion of the points were located within the south-east quadrant, where pre-op involved both reduced costs and higher survival duration than standard care, implying a reasonable probability that pre-op dominated standard patient management (74%). Figure 29.5 illustrates the cost-effectiveness acceptability curve for a policy of pre-op compared with standard patient management. The figure shows that if the decision-maker is willing to pay £20,000 per life-year gained, the probability that pre-op is optimal is 95.5%. This probability increases slightly to 97.3% if the decision-maker is willing to pay £30,000 per life-year gained.
Fig. 29.4 Cost-effectiveness plane for pre-operative optimisation vs. standard patient management – pre-trial
Fig. 29.5 CEAC for pre-operative optimisation and standard patient management – pre-trial
A VOI analysis for the decision between standard patient management and a policy of pre-operative optimisation shows that, if decision-makers were willing to pay £20,000 per life-year, then perfect information would be worth £78 per surgical procedure (translated into £11 million for an eligible population over 15 years), or £50 per surgical procedure (£7 million) if decision-makers were willing to pay £30,000 per life-year. The pre-trial analysis therefore showed that, given the information that existed before the 1999 trial, a policy of pre-operative optimisation was cost-effective for managing high-risk surgical patients undergoing major elective surgery, and that there was potential value in collecting further information.
29.7 Post-Trial Analysis: Combining Information Sets
Within the Bayesian framework, once new information is available, the interventions are re-assessed, with the pre-trial information providing the informative priors for a Bayesian analysis of the trial data. To reiterate, in our example the stages were inverted: the 1999 trial was undertaken before the pre-trial analysis was conducted. Therefore, in what follows, a Bayesian re-assessment of the decision was undertaken on completion of the pre-trial analysis. However, it should be noted that this does not affect the results of the analyses. The updating process combined the patient-level trial data concerning costs and survival duration with the estimates of mean cost and mean survival duration generated from the prior model, to determine the post-trial cost-effectiveness of the patient management strategies and the investment potential of any additional research. The Bayesian updating was undertaken using WinBUGS™ (Windows-based Bayesian inference using Gibbs sampling), a specialist analysis package which provides a numerical approximation of the posterior through Markov chain Monte Carlo simulation via the Gibbs sampling algorithm.
29.7.1 Priors
29.7.1.1 Survival
Survival duration, determined up to 2 years post-surgery, was available for each patient from the 1999 trial. Survival duration was modelled using a piecewise exponential distribution, with the follow-up period split into four distinct time periods (each representing an important interval in the post-surgical recovery of the patient). This involves fitting a separate exponential function to approximate the survival function for each period, under the assumption of a constant hazard within, but not across, each time period. In the post-trial analysis, the log hazard rates for each period and management strategy were modelled as normal distributions specified by a mean (representing the expected value) and a precision (representing the uncertainty surrounding the mean). The prior values for the mean and precision were determined from the probabilistic analysis of the prior model, by converting the distribution of the survival probability to a distribution of the log hazard rates. The mean survival was estimated using the area under the survival curve for each interval and summing across the intervals.
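A minimal sketch of the piecewise exponential idea follows, with illustrative interval boundaries and hazards (not the trial’s estimates): the hazard is constant within each interval, survival multiplies across intervals, and the restricted mean survival is the area under the survival curve:

```python
import numpy as np

# Illustrative interval boundaries (days) and constant daily hazards per interval.
bounds = np.array([0, 28, 183, 365, 730])
hazards = np.array([4e-3, 8e-4, 4e-4, 3e-4])   # hypothetical values

def survival(t):
    """S(t) = exp(-cumulative hazard) under a piecewise constant hazard."""
    time_at_risk = np.clip(t - bounds[:-1], 0, np.diff(bounds))
    return np.exp(-np.sum(hazards * time_at_risk))

# Restricted mean survival over 2 years, approximated day by day.
mean_days = sum(survival(t) for t in range(1, 731))
print(f"S(730 days) = {survival(730):.3f}, restricted mean survival = {mean_days:.0f} days")
```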
29.7.1.2 Costs
Patient-level data on resource use, with a follow-up of 6 months, were available from the trial. For the cost-effectiveness analysis, this resource use (including days in hospital, drug use and interventions) was converted to a patient-specific cost (for more details, see Fenwick et al. [22, 6]).
In the post-trial analysis, the patient-level cost data were modelled as a log-normal distribution specified by a mean (representing the expected value) and a precision (representing the uncertainty surrounding the mean). The mean of the log cost was itself modelled as a normal distribution, specified by a mean and precision, representing the variation in the mean log cost. The prior distributions for the parameters of this distribution were generated directly from the probabilistic analysis of the prior model. In turn, the precision of the log cost was modelled using a half-normal distribution (a normal distribution truncated at zero to prevent negative values) specified by a mean and precision. The prior distributions for the parameters of this distribution were generated from the cost data used to populate the pre-trial model. The mean cost for each method of patient management was determined through a back-transformation of the log costs.
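The back-transformation referred to here is the standard log-normal identity, E[cost] = exp(μ + σ²/2); a minimal sketch with hypothetical posterior draws of the log-scale mean μ and precision τ:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical posterior draws for the log-cost mean (mu) and precision (tau).
mu = rng.normal(8.8, 0.05, 10_000)
tau = np.abs(rng.normal(1.6, 0.1, 10_000))   # precision is positive

sigma2 = 1.0 / tau                    # variance of the log cost
mean_cost = np.exp(mu + sigma2 / 2)   # log-normal back-transformation
print(f"posterior mean cost: £{mean_cost.mean():,.0f}")
```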
29.7.1.3 Results
WinBUGS was used to undertake the updating (10,000 iterations following 10,000 burn-in iterations) and to generate posterior distributions of expected costs and life-years associated with each patient management strategy. The informed analysis of the trial data showed cost-savings associated with a policy of pre-operative optimisation. The mean expected cost associated with patients receiving pre-op was £7,075, while for patients receiving standard management, the mean cost was £10,180. In addition, the results of the informed trial analysis suggested a survival benefit at 2 years post-surgery (1.66 years compared with 1.39 years) associated with pre-op, when compared with standard patient management. Thus, the informed trial analysis suggested that, based on the information set available following the 1999 trial, a policy of pre-operative optimisation for high-risk patients undergoing major elective surgery dominated standard patient management.
Fig. 29.6 Cost-effectiveness plane for pre-operative optimisation vs. standard patient management – post-trial
Fig. 29.7 CEAC for pre-operative optimisation and standard patient management – post-trial vs. pre-trial (curves labelled informed and prior)
Figure 29.6 illustrates the incremental cost-effectiveness plane for pre-op when compared with standard patient management. The majority of the points are located below the horizontal axis (negative incremental cost), indicating that the probability that pre-op is cost-saving is very high (99.4%). In addition, a considerable proportion of the points are located within the south-east quadrant, where pre-op involves both reduced costs and higher survival duration than standard care, indicating a high probability that pre-op dominates standard patient management (98%). Figure 29.7 illustrates the cost-effectiveness acceptability curve for a policy of pre-op when compared with standard patient management. The figure shows that if the decision-maker is willing to pay £20,000 per life-year gained, the probability that pre-op is optimal is 99.94%. This probability falls very slightly to 99.88% if the decision-maker is willing to pay £30,000 per life-year gained. The figure also illustrates the reduction in decision uncertainty (between the pre-trial model and the informed analysis) directly resulting from the 1999 trial. A VOI analysis for the decision between standard patient management and a policy of pre-op shows that, if decision-makers were willing to pay £20,000 per life-year, then perfect information would be worth £0.91 per surgical procedure (translated into £90,000 for an eligible population over 10 years), or £1.98 per surgical procedure (£200,000) if decision-makers were willing to pay £30,000 per life-year. The results of the Bayesian re-analysis with informative priors suggest that, given all the information available to them, decision-makers can be confident that pre-operative optimisation is a cost-effective method of managing high-risk surgical patients undergoing major elective surgery, and that there is little potential value in collecting any further information.
Table 29.1 Cost and survival duration for the different stages of the iterative framework – mean (standard error)

Stage                           Strategy   Cost (£), mean (se)   Survival (days), mean (se)   Incremental analysis
Pre-trial analysis              Pre-op     9,412 (21,949)        636 (42)
                                Standard   11,885 (9,477)        541 (50)                     Dominated
Trial analysis                  Pre-op     6,941 (538)           612 (31)
                                Standard   10,512 (1,140)        535 (43)                     Dominated
Informed Bayesian re-analysis   Pre-op     7,075 (482)           606 (31)
                                Standard   10,180 (21,342)       509 (38)                     Dominated
29.8 Discussion on the Framework
29.8.1 Impact of the Iterative Framework
The impact and value of the iterative framework can be determined through comparison of the various stages of the analysis. A comparison of the results and conclusions from the pre-trial model with those of the informed Bayesian re-analysis illustrates the impact that the data from the 1999 trial had upon the information set available to decision-makers. In addition, comparing the results and conclusions from the trial analysis with those of the informed Bayesian re-analysis illustrates the impact of formally incorporating the prior information position within the trial analysis, rather than discarding the information or relying upon informal methods to incorporate it. In terms of cost-effectiveness, the original trial analysis, the pre-trial analysis and the informed Bayesian re-analysis of the trial all gave the same result – a policy of pre-operative optimisation is expected to dominate standard patient management.
Therefore, the new trial data and the iterative framework had limited impact upon the cost-effectiveness result and, thus, the reimbursement decision. Nonetheless, the analysis had considerable impact upon the extent of the uncertainty surrounding both the estimates of cost and effect and the decision. Table 29.1 summarises the expected costs and expected survival duration (mean and standard error) for each intervention at each stage in the iterative process. The standard errors illustrate that, as expected, the incorporation of the information available from the pre-trial model into the informed Bayesian re-analysis served to reduce the uncertainty surrounding the estimates of expected cost and expected survival duration. Comparing Figs. 29.1, 29.4 and 29.6 provides a graphical illustration of the reduction in the estimate uncertainty (costs – vertically, effects – horizontally) between the stages of the framework. In addition, the extent of the decision uncertainty falls between the pre-trial model (26%), the trial analysis (6%) and the informed Bayesian re-analysis (2%), regardless of the decision-maker’s willingness to pay for a life-year (Figs. 29.2 and 29.7). This translates into a fall in the EVPI between the stages of the framework. Before the 1999 trial was undertaken, further research was potentially worth £11 million, if decision-makers were willing to pay £20,000 per life-year. The analysis of the trial data in isolation suggested that further research was potentially worthwhile, with a maximum value of £0.78 million. However, once the pre-trial evidence was incorporated within the information set, the analysis of all the available data suggested that further research was unlikely to be worthwhile, with a maximum return of only £90,000.
29.8.2 Implications of the Framework for Health Technology Assessment
The main implication of employing this framework for HTA is that the initial assessment should be undertaken when the intervention is first identified for use in patients. Moreover, the framework should be used iteratively throughout the life of the technology, as new information emerges. While this will probably increase the burden on resources for HTA, there are significant potential benefits associated with this process. The main benefit is that, at each stage, the decisions about
reimbursement and further research will be informed by the most up-to-date information available. In particular, the early assessment will allow the identification of potentially cost-effective interventions before further research is undertaken. In addition, there is no restriction on the number of alternatives that can be compared within a decision model. Therefore, when early modelling work is undertaken, it is possible to “think outside the box” and include a variety of interventions and management strategies within the decision. The technology chosen for reimbursement is the one with the highest expected cost-effectiveness, below the threshold, given the information available. Through VOI techniques, the iterative framework ensures that further information will be actively sought only when it is worthwhile and efficient. The EVPI provides an upper limit on the value of further data acquisition, which can be compared with research costs to determine whether the collection of further information is potentially worthwhile. More specifically, the EVSI provides a method to determine the worth of particular sample information and to design specific (efficient) research. In addition, the VOI approach can be used to determine the specific parameter focus, enabling more appropriate formulation of any programme of further data acquisition. The EVPPI provides an initial focus on the elements of the decision where further information is of the greatest potential worth, while EVSI can be used to determine the worth of specific research concerning particular parameters (see Claxton and others [21, 20, 3, 4]). Where early modelling is not undertaken, implicit judgements must be made about which parameters are important for the purposes of HTA. As a result, proposals for further data acquisition may either lack focus, leading to an unnecessarily large information requirement, or fail to provide information about the appropriate parameters. This can result in expensive and potentially uninformative programmes of data acquisition.
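For reference, the value-of-information quantities named above have standard decision-theoretic definitions (a summary in conventional notation, not reproduced from this chapter), with net benefit NB(j, θ) for strategy j, parameters θ = (φ, ψ) and prospective sample data Xₙ:

```latex
\begin{align*}
\mathrm{EVPI} &= \mathbb{E}_{\theta}\Big[\max_{j}\,\mathrm{NB}(j,\theta)\Big]
                 - \max_{j}\,\mathbb{E}_{\theta}\big[\mathrm{NB}(j,\theta)\big] \\
\mathrm{EVPPI}(\varphi) &= \mathbb{E}_{\varphi}\Big[\max_{j}\,
                 \mathbb{E}_{\psi\mid\varphi}\big[\mathrm{NB}(j,\varphi,\psi)\big]\Big]
                 - \max_{j}\,\mathbb{E}_{\theta}\big[\mathrm{NB}(j,\theta)\big] \\
\mathrm{EVSI}(n) &= \mathbb{E}_{X_{n}}\Big[\max_{j}\,
                 \mathbb{E}_{\theta\mid X_{n}}\big[\mathrm{NB}(j,\theta)\big]\Big]
                 - \max_{j}\,\mathbb{E}_{\theta}\big[\mathrm{NB}(j,\theta)\big]
\end{align*}
```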
The use of the iterative framework, as discussed here, could potentially lead to situations where decisions regarding reimbursement are overturned once further research is undertaken and further evidence is incorporated within the model. Where there are costs associated with reversing the reimbursement decision (whether financial or political), the use of expected value decision-making may be inappropriate. Suggested alternatives include utilising option price techniques to adjust cost-effectiveness measures to account for irreversibility [31]; comparing the expected opportunity losses associated with reimbursement and rejection to determine whether the technology should be reimbursed [32]; and formally incorporating a “delay and trial” alternative within the decision context, to postpone the reimbursement decision until additional research is undertaken [33]. In practice, this tends to be handled informally by placing the burden of proof onto the new technology, which is unlikely to be recommended for reimbursement on the basis of the expected values when there is considerable value associated with further research.
29.8.3 Employing Bayesian Methods
While the use of prior belief regarding the existence/prevalence of disease to interpret test results is uncontroversial, the wider application of this principle – incorporating prior or external evidence when interpreting the results of clinical trials – has caused controversy within the literature. This controversy stems from the basic methodological and philosophical differences between a Bayesian analysis of data and a more classical (frequentist) analysis. The most significant of these differences involve the Bayesian view of probabilities as degrees of belief in a proposition and of parameters as unknown variables for estimation. In addition, Bayesians allow and encourage the inclusion of external information within priors. Despite this controversy, the use of Bayesian methods has seen a revival in recent years, mostly due to the development of numerical methods for approximating posteriors for non-conjugate priors and data, and to the growth in computing power. This revival has filtered through to pharmacoeconomics [10, 34, 35], and the use of Bayesian methods within HTA has recently been reviewed [8].
29.9 Conclusions
This chapter has presented a Bayesian framework for managing HTA over the lifecycle of an intervention. The example has provided an application of the framework to a method for managing surgical patients. This analysis has demonstrated that the framework is valuable to decision-makers and practical to undertake.
In addition, it has illustrated that it is only by formally incorporating all of the information available to decision-makers, through the use of informed priors, that appropriate estimates of cost-effectiveness and uncertainty are attained, upon which appropriate decisions can be based.
Acknowledgements The example is taken from the work undertaken as a part of, and funded by, the Bayesian Initiative in Health Economics and Outcomes Research. Collaborators on the project were Professor Karl Claxton, Professor Mark Sculpher, Professor Keith Abrams, Mr Steve Palmer and Dr Alex Sutton.
References
1. Banta HD, Thacker SB (1990) The case for reassessment of health care technology: once is not enough. J Am Med Assoc 264:235–240
2. Sculpher MJ, Drummond MF, Buxton MJ (1997) The iterative use of economic evaluation as part of the process of health technology assessment. J Health Serv Res Policy 2:26–30
3. Claxton K, Sculpher M, Drummond M (2002) A rational framework for decision making by the National Institute For Clinical Excellence (NICE). Lancet 360:711–716
4. Claxton K, Fenwick E, Sculpher M (2006) Decision-making with uncertainty: the value of information. In: Jones AM (ed) Elgar companion to health economics. Elgar, Cheltenham, pp 514–525
5. Fenwick E, Claxton K, Sculpher M et al (2000) Improving the efficiency and relevance of health technology assessment: the role of iterative decision analytic modelling. CHE Discussion Paper 179, University of York, York
6. Fenwick E, Palmer S, Claxton K et al (2006) An iterative Bayesian approach to health technology assessment: application to a policy of pre-operative optimisation for patients undergoing major elective surgery. Med Decis Making 26:480–496
7. Sculpher M, Claxton K, Drummond M et al (2006) Whither trial-based economic evaluation for health care decision making? Health Econ 15:677–687
8. Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR (2000) Bayesian methods in health technology assessment: a review. Health Technol Assess 4(34):1–130
9. Berry DA, Stangl DK (eds) (1996) Bayesian biostatistics. Marcel Dekker, New York
10. Fryback DG, Chinnis JO, Ulvila JW (2001) Bayesian cost-effectiveness analysis. An example using the GUSTO trial. Int J Technol Assess Health Care 17:83–97
11. Harrell FE, Peruggia M (1998) An introduction to Bayesian methods with clinical applications
12. Spiegelhalter DJ, Freedman LS (1994) Bayesian approaches to randomized trials. J R Stat Soc (A) 157:357–416
13. Pratt JW, Raiffa H, Schlaifer R (1994) Introduction to statistical decision theory. MIT, Cambridge, MA
14. Raiffa H, Schlaifer RO (1959) Probability and statistics for business decisions. McGraw-Hill, New York
15. Raiffa H, Schlaifer RO (1961) Applied statistical decision theory. Harvard University, Cambridge, MA
16. Raiffa H (1968) Decision analysis: introductory lectures on choices under uncertainty. Addison-Wesley, New York
17. Thompson KM, Evans JS (1997) The value of improved national exposure information for perchloroethylene (perc): a case study for dry cleaners. Risk Analysis 17:253–271
18. Hammitt JK, Cave JAK (1991) Research planning for food safety: a value of information approach. RAND, Santa Monica, CA
19. Howard RA (1966) Information value theory. IEEE Trans Syst Sci Cybern 2(1):22–26
20. Claxton K, Posnett J (1996) An economic approach to clinical trial design and research priority setting. Health Econ 5:513–524
21. Ades A, Lu G, Claxton K (2004) Expected value of sample information calculations in medical decision modeling. Med Decis Making 24:207–227
22. Fenwick E, Wilson J, Sculpher M et al (2002) Pre-operative optimisation employing dopexamine or adrenaline for patients undergoing major elective surgery: a cost-effectiveness analysis. Intensive Care Med 28:599–608
23. Fenwick E (2002) An iterative framework for health technology assessment employing Bayesian statistical decision theory. PhD thesis, University of York
24. Boyd O, Grounds R, Bennett E (1993) A randomised clinical trial of the effect of deliberate perioperative increase of oxygen delivery on mortality in high-risk surgical patients. JAMA 270:2699–2707
25. Bland RD, Shoemaker WC, Shabbot MM (1978) Physiologic monitoring goals for the critically ill patient. Surg Gynecol Obstet 147:833–841
26. Shoemaker W, Appel P, Kram H et al (1988) Prospective trial of supranormal values of survivors as therapeutic goals in high-risk surgical patients. Chest 94:1176–1186
27. Guest JF, Boyd O, Hart WM et al (1997) A cost analysis of a treatment policy of a deliberate perioperative increase in oxygen delivery in high risk surgical patients. Intensive Care Med 23:85–90
28. Wilson J, Woods I, Fawcett J et al (1999) Reducing the risk of major elective surgery: randomised controlled trial of preoperative optimisation of oxygen delivery. BMJ 318:1099–1103
29. Office for National Statistics (1998) Monitor: population and health. Stationery Office, London
30. Gelman A, Carlin JB, Stern HS et al (1995) Bayesian data analysis. Chapman and Hall, London
31. Palmer S, Smith PC (2000) Incorporating option values into the economic evaluation of health care technologies. J Health Econ 19:755–766
32. Griffin S, Claxton K, Palmer S, Sculpher M (2006) Dangerous omissions: the consequences of ignoring decision uncertainty. Health Economics Study Group meeting, York (in press)
33. Eckermann S, Willan A (2007) Expected value of information and decision making in HTA. Health Econ 16:195–209
34. Parmigiani G, Samsa GP, Ancukiewicz M et al (1997) Assessing uncertainty in cost-effectiveness analyses: application to a complex decision model. Med Decis Making 17:390–401
35. Parmigiani G (2002) Modeling in medical decision making: a Bayesian approach (Statistics in Practice). Wiley, Chichester
30 Systematic Reviews and Meta-Analyses in Surgery
Sukhmeet S. Panesar, Weiming Siow, and Thanos Athanasiou
Contents

30.1 Introduction to Systematic Reviews ............ 376
30.1.1 The Rationale for the Systematic Review ............ 376
30.1.2 So What Is a Systematic Review? ............ 377
30.1.3 The Meta-Analysis ............ 378
30.1.4 Advantages over Narrative Reviews ............ 378
30.1.5 Advantages over Randomised Controlled Trials ............ 379
30.2 The Science of a Meta-Analysis ............ 379
30.2.1 Careful Planning is Important ............ 379
30.2.2 Defining the Objectives of the Study ............ 379
30.2.3 Defining the Population of Studies to be Included ............ 379
30.2.4 Defining the Outcome Measures ............ 381
30.2.5 Locating all Relevant Studies ............ 381
30.2.6 Screening, Evaluation and Data Abstraction ............ 381
30.2.7 Choose and Standardise the Outcome of Measure ............ 382
30.2.8 Statistical Methods for Calculating Overall Effect ............ 383
30.2.9 Fixed and Random Effects Models ............ 383
30.2.10 Heterogeneity Between Study Results ............ 386
30.2.11 Meta-Regression ............ 387
30.2.12 Conducting a Meta-Analysis in the Surgical Context ............ 387
30.2.13 The Learning Curve ............ 388
30.3 Assessing the Quality of a Meta-Analysis ............ 388
30.4 Pitfalls in Conducting a Meta-Analysis ............ 389
30.4.1 Conflicting Results Between Meta-Analyses Compared with Large-scale RCTs ............ 389
30.5 Pitfalls in the Variable Quality of Included Trials ............ 389
30.5.1 The Importance of Quality ............ 389
30.5.2 Quality of Reporting ............ 390
30.6 Pitfalls in Biased Inclusion Criteria ............ 391
30.6.1 Dealing with Personal Bias and Inclusion Bias ............ 391
30.7 Which Meta-Analyses Should be Published? ............ 391
30.8 Systematic Review of Observational Studies ............ 391
30.8.1 Use Cases for Observational Studies ............ 392
30.8.2 Problems in Systematic Review of Observational Studies ............ 392
30.8.3 Solutions to Problems in Observational Studies ............ 393
30.9 Other Types of Meta-Analyses ............ 393
30.9.1 Data: Meta-Analysis of Individual Patient Data ............ 393
30.9.2 Study Type: Meta-Analysis of Observational and Epidemiological Studies ............ 394
30.9.3 Study Type: Meta-Analysis of Survival Data ............ 394
30.9.4 Method: Cumulative Meta-Analysis ............ 394
30.9.5 Method: Mixed-Treatment Comparison (MTC) Meta-Analysis ............ 395
30.10 What Is the Use of Meta-Analyses? ............ 395
30.11 Meta-Analysis Software ............ 395
30.12 Conclusion ............ 396
References ............ 396
Further Reading ............ 397

S. S. Panesar ()
National Patient Safety Agency, 4–8 Maple Street, London, W1T 5HD, UK
e-mail: [email protected]
Abstract The exponential rise in published medical research demands a method of summarising the best evidence for its application to patient care in clinical practice. A robust meta-analysis is a valid tool. It is often considered to be a simple process of pooling results from different studies; however, this is not true. This chapter provides a structural framework for performing a meta-analysis. It guides the clinician on a journey from the identification of the correct clinical question, through data analysis, to the production of a structured report.
It also highlights the limitations and pitfalls associated with the meta-analytical technique, and offers solutions to these problems. We also briefly describe the newer methods of meta-analysis. While meta-analyses of homogeneous studies are the highest form of evidence, poorly conducted meta-analyses create confusion and serve to harm the patient.
30.1 Introduction to Systematic Reviews
30.1.1 The Rationale for the Systematic Review
The pressures of moral–ethical obligations, legal liability and health economic rationing have heralded the advent of evidence-based healthcare in the last few decades. To ensure the best possible outcomes for patients, clinicians are increasingly required to implement best practices and continual quality improvement processes in the clinical environment. This inextricably involves the application of the best available knowledge, usually in the form of scientific research, to guide clinical decision-making. Hence, the use of clinical research is no longer an option but a necessity.
30.1.1.1 The Problem of Information Overload
The difficulty then arises from the recent information explosion in the biomedical field within the last quarter century, as evidenced by the dense cornucopia of articles and journals which are now readily accessible and searchable through a myriad of online web-based bibliographic databases like PubMed and EMBASE. In addition to the huge volume of literature, its scattered nature poses further problems. Every time a new article appears, readers must compare the new findings with the existing scope of evidence to come to a reframed overall clinical conclusion. On average, a clinician would have to read seventeen original articles each day in order to keep up with advances in his/her chosen field. This has made the up-to-date application of primary clinical research an overwhelmingly daunting task even for the eagerly enthusiastic clinician, who would need to search, sift, analyse and summarise a large collection of studies in order to apply the fruits of research to the clinical setting. This is a burden which cannot be taken lightly, especially within the constraints of the limited time and energy of a surgeon in a busy general hospital.
30.1.1.2 The Presence of Conflicting Results
Moreover, the presence of conflicting results among individual research studies does not improve matters. Not only could inconsistent results and conclusions be attributed to the statistical play of chance, but they might also be due to the presence of systematic error from poorly designed study methodology. This entails the need to critically analyse each individual trial for study quality, adding an extra dimension of burden for the clinician.
30.1.1.3 The Narrative Review and its Shortcomings
The narrative review partially resolves the problems mentioned earlier by providing a broad, updated and authoritative summary of research and opinion by key leaders in a field. However, this type of review brings with it its own attendant problems, whereby different review authors can reach differing viewpoints and diametrically opposed conclusions from the very same source material. This might be attributed to a number of factors, such as the use of an assorted mixture of ambiguous review methodologies, the lack of disclosure and transparency in techniques, the inability to statistically combine results and the inherent introduction of subjective bias in the form of “expert” opinion [53].
30.1.1.4 The Limitations of Randomised Controlled Trials
Furthermore, although randomised controlled trials (RCTs) – when conducted properly – are one of the more objective methods of determining the true relationship between treatment and outcome, this particular type of study design also carries with it a number of limitations.
These include the need for large numbers of participants in a trial, usually ranging from thousands to tens of thousands of subjects, in order to ensure sufficient statistical power. This is especially so if the treatment effects being studied are small in magnitude but are still deemed clinically useful. It is further compounded in the study of rare diseases of low incidence and prevalence, where an RCT might have to be conducted over a prolonged period of time in order to gather the number of subjects required for any statistically significant result to be derived. The presence of a latency period between exposure/treatment and outcome will also necessitate longer-term follow-up. Hence, although this type of study design is objective and free from bias compared with other study designs, in certain situations it can prove to be costly in terms of time, manpower and money. As not all groups have such resources at their disposal, compromises are reached whereby trials are conducted anyway in smaller discrete populations. This makes the results from such smaller studies liable to be statistically insignificant or, at best, imprecise, with larger degrees of uncertainty in the result estimates. With that, the overall usefulness of such RCTs is reduced. Moreover, the design of an RCT mandates that a standardised population demographic be tested in a controlled environment. In comparison with the true multivariate nature of the ‘real world’ clinical setting, the presence of heterogeneity in ethnicity, age and geography might make any significant result from RCTs inapplicable.
30.1.1.5 The Problem of Insufficient High-Quality Trial Data in Surgical Research
A problem more specific to the surgical literature lies in the relatively small proportion of high-quality evidence in most surgical journals. The number of surgical RCTs is indeed small, and case reports and series are still the predominant publication type. Even then, within surgical studies, there are also heterogeneous differences in study quality, such as insufficient sample size, unclear methodologies and the use of non-clinical outcome parameters [44].
30.1.1.6 The Solution
From these issues, the need for a more objective method of summarising primary research, together with the need to overcome the pitfalls of RCTs, has spurred the development of a formalised set of processes and methodologies in the form of the systematic review and meta-analysis. In the surgical context, systematic reviews have become an important tool for finding important and valid studies, whilst filtering out the large number of seriously flawed and irrelevant articles. By condensing the results of many trials, systematic reviews allow readers to obtain a valid overview of a topic with substantially less effort involved.
30.1.2 So What Is a Systematic Review?
A systematic review is defined as the objective, transparent and unbiased location and critical appraisal of the complete scope of research on a given topic, and the eventual impartial synthesis and, if possible, meta-analysis of the individual study findings. The aims of a systematic review are manifold, including:
• The critical appraisal of individual studies.
• The combination of individual results to create a useful summary statistic.
• The analysis of the presence of, and reasons behind, between-study variances.
• The exposure of areas of research which might be methodologically inadequate and require further refinement.
• The exposure of knowledge gaps and areas of potential future research possibilities.
Every systematic review is composed of a discrete number of steps, which include the:
• Formulation of a specific question to be addressed
• Definition of eligibility (inclusion and exclusion) criteria for primary studies to be included
• Identification and location of all potentially eligible relevant studies, whether published or unpublished
• Critical appraisal of each individual study via the use of explicit appraisal criteria
• Performance of a variety of statistical methods to assess for heterogeneity among studies
• Impartial unbiased analysis and synthesis of collected information, and lastly,
• Creation of a structured report to state and discuss the findings.
30.1.3 The Meta-Analysis
In a systematic review, two types of synthesis can be performed: a qualitative synthesis, where primary studies are summarised as in a narrative review, and a quantitative synthesis, where primary studies are statistically combined. It is this quantitative synthetic component which is termed a meta-analysis – the statistical quantitative integration of individual study findings to produce an overall summary result. A common misunderstanding is that a meta-analysis is exactly identical to a systematic review and that the terms can be used interchangeably as synonyms. In truth, a meta-analysis is actually a subset component of a systematic review, as illustrated in Fig. 30.1. A meta-analysis is also not limited to the summarisation of RCT data. Different study designs, data types and follow-up spans, as shown in Table 30.1, can also be used in a meta-analysis. More details with regard to the usage of each type of meta-analysis, together with its attendant pros and cons, are discussed later. For now, emphasis is given to the meta-analysis of RCTs.

Table 30.1 List of different types of meta-analysis
By types of studies
• Meta-analysis of RCTs
• Meta-analysis of observational and epidemiological studies
• Meta-analysis of survival studies
• Meta-analysis using different study designs (taleo-analysis)
By types of data
• Meta-analysis using aggregated trial summary data
• Meta-analysis using independent patient data

A meta-analysis can facilitate the synthesis of results in a number of scenarios where the findings of individual studies show:
• No effect because of a small sample size
• Varying directions of effect or
• Effects versus no significant effects
All of these findings are commonly encountered among surgical topics. A meta-analysis may serve to combine findings from similar studies to help increase the power to detect statistical differences [38].
Fig. 30.1 A Venn diagram of the relationship between a systematic review and meta-analysis (meta-analysis as a subset of the systematic review, within secondary research)
30.1.4 Advantages over Narrative Reviews
From the above-mentioned factors, the shortcomings of narrative reviews can be readily ameliorated, as:
• The adherence to a strict scientific design with transparent methodology in analysis ensures objectivity and reproducibility of the findings,
• The presence of explicit inclusion and exclusion criteria ensures the comprehensiveness of the review, whilst in the process minimising the inclusion of bias within individual studies,
• The presence of a meta-analysis can provide a quantitative summary of the overall effect estimate and lastly,
• Any differences between study methodologies which affect the results can be explored.
Narrative reviews by nature also tend to be generically broad and all-encompassing. The systematic review, in contrast, puts forward specific questions to answer, which increases the applicability of such reviews in the clinical and surgical context.
30.1.5 Advantages over Randomised Controlled Trials
A meta-analysis in a systematic review can enhance the statistical power of a group of RCTs, as the pooling of data from individual studies increases the study population. With an increase in statistical power comes an increase in the precision of findings – reducing both uncertainty and ambiguity. Systematic reviews can also enhance the applicability of a trial, as the pooling and analysis of data from different RCTs with different patient groups can reveal any heterogeneity or homogeneity in the findings. In conclusion, the systematic review and meta-analysis are of great importance in the summarisation and application of scientific surgical research. They have become a cornerstone in forming clinical decisions and guidelines and, in the process, have given a better understanding of the areas in need of further research.
30.2 The Science of a Meta-Analysis
30.2.1 Careful Planning is Important
It is a common misconception that a meta-analysis is an easy study to undertake, achievable with minimum effort. In reality, little attention is often paid to the details of design and implementation. A valid meta-analysis requires the same careful planning as any other research study [4]. This is shown in Fig. 30.2. Essentially, there are two goals to a meta-analysis. One is to summarise the available data and the other is to explain the variability among the studies. Ideally, all studies being meta-analysed should have similar patient characteristics and similar outcomes of interest.
In reality, a certain degree of variability is expected among the studies, and this is the impetus for performing a meta-analysis [4]. Variability is assessed by sub-group analysis, heterogeneity studies and sensitivity analysis, all of which add “flavour” to the meta-analysis. As discussed previously, the steps involved in writing out a detailed research protocol for a meta-analysis include:
• The definition of study objectives and formulation of the problem,
• The establishment of inclusion and exclusion criteria,
• The collection and analysis of the data and
• The reporting of the results.

Fig. 30.2 Overall pathway of the systematic review/meta-analysis: formulate a specific clinical question → define objective eligibility criteria (selection bias) → locate all possible published and unpublished studies (publication bias) → screen and score the located studies (selection bias) → data extraction → statistical calculation of the overall effect (fixed or random effects models; heterogeneity, sensitivity and subgroup analyses) → creation of a structured report
30.2.2 Defining the Objectives of the Study
The first step is to identify the problem. This includes specifying the disease or condition and population of interest, the specific treatments or exposures being studied and the various clinical or biological outcomes to be assessed.
30.2.3 Defining the Population of Studies to be Included
With a distinct problem, a discrete and objective statement of the inclusion and exclusion criteria for studies can be created. This is crucial in a meta-analysis, helping to eliminate selection bias. These criteria need to be specified in the meta-analysis protocol in advance. Any inclusion criteria must address:
The study type – it must be decided from the outset whether only RCTs or also observational studies will be included, although there is constant debate and research with regard to this [48, 52]. A hierarchy of evidence has been developed which allows for different types of studies to be included in the analysis [39]. Naturally, the lower the level of evidence of a type of study, the lower the validity of the meta-analysis [38]. For more advanced types of meta-analysis, different study designs can even be combined. This is called taleo-analysis which, although deemed the best of both worlds, has its own limitations as detailed below and is beyond the scope of this work.
Fig. 30.2 Overall pathway of a systematic review/meta-analysis. The flowchart runs: formulate a specific clinical question (is the clinical problem unanswered, relevant to the population with the disease condition, and specific enough?); define objective eligibility criteria (inclusion and exclusion criteria; clear definition of the disease condition, treatment modalities/risk factors and clinical and biological outcome measures) – guarding against selection bias; locate all possible studies (structured search strategy to find published and unpublished studies: Cochrane clinical trials registry, electronic databases such as PubMed and EMBASE, peer consultation, manual searching of meeting abstracts) – guarding against publication bias; screen and score the located studies (unbiased application of the criteria, weighting of studies by scoring of methodology, use of more than one independent observer, blinding of studies to reviewers, data from unpublished sources) – again guarding against selection bias; extract the data (standardisation of outcome measures; summary statistics such as the odds ratio, relative risk ratio and number needed to treat) – with attention to heterogeneity; statistically calculate the overall effect (fixed or random effects models; heterogeneity, sensitivity and subgroup analyses); and create a structured report
Patient characteristics – these include age, gender and ethnicity, presenting condition, co-morbidities, duration of illness and method of diagnosis. Treatment modalities – for the condition in question, the allowable treatment type, dosage, duration and conversion from one treatment to another should be addressed.
30.2.4 Defining the Outcome Measures

Most studies have multiple outcome measures. The protocol for the meta-analysis should specify the outcomes that will be studied [4]. There are two schools of thought: the researcher can either focus on one or two primary outcomes, or turn the review into a fishing expedition and assess as many outcomes as possible. Also of note is that one should include only one set of results from a single study, even if multiple publications are available. For example, a study carried out in the year 2000 might be published as a 2-year follow-up in 2002, and more data might be included in a 5-year follow-up in 2005; for meta-analysis purposes, only the 2002 or the 2005 paper should be included, so as to avoid duplication of the data set. Thus, it is necessary to have a method for deciding which papers will be included. Most often it is reasonable to specify that this will be the latest paper published, or the paper with the most complete data on the outcome measures of interest [4].
30.2.5 Locating all Relevant Studies

This is by far the most important, frustrating and time-consuming part of the meta-analysis. A structured search strategy must be used. This usually involves starting with databases such as NLH Medline, PubMed, EMBASE, CINAHL and even Google Scholar. There are different search strategies for the various databases, and effective use must be made of MeSH headings, synonyms and the “related articles” function in PubMed. It is worth getting a tutorial from a librarian on how
to obtain high-yield searches that include most of the published studies that one requires.
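By way of illustration, a hypothetical PubMed strategy for an off-pump bypass question might combine a MeSH term with free-text synonyms and a publication-type filter; the exact terms here are assumptions for the example, and each database would need its own tailored version:

    ("coronary artery bypass, off-pump"[MeSH Terms] OR "off-pump"[Title/Abstract] OR OPCAB[Title/Abstract]) AND "randomized controlled trial"[Publication Type]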
30.2.6 Screening, Evaluation and Data Abstraction

A rapid review of the abstracts of the papers will eliminate those that qualify for exclusion because of inadequate study design, the specific population, duration of treatment or date of the study. If the published material is just an abstract, there must be sufficient information to evaluate its quality. There must also be summary statistics to put into the meta-analysis, available either from the written material or in writing from the investigator. It is essential that when the available written information is insufficient for the meta-analysis, strenuous efforts be made to contact the principal investigator to obtain the needed information in order to reduce the effect of publication bias. This becomes even more important for material that has not been formally published, which can only be obtained from the principal investigator [4]. The next step is to collect the full papers. The data will then have to be extracted and added to a predesigned data extraction form. It is useful if two independent observers extract the data, to avoid errors. Next, extract all the patient demographics and baseline characteristics from all the included studies. All the clinical outcomes of interest should also be extracted. A table incorporating all the extracted data can then be created which shows all the variables and their values from all the studies included in the meta-analysis. Furthermore, it is essential to ascertain how well matched the studies are for the various variables and to score them accordingly; the overall quality of the studies should also be noted. There is no consensus on this issue in the meta-analysis literature. Quality scores can be used in several ways: as a cut-off, with the meta-analysis including only studies above some minimum score; as a weighting value, with studies with higher quality scores being given more weight in the analysis; or as a descriptive characteristic of the study, used in explaining study variability and heterogeneity [28, 35]. Blinding observers to the names of the authors and their institutions, the names of the journals, sources of funding and acknowledgements can lead to more consistent scores [28].
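Agreement between two independent observers on study inclusion can be quantified with a chance-corrected statistic such as Cohen’s kappa. The following minimal Python sketch, using hypothetical screening decisions, illustrates the calculation:

    # Chance-corrected agreement between two reviewers screening the same abstracts
    def cohens_kappa(r1, r2):
        n = len(r1)
        observed = sum(a == b for a, b in zip(r1, r2)) / n          # raw agreement
        categories = set(r1) | set(r2)
        expected = sum((r1.count(c) / n) * (r2.count(c) / n)        # agreement by chance
                       for c in categories)
        return (observed - expected) / (1 - expected)

    reviewer1 = ["include", "exclude", "include", "exclude", "exclude", "include"]
    reviewer2 = ["include", "exclude", "exclude", "exclude", "exclude", "include"]
    print(f"kappa = {cohens_kappa(reviewer1, reviewer2):.2f}")      # kappa = 0.67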
30.2.7 Choose and Standardise the Outcome Measure

Individual results have to be expressed in a standardised format in order to compare the studies. If the end point is continuous, such as the length of hospital stay after bypass surgery, the mean difference (weighted mean difference, WMD) between the treatment and control groups is used. The size of a difference, however, is influenced by the underlying population value. For example, off-pump bypass surgery is likely to have a greater effect on high-risk groups. Differences are therefore often presented in units of standard deviation. Figure 30.3 shows the data presented in a forest plot. More will be discussed about the funnel plot later. If the end point is binary or dichotomous, such as mortality or no mortality, then the odds ratio (OR) or the relative risk or risk ratio (RR) is calculated.
Fig. 30.3 Example of a forest plot
The OR is the ratio of the probability that a particular event will occur to the probability that it will not occur, and can be any number between zero and infinity. In gambling, the odds describe the ratio of the size of the potential winnings to the gambling stake; in health care, they describe the ratio of the number of people with the event to the number without. Risk is the concept more familiar to patients and health professionals: it describes the probability with which a health outcome (usually an adverse event) will occur. Measures of relative effect express the outcome in one group relative to that in the other. Hence, the RR is the ratio of the risk of an event in the two groups, whereas the OR is the ratio of the odds of an event. For treatments that increase the chances of events, the OR will be larger than the RR, and hence the tendency will be to misinterpret the findings in the form of an overestimation of treatment effect, especially when events are common (with, say, risks of events of more than 20%). For treatments that reduce the chances of events, the OR will be smaller than the RR, so that again misinterpretation overestimates the effect of treatment. This error in interpretation is unfortunately quite common in published reports of individual studies and systematic reviews [25].
Absolute measures, such as the absolute risk reduction or the number of patients needed to be treated (NNT) to prevent one event, are more helpful when applying results in clinical practice [11]. The NNT can be calculated as 1/risk difference (RD).
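As a concrete illustration, the Python sketch below computes these measures from a single hypothetical 2 × 2 table (the counts are assumed purely for the example) and shows how the OR diverges from the RR when events are common:

    # Effect measures for a binary outcome from one hypothetical trial
    a, b = 30, 70   # treatment group: events, non-events
    c, d = 45, 55   # control group: events, non-events

    risk_t, risk_c = a / (a + b), c / (c + d)
    rr = risk_t / risk_c               # relative risk
    or_ = (a / b) / (c / d)            # odds ratio
    arr = risk_c - risk_t              # absolute risk reduction (risk difference)
    nnt = 1 / arr                      # number needed to treat

    print(f"RR = {rr:.2f}, OR = {or_:.2f}, ARR = {arr:.2f}, NNT = {nnt:.1f}")
    # RR = 0.67 but OR = 0.52: reading the OR as if it were an RR
    # would overstate the benefit, because events are common here.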
30.2.8 Statistical Methods for Calculating Overall Effect

The final step consists of calculating the overall effect by combining the data. Simply averaging the results from all the trials would give misleading results: the results from small studies are more subject to the play of chance and should therefore be given less weight. Methods used for meta-analysis therefore use a weighted average of the results, in which the larger trials have more influence than the smaller ones. This weighting is what gives a meta-analysis “teeth” compared with a narrative review.
30.2.9 Fixed and Random Effects Models

Two models can be used, differing in the way in which the variability of the results between the studies is treated [3]. The “fixed effects” model considers that this variability is exclusively due to random variation; therefore, if all the studies were infinitely large, they would give identical results. The “random effects” model assumes a different underlying effect for each study and takes this into consideration as an additional source of variation, which leads to somewhat wider confidence intervals (CIs) than the fixed effects model [9]. Effects are assumed to be randomly distributed, and the central point of this distribution is the focus of the combined effect estimate. Both models have their limitations, and a substantial difference in the combined effect calculated by the fixed and random effects models will be seen only if studies are markedly heterogeneous [3].
30.2.9.1 Fixed Effects Meta-Analysis

Methods of fixed effects meta-analysis are based on the mathematical assumption that a single common (or “fixed”) effect underlies every study in the meta-analysis. In other words, if we were doing a meta-analysis of ORs, we would assume that every study is estimating the same OR. Under this assumption, if every study were infinitely large, every study would yield an identical result [25]. In a fixed effects analysis, the methods used to analyse binary outcomes are the general inverse variance-based method, the Mantel–Haenszel method and Peto’s method, each of which has certain advantages and disadvantages which will be discussed later. Each study is assumed to be a random representative conducted on a homogeneous population of patients. The studies are in essence identical to one another, and each study outcome should fluctuate around one common outcome or effect measure – hence, the name fixed effects. This is the same as assuming there is no statistical heterogeneity among the studies. Thus, the summary measure is a simple weighted average and can easily be interpreted as an estimate of a single population outcome measure. The 95% CI will reflect only the variability between patients; hence, with this class of methods, the 95% CI will be very narrow, with more power to reject the null hypothesis. The fixed effects analysis may be justified when the test for heterogeneity is not significant, i.e., when there is no evidence of major differences among studies, whether methodological, clinical or otherwise.

A very common and simple version of the meta-analysis procedure is the inverse variance method, so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e., one over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weight minimises the imprecision (uncertainty) of the pooled effect estimate. A fixed effects meta-analysis using the inverse variance method calculates the weighted average as:

Equation 30.1: Generic inverse variance weighted average

\[ T_{\mathrm{IV}} = \frac{\sum_{i=1}^{k} T_i / S_i^2}{\sum_{i=1}^{k} 1 / S_i^2} \qquad (30.1) \]

whereby Ti is the treatment effect estimated in study i, Si is the standard error of that estimate and the summation is across all k studies. The basic data required for the analysis are therefore an estimate of the treatment effect and its standard error from each study.

When data are sparse, either because the event rates are low or the trial sizes are small, the estimates of the standard errors used in the inverse variance method may be poor. The Mantel–Haenszel method uses a different weighting scheme, depending upon which effect measure (e.g. RR, OR or RD) is being used, and has been shown to have better statistical properties when there are few events. The Mantel–Haenszel method is hence normally the default method of fixed effects analysis [21, 33]. The pooled estimate TMH is calculated by:

Equation 30.2: Pooled estimate of OR (Mantel–Haenszel method)

\[ T_{MH(OR)} = \frac{\sum_{i=1}^{k} a_i d_i / n_i}{\sum_{i=1}^{k} b_i c_i / n_i} \qquad (30.2) \]

where ai, bi, ci and di are the four cells of the 2 × 2 table for each study, i = 1, …, k, and ni is the total number of people in the ith study (see Table 30.2).

Table 30.2 Outcome data from a single RCT and a case–control study

  RCT                  Failure (dead)     Success (alive)
  New treatment        a                  b
  Control              c                  d

  Case–control study   Diseased (cases)   Non-diseased (controls)
  New treatment        a                  b
  Control              c                  d

A variance estimate for the summary OR, TMH, is required to calculate a CI around this point estimate. The variance estimate for the log of TMH(OR) is as follows:

Equation 30.3: Variance estimate of summary OR (Mantel–Haenszel method)

\[ v_{MH(\ln(OR))} = \frac{\sum_{i=1}^{k} P_i R_i}{2\left(\sum_{i=1}^{k} R_i\right)^2} + \frac{\sum_{i=1}^{k} (P_i S_i + Q_i R_i)}{2\left(\sum_{i=1}^{k} R_i\right)\left(\sum_{i=1}^{k} S_i\right)} + \frac{\sum_{i=1}^{k} Q_i S_i}{2\left(\sum_{i=1}^{k} S_i\right)^2} \qquad (30.3) \]

where Pi = (ai + di)/ni, Qi = (bi + ci)/ni, Ri = ai di/ni and Si = bi ci/ni.

A 100(1 − α)% CI for the summary OR, θ, is calculated as follows:

Equation 30.4: 100(1 − α)% CI for the summary OR θ

\[ \exp\left[\ln(T_{MH(OR)}) - z_{\alpha/2}\, v_{MH(OR)}^{1/2}\right] \le \theta \le \exp\left[\ln(T_{MH(OR)}) + z_{\alpha/2}\, v_{MH(OR)}^{1/2}\right] \qquad (30.4) \]

Peto’s method can only be used to pool ORs [12]. It uses an inverse variance approach, but utilises an approximate method of estimating the log OR, and uses different weights. An alternative way of viewing Peto’s method is as a sum of “O − E” statistics: here, O is the observed number of events and E is the expected number of events in the experimental intervention group of each trial. The approximation used in the computation of the log OR works well when the treatment effects are small (ORs are close to 1), events are not particularly common and the trials have similar numbers in the experimental and control groups. In other situations, it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis. For k studies, the pooled estimate of the OR is given by:

Equation 30.5: Pooled estimate of OR (Peto’s method)

\[ T_{PETO(OR)} = \exp\left[\frac{\sum_{i=1}^{k} (O_i - E_i)}{\sum_{i=1}^{k} v_i}\right], \qquad v_i = E_i \left[\frac{n_i - n_{ti}}{n_i}\right] \left[\frac{n_i - d_i}{n_i - 1}\right] \qquad (30.5) \]

Of note, ni is the number of patients in the ith trial and nti is the number in the new treatment group of the ith trial; di is the total number of events from both the treatment and control groups, and Oi is the number of events in the treatment group. Ei is the expected number of events in the treatment group of the ith trial and is calculated as Ei = (nti/ni)di. For each study, two statistics are calculated. The first, O − E, is the difference between the observed number of events and the number expected to have occurred under the hypothesis that the treatment is no different from the control. The second, v, is the variance of the difference O − E.
An estimate of the approximate variance of the natural log of the estimated pooled OR is given by:

Equation 30.6: Variance of pooled OR (Peto’s method)

\[ \operatorname{var}(\ln T_{PETO(OR)}) = \left(\sum_{i=1}^{k} v_i\right)^{-1} \qquad (30.6) \]

Equation 30.7: 100(1 − α)% non-symmetric CI (Peto’s method)

A 100(1 − α)% non-symmetric CI is given by:

\[ \exp\left[\frac{\sum_{i=1}^{k} (O_i - E_i) \pm z_{\alpha/2} \left(\sum_{i=1}^{k} v_i\right)^{1/2}}{\sum_{i=1}^{k} v_i}\right] \qquad (30.7) \]
All the methods discussed earlier have their merits and demerits, which determine their use. Peto’s method may produce biased ORs and standard errors when there is a mismatch in the sizes of the two groups being compared [22]. If the number of studies to be pooled is small but the within-study sample sizes are large, the inverse variance-weighted method should be used. Conversely, if the number of studies to be combined is large but the within-study sample sizes are small, the Mantel–Haenszel method is preferred [18]. It is now recommended that a continuity correction (adding 0.5 to each cell) be used for sparse data, except in cases where there is strong evidence suggesting that very little heterogeneity exists among the component studies [45].
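To make the mechanics concrete, here is a minimal Python sketch of the Mantel–Haenszel pooled OR of Equation 30.2, using assumed counts from three hypothetical trials; for brevity the CI uses a generic inverse variance approximation of the log OR rather than the full Equation 30.3 variance, which would be preferable for sparse data:

    import math

    # (a, b, c, d) for each hypothetical trial: treatment events/non-events,
    # control events/non-events
    tables = [(15, 85, 20, 80), (8, 42, 12, 38), (30, 170, 45, 155)]

    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    or_mh = num / den                                   # Equation 30.2

    # Approximate variance of ln(OR): pooled inverse of per-study Woolf variances
    var_ln = 1 / sum(1 / (1/a + 1/b + 1/c + 1/d) for a, b, c, d in tables)
    lo = math.exp(math.log(or_mh) - 1.96 * math.sqrt(var_ln))
    hi = math.exp(math.log(or_mh) + 1.96 * math.sqrt(var_ln))
    print(f"Pooled OR = {or_mh:.2f} (95% CI {lo:.2f} to {hi:.2f})")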
30.2.9.2 Random Effects Meta-Analysis When there is some statistical heterogeneity, as detected by a statistically significant heterogeneity test, it will be implausible to assume that the 95% CI or imprecision of the summary outcome reflects only between-patient variability. Therefore, the fixed effects model will not fit the observed data well, as the 95% CI will be too narrow. In the fixed effects analysis, each of the studies in the systematic review is assumed
to be fundamentally identical and is simply an independent random experiment done on an identical population of patients. In the random effects analysis, it is assumed that all the studies are fundamentally different and that the outcome of a study will estimate its own unique outcome, which differs from that of the other studies. Hence, each study outcome is not assumed to fluctuate around a fixed, common population outcome, but to fluctuate around its own true value. It is assumed, however, that each of these true values is drawn “randomly” from some underlying probability distribution, i.e., that of a “superpopulation”, commonly assumed to be a normal distribution; hence, the name “random” effects analysis. That is, under a random effects assumption, not only is each study performed on a sample drawn from a different population of patients, but each of these populations is itself taken randomly from a common “superpopulation”. A random effects analysis makes the assumption that individual studies are estimating different treatment effects; hence, the 95% CI in a random effects analysis, reflecting the overall variability in the data, will be wider than that of a fixed effects analysis because of both between-patient variability and between-study variability [25]. The DerSimonian and Laird random effects method incorporates an assumption that the different studies are estimating different yet related treatment effects. This method is based on the inverse variance approach, making an adjustment to the study weights according to the extent of variation, or heterogeneity, among the varying treatment effects. The DerSimonian and Laird method and the inverse variance method will give identical results when there is no heterogeneity among the studies (and thus also give results similar to the Mantel–Haenszel method in many situations). Where there is heterogeneity, the CIs for the average treatment effect will be wider if the DerSimonian and Laird method is used rather than a fixed effects method, and the corresponding claims of statistical significance will be more conservative. It is also possible that the central estimate of the treatment effect will change if there are relationships between the observed treatment effects and sample sizes. Expressed mathematically, if Ti is an estimate of the effect size and θi is the true effect size in the ith study, then Ti = θi + ei,
where ei is the error with which Ti estimates θi, and var(Ti) = τθ² + vi, where τθ² is the random effects (between-study) variance and vi is the variance due to sampling error in the ith study.
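The DerSimonian and Laird estimate can be computed in a few lines. The Python sketch below uses hypothetical log ORs and standard errors and follows the moment estimator of the between-study variance described above:

    import math

    t  = [0.10, -0.25, 0.40, 0.05]    # hypothetical per-study log ORs
    se = [0.15, 0.20, 0.30, 0.18]     # and their standard errors

    w = [1 / s**2 for s in se]                                # fixed effects weights
    t_fixed = sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
    q = sum(wi * (ti - t_fixed)**2 for wi, ti in zip(w, t))   # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(t) - 1)) / c)                   # between-study variance

    w_re = [1 / (s**2 + tau2) for s in se]                    # random effects weights
    t_re = sum(wi * ti for wi, ti in zip(w_re, t)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    print(f"tau^2 = {tau2:.3f}, pooled effect = {t_re:.3f} +/- {1.96 * se_re:.3f}")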
30.2.10 Heterogeneity Between Study Results

Sometimes, the variance between the overall effect sizes in each study might not be due to random sampling variation, but rather to the presence of other factors inherent within the individual studies. This effect size variation due to slightly different study designs is termed heterogeneity. If the results of the studies differ greatly from each other and this is deemed to be largely due to heterogeneity, then it may not be appropriate to conduct a meta-analysis in the first place. If a test for homogeneity shows homogeneous results, then the differences between studies are assumed to be a consequence of sampling variation, and a fixed effects model is appropriate. If, however, the test shows that significant heterogeneity exists between study results, then a random effects model is advocated. If there is excessive heterogeneity, not even the random effects model can compensate for it, and the viability of the meta-analysis should be questioned. A major limitation of heterogeneity tests is that they lack the power to reject the null hypothesis of homogeneous results even if substantial differences between studies exist, because only a limited number of studies is available in each meta-analysis. Although there is no statistical solution to this issue, heterogeneity tests should not be abandoned, as heterogeneity between study results can also provide an opportunity for examining why treatment effects differ in different circumstances. The causes and sources of heterogeneity need to be explored in detail once its presence and degree have been identified [1].
30.2.10.1 Assessing for the Presence of Heterogeneity

There are three ways to assess heterogeneity. First, one can assess the between-studies variance, τ². However, this depends mainly on the particular effect size metric used. The second is Cochran’s Q test, which follows a chi-square distribution and is used to make inferences about the null hypothesis of homogeneity. The problem with Cochran’s Q test is that it has poor power to detect true heterogeneity when the number of studies is small. As neither of the above-mentioned methods has a standardised scale, they are poorly equipped to make comparisons of the degree of homogeneity across meta-analyses [21]. A third, more useful statistic for quantifying inconsistency is I² = [(Q − df)/Q] × 100%, where Q is the chi-squared statistic and df is its degrees of freedom [22]. This statistic is easier to use because it defines variability along a scale-free range as a percentage from 0 to 100%, describing the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance). Heterogeneity could be considered substantial when this value is greater than 50% [27].
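A minimal Python sketch, again with hypothetical study effects and standard errors, shows how Q and I² are obtained:

    t  = [0.10, -0.25, 0.40, 0.05]    # hypothetical study effects (e.g. log ORs)
    se = [0.15, 0.20, 0.30, 0.18]     # their standard errors

    w = [1 / s**2 for s in se]
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
    q = sum(wi * (ti - t_bar)**2 for wi, ti in zip(w, t))   # Cochran's Q
    df = len(t) - 1
    i2 = max(0.0, (q - df) / q) * 100                       # I^2 as a percentage
    print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")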
30.2.10.2 Graphical Display – Forest Plot

Results from each trial, together with their CIs, can be graphically displayed in a useful manner on a forest plot. Each study is represented by a black square and a horizontal line, which correspond to the point estimate and the 95% CI of the outcome measure, respectively. The dotted vertical line corresponds to no effect of treatment (e.g. an OR or RR of 1.0). If the CI includes 1, then the difference in the effect of the experimental and control treatments is not significant at conventional levels (p > 0.05). The size (or area) of the black squares reflects the weight of the study in the meta-analysis, whilst the diamond represents the combined OR, calculated using a fixed effects model, with its centre at the point estimate and its horizontal width representing the 95% CI [14]. Most of the studies, if they are homogeneous in design and population, will have overlapping CIs. However, if the CIs of two studies do not overlap at all, then there is variation between the two studies which is not likely to be due to chance and is likely due to the presence of heterogeneity. Other than graphically using a forest plot, a numerical assessment can be made using the chi-squared test [25]. Most statistical packages will give values for the chi-square and its corresponding p value. This is shown and explained in Fig. 30.3. This will help to assess how heterogeneous the results are. Furthermore, the combined outcome measure (OR/RR/WMD) will have an absolute value, its 95% CI and its corresponding p
value (Z-effect p value) to see whether the results are statistically significant.
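A basic forest plot of the kind described can be drawn in a few lines. The sketch below uses matplotlib with hypothetical ORs and CIs and, for brevity, does not scale the squares by study weight:

    import matplotlib.pyplot as plt

    studies = ["Trial A", "Trial B", "Trial C"]
    or_ = [0.80, 1.10, 0.65]                 # hypothetical point estimates
    lo  = [0.55, 0.70, 0.45]                 # lower 95% CI limits
    hi  = [1.16, 1.73, 0.94]                 # upper 95% CI limits

    fig, ax = plt.subplots()
    for i, (o, l, h) in enumerate(zip(or_, lo, hi)):
        ax.plot([l, h], [i, i], color="black")     # 95% CI line
        ax.plot(o, i, "ks")                        # point estimate as a square
    ax.axvline(1.0, linestyle="--", color="grey")  # line of no effect
    ax.set_xscale("log")                           # ORs conventionally on a log scale
    ax.set_yticks(range(len(studies)))
    ax.set_yticklabels(studies)
    ax.set_xlabel("Odds ratio (log scale)")
    plt.show()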
30.2.10.3 Sensitivity Analysis

The robustness of the findings of the meta-analysis needs to be assessed by performing a sensitivity analysis. As alluded to previously, both fixed and random effects modelling should be used. Second, the methodological quality of the studies needs to be assessed by scoring them on an arbitrary scoring scale or using the scales mentioned earlier; the meta-analysis can then be repeated for high-quality and low-quality studies. Third, significant results are more likely to be published than non-significant findings, and this can distort the findings of meta-analyses [24]. The presence of such publication bias can be identified by stratifying the analysis by study size – small effects can reach significance only in the larger studies. If publication bias is present, it is expected that, among the published studies, the largest ones will report the smallest effects. If the exclusion of the smallest studies has little effect on the overall estimate, the findings are robust. A sensitivity analysis thus shows whether the results of a meta-analysis are valid and not unduly affected by the exclusion of trials of poorer quality or of studies stopped early. It also takes into account publication bias [14].
30.2.10.4 Sub-Group Analysis

The principal aim of a meta-analysis is to produce an estimate of the average effect seen in trials of a particular treatment [7]. The clinician must make a decision as to whether his/her patient is comparable with the patient group used in the meta-analysis. For example, off-pump coronary artery bypass (OPCAB) surgery is shown to be more beneficial than on-pump coronary artery bypass (ONCAB) surgery in high-risk groups or sub-groups such as the elderly and diabetics. Sub-group analysis shows a benefit, whereas a meta-analysis comparing OPCAB with the ONCAB technique in a general population may result in no superiority being shown by either technique. However, this method can produce findings which are conflicting. One of the OPCAB RCTs used in the meta-analysis that primarily recruited females may show that OPCAB surgery is harmful in the female
population, yet the overall message of the meta-analysis is that OPCAB surgery is superior to ONCAB in females. Stein’s paradox must be invoked here [12]. Common sense suggests that gender has no bearing on the outcome; hence, this RCT is discounted, and should female patients come to the clinic, they would still be offered OPCAB surgery. The assumption is that inconsistent results are purely due to chance. However, even if some real differences exist, the overall estimate may still provide the best estimate of the effect in that group. Sub-group analysis can also be used to explain heterogeneity by determining which component of the study design might be contributing to the treatment effect.
30.2.11 Meta-Regression

Meta-regression is an extension of sub-group analysis: it is the analysis of any significant effects between different sub-group populations of individual trials. Multiple continuous and categorical variables can be investigated simultaneously. Using meta-regression, a better understanding of the causes of heterogeneity between study groups can be gained. However, meta-regression has a number of significant limitations. First, the initial decision to perform a meta-regression on a certain variable is entirely observer-dependent and hence prone to selection bias; this holds even for the meta-analysis of RCTs. Furthermore, meta-regression uses the aggregate outcome in each study as its source data and, hence, might fail to detect genuine relationships between individual variables or might not be able to ascertain the true effect size. Lastly, meta-regression requires many studies (>10) to be reliable, as fewer studies carry the risk of yielding a spurious correlation for a variable, especially when many characteristics are being studied.
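A simple fixed effects meta-regression amounts to an inverse variance-weighted least squares fit of the study effects on a study-level covariate. The Python sketch below, with hypothetical effects and an assumed covariate (mean patient age), illustrates this:

    import numpy as np

    effect = np.array([0.10, 0.25, 0.40, 0.55, 0.30])   # hypothetical log ORs
    se     = np.array([0.15, 0.20, 0.25, 0.30, 0.18])   # their standard errors
    age    = np.array([55.0, 60.0, 65.0, 70.0, 62.0])   # assumed study-level covariate

    W = np.diag(1 / se**2)                          # inverse variance weights
    X = np.column_stack([np.ones_like(age), age])   # intercept + slope design matrix
    # Weighted least squares: solve (X'WX) beta = X'W y
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effect)
    print(f"intercept = {beta[0]:.3f}, slope per year of age = {beta[1]:.4f}")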
30.2.12 Conducting a Meta-Analysis in the Surgical Context

Most of the differences between meta-analyses in surgery and those in other fields originate from the limited reproducibility of the treatment and the addition of confounding
factors, inherent and unaccounted for, in the modality of treatment. For example, while a pharmaceutical drug acts more or less uniformly in a controlled manner, the success or failure of an operation depends on the expertise of the surgeon, with possible differences in the outcomes of surgery depending on seniority and experience. Most surgeons do not perform their operations in a fully standardised and reproducible manner: slight, imperceptible modifications can occur over time that may lead to variable treatment effects. Surgeons are also less likely to reveal the results of operations with poor clinical outcomes, and this contributes to the problem of publication bias in surgery.
30.2.13 The Learning Curve

Performance of a motor task involving many repeated steps changes with experience, equipment and time. Significant improvements tend to occur in the early stages and then tail off, until a plateau phase is reached. This constitutes a learning curve. The early assessment of a new technology therefore gives a distorted picture of its true efficacy. If a new treatment, say “A”, is developed and compared with a conventional treatment “B”, then in the early stages, as surgeons learn how to practise A, B will always appear superior. Equally, once A has been adopted widely, judging it on the poor evidence from the earlier years, when it had just been pioneered, will be misleading [42]. The corollary is that a meta-analysis comparing two interventions done too early could be disastrous; a meta-analysis done only once the newer technology has been accepted is also concerning. For example, in the use of β-blockers versus placebo in the treatment of patients who had a myocardial infarction, most head-to-head trials between 1967 and 1980 did not show any significant benefit of β-blockade, and a meta-analysis done during these earlier years would probably have yielded negative results. A more recent meta-analysis that included studies from 1967 to 1997 has instead shown that β-blockade reduces premature mortality after a myocardial infarction by 20% [19]. These issues can lead to more heterogeneity among surgical trials compared with medical ones, thus threatening the validity of a meta-analysis based on these
data. Those who use systematic reviews for clinical decision-making and evidence-based medicine must take into account the possible extra attendant shortcomings of a surgical meta-analysis [44].
30.3 Assessing the Quality of a Meta-Analysis

Two instruments are commonly used to assess the quality of meta-analyses, namely the overview quality assessment questionnaire (OQAQ) scale and the quality of reporting of meta-analyses (QUOROM) checklist [46]. The OQAQ was selected because it has strong face validity, published data on several essential elements of its development and a published assessment of its construct validity. The OQAQ scale measures across a continuum using nine questions (items 1–9) designed to assess various aspects of the methodological quality of systematic reviews and one overall assessment question (item 10). When the scale is applied to a systematic review, the first nine items are scored by selecting yes, no or partial/cannot tell. The tenth item requires assessors to assign an overall quality score on a 7-point scale [40]. The QUOROM statement was chosen for assessing reporting quality. Although this checklist has not yet been fully validated, extensive work has been conducted and reported. The QUOROM statement comprises a checklist and flow diagram and was developed using a consensus process designed to strengthen the reliability of the yielded estimates when applied by different assessors. It estimates the overall reporting quality of systematic reviews. The checklist asks whether authors have provided readers with information on 18 items, including searches, selection, validity assessment, data abstraction, study characteristics, quantitative data syntheses and trial flow. It also asks whether authors have included a flow diagram with information about the number of RCTs identified, included and excluded and the reasons for any exclusion. Individual checklist items included in this instrument are also answered yes, no or partial/cannot tell [36]. Not only is the QUOROM checklist used to assess reporting quality, but it also helps to mark out potential
pitfalls in a meta-analysis, as discussed further in this chapter.
30.4 Pitfalls in Conducting a Meta-Analysis Although the aim of a meta-analysis is to reduce uncertainty, there are instances in which the opposite can be true. In the hierarchy of evidence, the systematic review is placed rightly at the top. However, similar systematic reviews with opposite conclusions or which contradict well-powered high-quality double-blind RCTs are still possible [41].
30.4.1 Conflicting Results Between Meta-Analyses Compared with Large-scale RCTs Two important questions need to be answered. The first is whether meta-analyses of small trials agree with the results of large trials. There exists no absolute definition of what constitutes a large trial; hence, separating small trials from large trials is not easy. Moreover, in the big picture, all trials add to the current base of evidence. The extent to which small trials agree or disagree with larger ones is a multifactorial process. Selection bias tends to skew the results. Large trials appearing in high-impact journals may have been selected as they provide new insight into the merits and demerits of a particular treatment. Furthermore, there may be less consistency for secondary end points than for primary end points in different trials. The second important question is whether metaanalyses can in fact validly substitute large trials. It is known that meta-analyses and large trials tend to disagree 10% to 23% of the time, beyond chance. Clinical trials are likely to be heterogeneous, as they address different populations with different protocols. Patients, disease and treatments are likely to change over time. Future meta-analyses may find an important role in addressing potential sources of heterogeneity rather than always trying to fit a common estimate among diverse studies [24].
Given these considerations, meta-analyses and RCTs must be scrutinised in detail for the presence of biases and diversity.
30.4.1.1 Why Is There Bias in Meta-Analysis Then? Most of the factors responsible for this bias are due to the assumptions used when combining RCTs. The assumptions are that: • The results of trials are true approximations to the actual true value of the outcome of the study, being different between trials due to the presence of random chance and not due to bias. • The trials selected for combination are representative of all trials possible, whether published or unpublished. • The studies being combined are sufficiently homogeneous in population and methodology such that they are combinable in the first place.
30.5 Pitfalls in the Variable Quality of Included Trials

30.5.1 The Importance of Quality

The quality of RCTs has a direct impact on the eventual quality and output of the meta-analysis. If not properly designed, flaws within RCTs can produce aberrant results which might not be a true reflection of the overall treatment effect. Hence, when incorporated into a meta-analysis, these flaws can trickle down to directly compromise and invalidate both the meta-analysis results and the subsequent findings. This dependency of a meta-analysis on the quality of its constituent RCT results is aptly termed the “garbage in, garbage out” or GIGO effect.
30.5.1.1 So what is Quality in an RCT? Quality is a multi-faceted idea, which could relate to the design, conduct and analysis of a trial, its clinical relevance or quality of reporting [29]. It is important to assess how valid a study is and it is this validity that
has a huge bearing on the quality of the study. Two types of validity have been proposed, namely internal and external validity [6].
30.5.1.2 Internal Validity

Internal validity implies that the differences observed between the groups of patients allocated to different interventions may, apart from random chance, truly be due to the treatments under investigation. Bias primarily affects internal validity and is defined as “any process at any stage of inference tending to produce results that differ systematically from [their] true values” [6]. In effect, it causes a systematic difference in outcome which leads to a GIGO effect on meta-analytic results. Hence, in the conduct of a meta-analysis, a key assumption will be that any variability between individual RCTs is due to random variation and not to the presence of bias. This topic is covered elsewhere in the book [11, 15, 17, 20, 26, 51], and so are the solutions [2, 8, 10, 13, 30–32, 34, 48, 50].
30.5.2 Quality of Reporting The assessment of the methodological quality of a trial and the quality of reporting go hand in hand. For metaanalysis researchers, it is a joy when a paper provides adequate information about the design, conduct and analysis of the study in question [35]. However, when inadequate information is provided, the difficulty lies in whether one should assume that the quality was inadequate or formally assess it by using different scales of quality.
30.5.2.1 Assessing the Quality of Reporting in RCTs

Many reviewers formally assess the quality of RCTs by using guidance from expert sources, including the Cochrane Collaboration. In the last decade, the concepts discussed earlier have been ratified into the Consolidated Standards of Reporting Trials (CONSORT) statement. The CONSORT statement is an important research tool that takes an evidence-based approach to improving the quality of reports of randomised trials.
It offers a standard way for researchers to report trials and is composed of a standardised checklist and flow diagram for detailing the required conduct and reporting of methodology and results in individual RCT reports. The inclusion of the CONSORT guidelines into journal publication criteria has improved the quality of reporting in articles, made the peer review and editorial process more objective and has enabled systematic reviewers to have a greater ability in judging methodological quality for themselves [37].
30.5.2.2 Dealing with Small-Study Effects

In effect, the CONSORT guidelines become a natural addition to inclusion and exclusion criteria, and there should be no qualms in rigorously applying these criteria and dropping low-quality trials from a meta-analysis. This helps to reduce small-study effects. However, this rejection should be done in a way that can itself be assessed, and it is recommended that a reject log be kept for peer review if necessary. In some instances, one might need to declare the total exclusion of current research and express the need for better quality trials in the future in order to perform an adequate meta-analysis!
30.5.2.3 External Validity

External validity gives a measure of the applicability of the results of a study to other “populations, settings, treatment variables, and measurement variables” [9]. It deals with the ability to generalise, where the focus is on whether the results of a study can provide a correct basis for generalisations to other circumstances. It should be noted that internal validity is a requirement for external validity: when the results of a flawed trial become invalid, the question of their external validity automatically becomes redundant [29, 35]. In recent years, large meta-analyses based on data from individual patients have shown that important differences in treatment effects may exist between patient groups and settings. For example, antihypertensive treatment reduces total mortality in middle-aged patients with hypertension, but this may not be the case in an elderly population [23]. The baseline characteristics of studies included in the meta-analysis
must be similar. It would only be appropriate to compare apples with apples and not apples with mangoes!
30.5.2.4 Why Is Study Quality Important?

The quality of reporting and the methodological quality of a meta-analysis must always be high [47]. It is worth remembering that the inclusion of poorly conducted studies in a meta-analysis will result in poor results. Full use must be made of quality scales, appreciation of the hierarchical structure of studies (RCTs or observational studies) and sensitivity analyses.

30.6 Pitfalls in Biased Inclusion Criteria

Similar to selection bias in trials, bias can also occur inherently within a systematic review, where systematic reviewers with foreknowledge of individual study results can manipulate the inclusion and exclusion criteria in order to preferentially select or exclude positive or negative studies, respectively, leading to a skewed, asymmetrical sample of data. The introduction of subjectivity depends on the investigators’ familiarity with the subject, their own pre-existing opinions and their conflicts of interest.

30.6.1 Dealing with Personal Bias and Inclusion Bias

A number of techniques in systematic review design can aid in the reduction of this form of bias. These include:
• Prior agreement on selection criteria through consensus.
• Pooling of search results between individual systematic reviewers.
• Selection of results using distinct inclusion criteria by two individuals, consisting of experts and non-experts in the field of study, with a third acting as an arbiter.
• Use of a reject log for all excluded articles.
• Blinding of systematic reviewers at the stages of selection criteria and critical appraisal.

30.7 Which Meta-Analyses Should be Published?

The investigation of heterogeneity between the different studies is the main task in each meta-analysis. A major limitation of formal heterogeneity tests is their low statistical power to detect any heterogeneity, if present. Both informal methods, such as comparing results with different designs within different geographical regions, and visual methods, such as funnel and radial plots, should be used. Authors must make every attempt to deal with heterogeneity; failure to do so should be looked at unkindly by the editors of journals. As discussed previously, the issue of publication bias is inherent to meta-analysis. Studies with non-significant or negative results are published less often than those with positive results. Also, “replication studies” conducted in epidemiology are published less often in international journals, as they do not add anything new to the literature. Future research in the field of meta-analysis needs to focus on the deficiencies of the various meta-analytic methods. The influence of different baseline risks, the different quality and type of exposure measurements made and the methods for pooling studies that have measured different confounding variables all need to be taken into account. There is a need for refined protocols for the undertaking and reporting of meta-analysis. The statistical methods used in complex meta-analysis also need to be refined. Rigorous standards must be deployed, as public health regulators will base their decisions more and more on the results of meta-analyses [5].

30.8 Systematic Review of Observational Studies

RCT results are the most objective form of evidence in the ladder of evidence available for a particular intervention. However, there are instances where RCTs are unfeasible and only observational studies such as cross-sectional studies, case–control studies and cohort studies are possible. This is especially so in studies which might involve small disease prevalence and incidence, moderate effect sizes or long latency periods. In observational studies, the aim is to confirm the association and make a quantitative assessment of the degree of relation.
30.8.1 Use Cases for Observational Studies
An aetiological hypothesis generally cannot be tested in a randomised controlled setting. The aetiological risk contributing to a particular disease might be low but still clinically significant. When compounded with rare diseases of low incidence and prevalence, the resultant fraction of individuals in a study whose disease is directly attributable to the aetiological risk might be extremely low. This makes the study of these individuals in a prospective RCT very difficult, owing to the costs in terms of monetary expenditure, manpower and time. Moreover, the ramifications of exposing individuals to risk factors make the randomisation of individuals to exposure and control groups unethical. In these circumstances, a cohort or case–control study could be a better design [16]. Even with the use of RCTs, observational studies would have to be undertaken in order to ensure comprehensive coverage in medical effectiveness research. RCTs can only establish the efficacy of treatment and the more common adverse effects. Because an RCT can only be performed within a finite amount of time, less common adverse effects might not be picked up by the time the trial ends. Owing to the lack of long-term follow-up, late-onset adverse effects which have a long latency before presentation might not be identified. Once late-onset adverse effects are discovered, ethical, political and moral obstacles would prevent the approval of a new prospective trial. In these circumstances, either case–control studies or post-marketing surveillance scheme analyses could aid in following up the patients [16]. The findings of RCTs might not be as applicable in clinical practice either. The population enrolled into RCTs might differ from the average patient seen in clinical practice. Furthermore, the environment in which the trial is conducted might differ from clinical practice, as most trials are conducted in a tertiary university hospital setting where more services and specialist advice can be attained. Demographically, an RCT could have excluded women, the elderly and minority ethnic groups, who could prove to form a sizable bulk of patients in the real clinical setting. Observational studies could plug the gaps left by trials in this case.

30.8.2 Problems in Systematic Review of Observational Studies
The meta-analysis of observational studies aids in combining results, allowing for increased power in the study of very rare exposures and risks. However, apart from the different forms of bias described elsewhere, there are further sources of bias due to the nature of observational study designs. The potential for bias to affect results in the meta-analysis of observational studies is much greater than in one of RCTs, such that meta-analysis results could be implausible or, worse, spuriously plausible. The major forms of bias in observational studies include:
• Confounding bias
• Subject selection bias/recall bias
• Heterogeneity in methodology and study populations
• Misclassification bias
30.8.2.1 Confounding Bias

Confounding bias occurs when a factor that is related to both the exposure and the disease under study is not accounted for during analysis. It can be statistically removed by careful study design, in which the individual variables thought to affect the exposure and disease outcome are well documented and measured, and its influence removed from the findings with the use of analysis of variance methods (e.g. ANOVA). However, the correction for confounding bias is still dependent on the ability to measure the magnitude of the confounding variable with sufficient precision; if this is imprecise, residual confounding bias could still be present. Moreover, unless actively looked for, the confounding variable might be overlooked entirely. This problem predominates in prospective and especially retrospective cohort studies.
30.8.2.2 Selection Bias and Other Forms of Bias For case–control studies, it is bias within the study which is more of a problem. In order to create the necessary
study and control groups, selection criteria would need to be made. As the selection process is a non-blinded one, there is a great possibility of selection bias. Furthermore, recall bias can occur whereby individuals in both study group and control group might preferentially or subconsciously remember or forget key factors in their individual recollections of exposure risk. This bias is dependent on the knowledge of group allocation which cannot be blinded from either investigators or patients.
30.8.2.3 Heterogeneity in Study Methodology and Populations

Lastly, there is increased between-study heterogeneity due to the use of different methodologies in different observational study designs. This makes the combination of individual summary results a more complex undertaking than with RCTs. Moreover, some types of studies, e.g. ecological studies, only have data on populations already exposed to the risk factor, with no data for controls. The problem lies in establishing comparable groups for the combination of results between ecological, cohort and case–control studies. The diversity of populations in epidemiological studies also makes useful summaries for a given population difficult.
30.8.3 Solutions to Problems in Observational Studies

In view of this, a number of strategies have been recommended. Similar to the CONSORT guidelines for RCTs, a set of guidelines has been established to allow for the assessment of the study quality of observational studies – the meta-analysis of observational studies in epidemiology (MOOSE) guidelines [49]. As with the other guidelines described earlier, these are also being established as a criterion for publication in a variety of journals. Egger et al. advocated more detail in individual subject data over overall study size: the collection of more detailed data on a smaller number of participants is a better strategy for obtaining accurate results than collecting cruder data from a larger number of participants. More detail from individual subjects allows for the easier identification and measurement of potential confounding factors and their
statistical removal from the association between exposure and disease. If a precise measurement of the confounding variable is not possible, other studies can be used to derive external estimates of the confounding variable in order to adjust out its influence [16]. Overall, the use of meta-analysis – quantitative statistical synthesis – should not be a prominent component of reviews of observational studies. To enable the combination of data, comparable groups must exist. Due consideration must be given to possible sources of heterogeneity between observational study results. Heterogeneity due to methodological quality, such as the composition of the population under study, the level of exposure, the definition of the disease in the study and the presence of potential bias and confounding, should be expected and accounted for. This can be achieved via the use of sensitivity analysis or by stratification or regression. In sensitivity analysis, the influence of different aspects of study methodology on the meta-analysis results can be fully realised. However, if heterogeneity is excessive, a meta-analysis is not recommended. Deciding whether the differences are significant enough to warrant formal combination of results depends on both scientific necessity and statistical significance [16]. The meta-analysis of observational studies using individual patient data allows for a better focus on rarer conditions due to a larger available pool of subjects. Moreover, subject selection bias can be removed via the application of stricter inclusion and exclusion criteria. The standardisation of the removal of confounding factors is possible and, with reanalysis, more valid and precise conclusions with regard to the exposure–disease relationship can be obtained. This pooling of data also allows for optimal sub-group analysis and sensitivity analysis to be undertaken. The meta-analysis of individual patient data is detailed in the next section.
30.9 Other Types of Meta-Analyses

30.9.1 Data: Meta-Analysis of Individual Patient Data

In a typical meta-analysis, the summary data from individual studies are aggregated and combined. If the required summary data are not present, they can either
be derived from pre-existing published data or, if these are inadequate, requested directly from the original authors. The meta-analysis of individual patient data (IPD), or “pooled analysis”, does not utilise summary results. Instead, the datasets of individual trials are directly requested from the researchers; the data are standardised and merged, and overall summary estimates are calculated from this merged dataset.
30.9.1.1 Advantages

The advantages of this form of meta-analysis are legion, including the ability to check and assess the implementation of trial methodology, the adequacy of randomisation, the demographics of individual test groups, the presence of gaps in data recording, the extent of loss to follow-up and how intention-to-treat policies were implemented. Further to this, the dataset can be updated with a new round of follow-up, repeated with the aim of completing any incomplete data records; if this is not possible, a standard intention-to-treat policy can be implemented across all study results. The use of stricter inclusion and exclusion criteria can also aid standardisation. This is followed by updated methods in the analysis and derivation of summary results, where sub-group analysis, the charting of survival endpoints and survivorship studies can be performed. For all these reasons, the meta-analysis of individual patient data is considered the gold standard among systematic reviews.
30.9.1.2 Disadvantages

In view of the need for patient datasets and the follow-up of a large population of patients, the implementation and conduct of IPD studies can be very costly and time-consuming. The provision of datasets is entirely dependent on the cooperation of individual study authors, and in some cases the original data may not have been kept or may have been destroyed.
30.9.2 Study Type: Meta-Analysis of Observational and Epidemiological Studies As described in the previous section.
30.9.3 Study Type: Meta-Analysis of Survival Data

Survival data pose a unique problem with regard to meta-analysis. In survivorship studies, the primary outcome of interest is the time from study initiation to an event occurrence (e.g. morbidity or mortality). Usually, these data are plotted on a survivorship curve (Kaplan–Meier curve) and a summary statistic describing the survival trend is then derived (e.g. the hazard ratio or log hazard ratio). A meta-analysis involves the combination of such summary measures of survivorship data. However, there are a variety of methods and different types of summary measures available. Unfortunately, not all trials report a log hazard ratio, and hence this must be derived in order to allow comparable results for combination. As most survivorship data are censored, the extraction of accurate data is difficult, and can involve either estimation via mathematical conversion from the provided summary data or direct measurement off a survival curve. There might not be adequate summary data of sufficient quality in trial reports to derive hazard ratios. Furthermore, extracting information directly from Kaplan–Meier curves in papers introduces random measurement error and hence reduces the accuracy of results. In this case, the use of individual patient data is ideal, as the individual datasets can be merged fully to create a single large dataset and the summary statistic derived with far better precision.
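Where a trial reports only a hazard ratio and its 95% CI, the log hazard ratio and its standard error can be recovered for pooling, as in this minimal Python sketch with assumed values:

    import math

    hr, ci_low, ci_high = 0.75, 0.58, 0.97   # hypothetical reported HR and 95% CI

    ln_hr = math.log(hr)
    se_ln_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    weight = 1 / se_ln_hr**2                 # inverse variance weight for pooling
    print(f"ln(HR) = {ln_hr:.3f}, SE = {se_ln_hr:.3f}, weight = {weight:.1f}")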
30.9.4 Method: Cumulative Meta-Analysis
When compared with the traditional meta-analysis, which covers a topic only at a particular snapshot in time, cumulative meta-analysis encourages the process of performing a new or updated meta-analysis prospectively every time a new trial is published. An example of cumulative meta-analysis is the Cochrane Database of Systematic Reviews, in which each individual systematic review is revised either after new trials have been published or within a specified period of time (i.e. a time to expiry). The benefits of undertaking a cumulative meta-analysis are two-fold. If done prospectively, it allows for the continual updating of overview estimates, such
that if there are any changes to the estimates, they will be noticed earlier, leading to a faster, more timely response in changing clinical practice. This can potentially save lives and resources. If done retrospectively, it allows exploration of the effects of change in the meta-analysis results as trials are added sequentially. When trials are added chronologically, time-lag bias and the effects of temporal changes in practice and population can be observed. If additions are undertaken by study size, the influence of small-study effects on the results of a meta-analysis can be elucidated.
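A retrospective cumulative meta-analysis can be sketched in a few lines of Python; trials are added in chronological order and the pooled estimate is recomputed after each addition. The trial years, effect sizes and standard errors below are invented for illustration, and a simple fixed-effect inverse-variance weight is used.

trials = [(1998, -0.40, 0.20), (2002, -0.25, 0.15), (2006, -0.30, 0.10)]

acc = []
for year, effect, se in sorted(trials):      # chronological order
    acc.append((effect, 1 / se**2))          # (effect, inverse-variance weight)
    pooled = sum(e * w for e, w in acc) / sum(w for _, w in acc)
    print(f"up to {year}: pooled effect = {pooled:.3f}")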
30.9.5 Method: Mixed-Treatment Comparison (MTC) Meta-Analysis
A relatively new concept is that of MTC or "network" meta-analysis. This method is used where several treatment regimes, e.g. A, B, C, D and E, exist for a particular condition and the surgeon wants to rank the benefits and harms of each treatment modality so that the best one can be picked for patient care. The concept of heterogeneity is expanded in MTCs: there may be inconsistencies between direct and indirect comparisons of the same treatment, which may be due to genuine differences between trials or to bias. There are several modelling methods used in MTC meta-analysis; these are beyond the scope of this book. The use of multi-parameter evidence synthesis methodology for MTCs allows the surgeon to incorporate all the available evidence, as opposed to only the best available evidence. Answers obtained from MTCs can be used to design new studies for comparisons for which no direct evidence is yet available [43].
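Although the full modelling methods are beyond the scope of this book, the simplest building block of an MTC, the adjusted indirect (Bucher) comparison, can be illustrated in a short sketch; the log odds ratios and standard errors below are invented.

import math

d_ab, se_ab = -0.30, 0.12   # direct comparison: A vs B
d_bc, se_bc = -0.10, 0.15   # direct comparison: B vs C

d_ac = d_ab + d_bc                       # indirect estimate: A vs C
se_ac = math.sqrt(se_ab**2 + se_bc**2)   # variances add for independent estimates
print(f"Indirect A vs C: {d_ac:.2f} (SE {se_ac:.2f})")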
30.10 What Is the Use of Meta-Analyses?
A well-conducted systematic review and/or meta-analysis is invaluable for practitioners. Many of us feel overwhelmed by the volume of medical literature and, as a result, often prefer summaries of information to publications of original investigations. Such evidence keeps us abreast of developments on a particular clinical topic. High-quality systematic
reviews and meta-analyses can define the boundaries of what is known and what is unknown and can help us avoid knowing less than has been proven. They are extremely useful in health technology assessment and cost-effectiveness analysis. Furthermore, they identify gaps in medical research as well as beneficial or harmful interventions. Investigators need systematic reviews and meta-analyses to summarise existing data, refine hypotheses, estimate sample sizes and help define future research agendas. Without these, researchers may miss promising leads or may embark on studies of questions that have already been answered. Industry is particularly interested in meta-analyses as they help to direct resources to viable and beneficial health interventions. Administrators and purchasers need integrative publications to help generate clinical policies that optimise clinical outcomes using available resources. For consumers and health policymakers who are interested in the bottom line of evidence, systematic reviews and meta-analyses can help harmonise conflicting results of research. They can be used as the basis for other integrative articles produced by policymakers, such as risk assessments, practice guidelines, economic analyses and decision analyses. However, meta-analysis is only one of the pillars of evidence-based healthcare that can be used to make clinical, professional and policy decisions.
30.11 Meta-Analysis Software
The number of available packages has nearly doubled over the last decade. A detailed cross-comparison of all the available software for conducting meta-analysis is beyond the scope of this book. However, a number of generalisations can be made, with the choice of package being dependent on the user's requirements. Pre-existing commercial general statistical program suites such as SAS, STATA and SPSS have been enhanced by third-party add-on macros that provide a limited set of basic functions for meta-analysis. Standalone packages are purpose-built for meta-analysis; they tend to offer a greater variety of functions and a wider range of input, processing and output modes. Some packages are free, such as RevMan, provided by the Cochrane Centre; others are commercial.
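To give a flavour of what such packages compute, the following Python sketch implements the DerSimonian–Laird random-effects estimate directly with numpy, so that no particular package's interface is assumed. The effect sizes and standard errors are invented for illustration.

import numpy as np

y = np.array([-0.30, -0.10, -0.45, -0.20])   # per-study effect sizes (invented)
se = np.array([0.15, 0.20, 0.25, 0.12])      # per-study standard errors (invented)

w = 1 / se**2                                             # fixed-effect weights
q = np.sum(w * (y - np.sum(w * y) / np.sum(w))**2)        # Cochran's Q
tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_star = 1 / (se**2 + tau2)                               # random-effects weights

print(f"tau^2 = {tau2:.3f}, pooled effect = {np.sum(w_star * y) / np.sum(w_star):.3f}")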
30.12 Conclusion
Like primary research, meta-analysis involves a stepwise approach to arrive at statistically justifiable conclusions. It has the potential to provide an accurate, quantitative appraisal of the literature and may objectively resolve controversies. The greatest challenge in conducting a meta-analysis on a surgical topic is often the lack of available data, because there are few high-quality published studies with an acceptable degree of heterogeneity. As the number of published meta-analyses in the surgical literature increases, surgeons need to understand the concepts behind meta-analyses. Furthermore, surgeons must also be able to judge the findings critically, as the results of a meta-analysis have the potential to influence clinical practice across the board.
References
1. Bailey KR (1987) Inter-study differences: how should they influence the interpretation and analysis of results? Stat Med 6:351–360
2. Begg CB, Mazumdar M (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50:1088–1101
3. Berlin JA, Laird NM, Sacks HS et al (1989) A comparison of statistical methods for combining event rates from clinical trials. Stat Med 8:141–151
4. Berman NG, Parker RA (2002) Meta-analysis: neither quick nor easy. BMC Med Res Methodol 2:10
5. Blettner M, Sauerbrei W, Schlehofer B et al (1999) Traditional reviews, meta-analyses and pooled analyses in epidemiology. Int J Epidemiol 28:1–9
6. Campbell DT (1957) Factors relevant to the validity of experiments in social settings. Psychol Bull 54:297–312
7. Davey Smith G, Egger M et al (1997) Meta-analysis. Beyond the grand mean? BMJ 315:1610–1614
8. DeAngelis CD, Drazen JM, Frizelle FA et al (2004) International Committee of Medical Journal Editors. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. JAMA 292:1363–1364
9. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7:177–188
10. Duval S, Tweedie R (2000) Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56:455–463
11. Easterbrook PJ, Berlin JA, Gopalan R et al (1991) Publication bias in clinical research. Lancet 337:867–872
12. Efron B, Morris C (1977) Stein's paradox in statistics. Sci Am 236:119–127
13. Egger M, Davey Smith G et al (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315:629–634
14. Egger M, Smith GD, Phillips AN (1997) Meta-analysis: principles and procedures. BMJ 315:1533–1537
15. Egger M, Smith GD (1998) Bias in location and selection of studies. BMJ 316:61–66
16. Egger M, Smith GD, Schneider M (2001) Systematic reviews of observational studies. In: Egger M, Smith G, Altman D (eds) Systematic reviews in health care: meta-analysis in context. BMJ Publishing Group, London, pp 211–227
17. Ezzo J (2003) Should journals devote space to trials with no results? J Altern Complement Med 9:611–612
18. Fleiss JL (1993) The statistical basis of meta-analysis. Stat Methods Med Res 2:121–145
19. Freemantle N, Cleland J, Young P et al (1999) Beta blockade after myocardial infarction: systematic review and meta regression analysis. BMJ 318:1730–1737
20. Gluud LL (2006) Bias in clinical intervention research. Am J Epidemiol 163:493–501
21. Greenland S, Robins JM (1985) Estimation of a common effect parameter from sparse follow-up data. Biometrics 41:55–68
22. Greenland S (1990) Randomization, statistics, and causal inference. Epidemiology 1:421–429
23. Gueyffier F, Bulpitt C, Boissel JP et al (1999) Antihypertensive drugs in very old people: a subgroup meta-analysis of randomised controlled trials. INDANA Group. Lancet 353:793–796
24. Higgins JP, Thompson SG, Deeks JJ et al (2003) Measuring inconsistency in meta-analyses. BMJ 327:557–560
25. Higgins JPT, Green S (eds) (2008) Cochrane handbook for systematic reviews of interventions version 5.0.0. The Cochrane Collaboration 2008. Available from www.cochrane-handbook.org
26. Hopewell S, McDonald S, Clarke M et al (2007) Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev 2:MR000010
27. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F et al (2006) Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Meth 11:193–206
28. Jadad AR, Moore RA, Carroll D et al (1996) Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 17:1–12
29. Jüni P, Altman D, Egger M (2001) Assessing the quality of randomized controlled trials. In: Egger M, Smith G, Altman D (eds) Systematic reviews in health care: meta-analysis in context. BMJ Publishing Group, London, pp 87–108
30. Krleza-Jeri K, Chan AW, Dickersin K et al (2005) Principles for international registration of protocol information and results from human trials of health related interventions: Ottawa statement (part 1). BMJ 330:956–958
31. Ioannidis JP, Trikalinos TA (2007) The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ 176:1091–1096
32. Laupacis A, Sackett DL, Roberts RS (1988) An assessment of clinically useful measures of the consequences of treatment. N Engl J Med 318:1728–1733
33. Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719–748
34. McCray AT (2000) Better access to information about clinical trials. Ann Intern Med 133:609–614
35. Moher D, Jadad AR, Nichol G et al (1995) Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 16:62–73
36. Moher D, Cook DJ, Eastwood S et al (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of reporting of meta-analyses. Lancet 354:1896–1900
37. Moher D, Jones A, Lepage L; CONSORT Group (consolidated standards for reporting of trials) (2001) Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 285:1992–1995
38. Ng TT, McGory ML, Ko CY et al (2006) Meta-analysis in surgery: methods and limitations. Arch Surg 141:1125–1130
39. Olkin I (1995) Meta-analysis: reconciling the results of independent studies. Stat Med 14:457–472
40. Oxman AD (1994) Checklists for review articles. BMJ 309:648–651
41. Petticrew M (2003) Why certain systematic reviews reach uncertain conclusions. BMJ 326:756–758
42. Ramsay CR, Grant AM, Wallace SA et al (2001) Statistical assessment of the learning curves of health technologies. Health Technol Assess 5:1–79
43. Salanti G, Higgins J, Ades AE et al (2007) Evaluation of networks of randomized trials. Stat Methods Med Res 17:279–301
44. Sauerland S, Seiler CM (2005) Role of systematic reviews and meta-analysis in evidence-based medicine. World J Surg 29:582–587
45. Sankey SS et al (1996) An assessment of the use of the continuity correction for sparse data in meta-analysis. Commun Stat Simul Comput 25:1031–1056
46. Shea B, Dube C, Moher D (2001) Assessing the quality of reports of systematic reviews: the QUOROM statement compared to other tools. In: Egger M, Smith G, Altman D (eds) Systematic reviews in health care: meta-analysis in context. BMJ Publishing Group, London, pp 122–129
47. Shea B, Boers M, Grimshaw JM et al (2006) Does updating improve the methodological and reporting quality of systematic reviews? BMC Med Res Methodol 6:27
48. Sterne JA, Egger M, Smith GD (2001) Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 323:101–105
49. Stroup DF, Berlin JA, Morton SC et al (2000) Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 283:2008–2012
50. Stuck AE, Rubenstein LZ, Wieland D (1998) Bias in meta-analysis detected by a simple, graphical test. Asymmetry detected in funnel plot was probably due to true heterogeneity. BMJ 316:469
51. Sutton A (2000) Publication bias. In: Sutton A (ed) Methods for meta-analysis in medical research. Wiley, pp 109–132
52. Thompson SG, Pocock SJ (1991) Can meta-analyses be trusted? Lancet 338:1127–1130
53. Williams CJ (1998) The pitfalls of narrative reviews in clinical medicine. Ann Oncol 9:601–605
Further Reading
Egger M, Smith GD, Altman D (eds) (2001) Systematic reviews in healthcare. British Medical Association
Sutton AJ, Abrams KR, Jones DR et al (2000) Methods for meta-analysis in medical research. Wiley
Cochrane handbook. Available at http://www.cochrane.dk/cochrane/handbook/hbook.htm
Cochrane Open Learning Materials for Reviewers. Available at http://www.cochrane-net.org/openlearning/
The Cochrane Collaboration and Trial Registry (CENTRAL). Available at www.cochrane.org/
International Standard Randomised Controlled Trial Number Register. Available at http://www.controlled-trials.com/
Clinical Trials – US. Available at http://www.clinicaltrials.gov/
WHO Clinical Trial Search Portal. Available at http://www.who.int/trialsearch/
31
Decision Analysis
Christopher Rao and Thanos Athanasiou
Contents
Abbreviations ..... 399
31.1 Introduction ..... 399
31.2 The Role of Decision Analysis in Healthcare Evaluation ..... 400
31.3 The Principles of Decision Analysis ..... 400
31.3.1 Identifying and Bounding the Problem ..... 400
31.3.2 Structuring the Problem ..... 400
31.3.3 Acquiring the Model Parameters ..... 401
31.3.4 Determine the Value of each Alternative Strategy ..... 401
31.3.5 Investigating Uncertainty ..... 402
31.4 Introducing Time Dependence ..... 405
31.4.1 Constructing a Markov Model ..... 406
31.4.2 Analyzing a Markov Model ..... 407
31.5 Critical Appraisal ..... 407
31.6 Limitations of Decision Analysis ..... 407
31.7 Conclusions ..... 408
References ..... 408

Abbreviations
CABG Coronary artery bypass graft
IHD Ischemic heart disease
PTCA Percutaneous transluminal coronary angioplasty

Abstract An increasing number of studies are being published that use decision analytical techniques, particularly in the fields of cost-effectiveness analysis and the evaluation of emerging surgical technology and practice. As decision analytical techniques are unfamiliar to clinicians, interpretation and critical appraisal are often difficult. In this chapter, we explain, with the use of examples, the fundamental methodology, techniques and potential applications of decision analysis in academic surgery. We also discuss more advanced decision analytical techniques that may be encountered within the literature.
C. Rao (✉) Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary's Hospital Campus, Praed Street, London, W2 1NY, UK. e-mail: [email protected]

31.1 Introduction
"Decision Analysis is a formalization of the decision-making process" [23]. By using mathematical relationships to explore the consequences of alternative courses of action, it can facilitate complex decision-making in conditions of uncertainty. Decision analysis uses analytical tools that can combine information from several sources, synthesize data when empirical data are absent or scarce, and explicitly explore the uncertainty associated with a decision [3]. Decision analysis has its theoretical foundations in statistical decision theory [14, 15], a derivative of game theory described by von Neumann in the 1920s [22], and shares common theoretical origins with
expected utility theory [7]. It also has very close associations with Bayesian statistics, which is often applied to decision-making [21]. It has been widely used in economics since the 1940s [13]; however, it was not until 1967, when decision analysis was used to evaluate the outcomes of radical neck dissection in patients with oral cancer [10], that it was first used to evaluate a healthcare intervention. While decision analysis has still not been widely used in healthcare research [8], it is increasingly being used to estimate the long-term outcomes of healthcare interventions [16, 17] and has become an important element of cost-effectiveness analysis (Chapter 32) [3]. In this chapter, we will discuss the methods, strengths, and limitations of decision analysis, in order to demonstrate its potential applications in healthcare evaluation.
31.2 The Role of Decision Analysis in Healthcare Evaluation
Healthcare evaluation has two facets [7]. The first is the process of measurement. It is characterized by an interest in using experimental studies to assess the efficacy of healthcare interventions, and often involves estimation and hypothesis testing, focusing on relatively few parameters and the relationships between them. Often, measured parameters are not of direct clinical importance, but are used as proxies for more relevant clinical outcomes because they are easier to measure. For example, the effect of hypertension therapy on blood pressure is often measured rather than the effect of the therapy on myocardial infarction or mortality, as larger studies would be required to detect statistically significant differences in these outcomes. The second facet of clinical evaluation, decision analysis, involves using measured information on current practice from multiple sources to inform future practice. Optimal decision-making requires identification of each possible strategy, knowledge of the likelihood of future events, and an analytical framework for balancing future risks and benefits. Decisions should be based on the expected outcomes of each course of action rather than on individual parameters (e.g. in the case of hypertension therapy, the decision should be based on the impact that therapy has on
morbidity and mortality rather than the effect it has on blood pressure). There should be an acceptance that there will always be some degree of uncertainty associated with a decision because of variation between individual patients, uncertainty associated with measured parameters, and uncertainty associated with analytical assumptions [3, 7]. Skilled clinicians may analyze the decision-making process intuitively; however, it is useful to formalize this process for more complex healthcare problems [12].
31.3 The Principles of Decision Analysis
It is often helpful to divide the process of decision analysis into five sequential components [23]:
1. Identifying and bounding the problem
2. Structuring the problem, often using a decision analytical model
3. Acquiring necessary information or populating the model
4. Analyzing the problem
5. Investigating the uncertainty associated with results of the analysis (sensitivity analysis and alternative analysis)
31.3.1 Identifying and Bounding the Problem The first step is identifying the problem and breaking it down into manageable sections (often referred to as bounding the problem). All alternative courses of action, events that follow the initial courses of action, and relevant outcome measures should be identified [12].
31.3.2 Structuring the Problem To structure the problem, a decision analytical model is often constructed. This often takes the form of a decision tree. The decision tree can be thought of as a flow diagram that links actions with outcomes. It can
then be used to calculate the probability and the value of outcomes [19]. It is a useful tool as it forces the clinician to consider all possible outcomes, their desirability, and the likelihood that they will happen. Figure 31.1 illustrates a simple decision tree that was used to combine information on mortality and complications following laparoscopic and open obesity surgery [20]. The decision tree is governed by a number of conventions; it is constructed from left to right: earlier events and choices are depicted on the left, later ones are depicted on the right [18]. The decision tree consists of nodes and branches (the lines that join the nodes). The squares, or decision nodes, represent clinical decisions (e.g. whether a patient is treated with open or laparoscopic obesity surgery). The circles represent chance occurrences (e.g. whether a patient may suffer complications of their surgery). Each possible chance occurrence has a probability assigned to it, called a path probability, that represents the likelihood that the particular event will occur. The triangles, or terminal nodes, represent final outcomes such as death or a permanent cure. All terminal nodes have payoffs associated with them (e.g. an improved quality of life or a reduction in mortality) [12]. The payoffs and path probabilities are collectively called model parameters.
31.3.3 Acquiring the Model Parameters Studies should use the best quality, most relevant evidence of clinical effectiveness. When evidence has
been synthesized, the methodology should be robust (Chapter 30) and the search strategy should be explicit and comprehensive, as a model can easily be biased by neglecting to include important studies [8]. The same rigor should be applied to ensure that the primary experimental or observational data are of good quality. If data from randomized controlled trials are used, their relevance in a "real-world" setting should be examined. Data from observational trials should be examined for sources of potential bias [3, 7]. Often, where there is an absence of information from other sources, it is necessary to use expert estimation. Methods frequently used to formalize the process of expert estimation, such as the Modified Nominal Group (NG) and Delphi methods, focus on achieving consensus [11] rather than on exploring the uncertainty associated with estimates; this can limit the usefulness of estimates in decision analytical models. If expert estimation is used, the reasons for doing so, the values of the estimates, and the uncertainty associated with those values should be clearly justified.
31.3.4 Determine the Value of each Alternative Strategy
To calculate the expected value for alternative courses of action, the values associated with each outcome, weighted by the likelihood that each outcome will occur, are summed. This is often called rolling back the decision tree.

Fig. 31.1 A decision tree comparing open and laparoscopic bypass surgery. Adapted from Siddiqui et al. [20]
This is achieved by first calculating the probability that each outcome will occur. As all chance nodes represent theoretically independent events, this is done by multiplying together all probabilities between the decision node and the terminal node (Fig. 31.2). The expected value of each outcome is the product of the probability that the outcome will occur and the value or payoff associated with that outcome. The expected values associated with every outcome that could result from a course of action are then summed to calculate the expected value of that course of action (Fig. 31.3). Depending on whether a payoff has negative implications for a patient or healthcare system, such as a monetary cost that must be incurred, or positive implications, such as an improvement in quality of life, either a low or a high expected value is desirable [12].
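The hypothetical CABG-versus-PTCA example of Figs. 31.2 and 31.3 can be "rolled back" in a few lines of Python; the path probabilities and payoffs below are those shown in the figures.

def expected_value(paths):
    """Each path is (path probability, payoff); the EV is their weighted sum."""
    return sum(p * payoff for p, payoff in paths)

cabg = [(0.98 * 0.31, 0.774),   # survives, recurrence of angina
        (0.98 * 0.69, 0.930),   # survives, asymptomatic
        (0.02, 0.0)]            # perioperative mortality
ptca = [(0.99 * 0.36, 0.774),
        (0.99 * 0.64, 0.930),
        (0.01, 0.0)]

print(f"EV CABG: {expected_value(cabg):.4f}")   # 0.8640, as in Fig. 31.3
print(f"EV PTCA: {expected_value(ptca):.4f}")   # 0.8651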
Fig. 31.2 "Rolling back the tree" – stage 1. A hypothetical example comparing coronary artery bypass grafting (CABG) and percutaneous transluminal coronary angioplasty (PTCA) for the treatment of stable ischemic heart disease (IHD)

Fig. 31.3 "Rolling back the tree" – stage 2. A hypothetical example comparing coronary artery bypass grafting (CABG) and percutaneous transluminal coronary angioplasty (PTCA) for the treatment of stable ischemic heart disease (IHD)

31.3.5 Investigating Uncertainty
Uncertainty associated with the true values of parameters used to calculate the value and likelihood of events following each course of action (often called parameter or second-order uncertainty) and uncertainty associated with individual patient variation (often called first-order uncertainty; heterogeneity when explained, variability when unexplained) mean that there will always be some degree of uncertainty associated with the results of decision analysis. There will also be uncertainty associated with the model structure. The explicit exploration of uncertainty is consequently fundamental to decision analysis [3, 12].
Different methods are commonly used to explore different sources of uncertainty. The effect of modeling assumptions and heterogeneity can be explored by adopting a reference case and conducting alternative analyses in which the effect of adopting alternative modeling assumptions is explored. Parameter uncertainty is explored by conducting sensitivity analysis. Sensitivity analysis can take the form of univariate analysis (in which the uncertainty associated with the value of one input parameter is explored), multivariate analysis (in which the combined uncertainty associated with two or more parameters is explored) or probabilistic sensitivity analysis (in which the combined uncertainty of all model parameters is explored) [5].

31.3.5.1 Univariate Sensitivity Analysis
The simplest form of sensitivity analysis is univariate sensitivity analysis, in which a single input parameter is varied from its highest value to its lowest value. The expected value of each alternative course of action is recalculated as the value of the input parameter is varied. If the optimum decision changes as the input parameter varies, the result is said to be sensitive to the uncertainty associated with this input parameter. This process can then be repeated for all the input parameters.

31.3.5.2 Multivariate Sensitivity Analysis
Often, the combined uncertainty associated with two or more variables can cause the optimum decision to
change when the uncertainty associated with these variables individually would not. To investigate the combined effect of the uncertainty associated with two input parameters, two-way or bivariate sensitivity analysis is performed. Both input parameters are varied from their highest to their lowest value and the model is recalculated for every combination of values. Figure 31.4 shows how these results can be represented graphically: the different shaded areas represent the most effective strategy for different combinations of the two input parameters. Two-way sensitivity analysis is most useful when the result is not sensitive to either of the input parameters investigated individually. It is possible to perform three- or four-way sensitivity analysis; however, it is time-consuming, and the results are difficult both to interpret and to represent graphically. Unless there are relatively few model parameters with associated uncertainty, three- or four-way sensitivity analysis is of questionable value.
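A univariate sensitivity analysis of the same hypothetical CABG-versus-PTCA example can be sketched as follows; the range over which the recurrence probability is varied is an assumption chosen for illustration.

import numpy as np

EV_PTCA = 0.8651  # base-case PTCA expected value from Fig. 31.3

for p_recur in np.linspace(0.21, 0.41, 5):   # assumed plausible range for CABG recurrence
    # EV(CABG) recalculated with all other parameters fixed at base-case values
    ev_cabg = 0.98 * (p_recur * 0.774 + (1 - p_recur) * 0.930)
    best = "CABG" if ev_cabg > EV_PTCA else "PTCA"
    print(f"p(recurrence)={p_recur:.2f}  EV(CABG)={ev_cabg:.4f}  optimum: {best}")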
31.3.5.3 Probabilistic Sensitivity Analysis
Fig. 31.4 Graphical representation of the results of bivariate sensitivity analysis (open vs. laparoscopic immediate complication rates; body mass index 50–60). Adapted from Siddiqui et al. [20]
In probabilistic sensitivity analysis, each parameter is assigned a probability distribution that reflects the uncertainty associated with that parameter. The expected value of each course of action is then recalculated several times (often 1,000 or even 10,000 times), with each of the parameters randomly sampled from its probability distribution for every recalculation [9]. The results can be presented as probability density distributions of the expected values of the alternative courses of action, or as the incremental expected value of one course of action when compared with another. Figure 31.5a shows the distribution of the incremental expected value (in this case measured in quality-adjusted life years) adapted from a published decision analytical model [17]. Figure 31.5b shows the corresponding cumulative probability distribution, suggesting that bypass surgery is the optimum intervention in terms of quality-adjusted life years with approximately 80% certainty.
Fig. 31.5 Graphical representations of the results of probabilistic sensitivity analysis. Adapted from Rao et al. [17]. (a) Probability density function. (b) Cumulative probability density function. Horizontal axes: incremental effectiveness, bypass vs. stenting (QALY)
Many authors argue that probabilistic sensitivity analysis represents the most robust method for exploring and quantifying the uncertainty associated with a decision [13].
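A minimal probabilistic sensitivity analysis of the same hypothetical example might look as follows; the beta distributions below are illustrative assumptions and are not the distributions used by Rao et al. [17].

import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of recalculations

# Assumed beta distributions centred on the base-case path probabilities
p_surv_cabg = rng.beta(98, 2, n)
p_recur_cabg = rng.beta(31, 69, n)
p_surv_ptca = rng.beta(99, 1, n)
p_recur_ptca = rng.beta(36, 64, n)

ev_cabg = p_surv_cabg * (p_recur_cabg * 0.774 + (1 - p_recur_cabg) * 0.930)
ev_ptca = p_surv_ptca * (p_recur_ptca * 0.774 + (1 - p_recur_ptca) * 0.930)

# Proportion of simulations in which CABG has the higher expected value
print(f"P(CABG optimal) = {np.mean(ev_cabg > ev_ptca):.2f}")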
31.3.5.4 Alternative Analysis The uncertainty associated with model structure and variables such as population demographics cannot easily be investigated in sensitivity analysis. Consequently, it is common in decision analysis to perform several alternative analyses to investigate the uncertainty associated with these factors. For example, alternative analysis could be performed for male and female cohorts of differing ages, with appropriate morbidity and mortality [16].
31.4 Introducing Time Dependence While decision trees are powerful tools for analyzing clinical decisions, they are cumbersome when the effects of a clinical decision must be examined over several years [3]. In an example from the literature, outcomes following two surgical techniques for replacing the mitral valve are compared [16]. Figure 31.6a shows a simple decision analytical model, designed to compare outcomes after mitral valve replacement, with and without preservation of the subvalvular apparatus over a 1-year time horizon. Figure 31.6b, in contrast, demonstrates that even when a relatively simple clinical problem is addressed over a longer time horizon, the tree becomes complex and analysis becomes difficult. This would make analysis of more complicated clinical problems impossible.
Fig. 31.6 Decision tree to investigate outcomes following mitral valve replacement, with and without preservation of the subvalvular apparatus. (a) One-year time horizon. (b) Ten-year time horizon
A more effective way to deal with the problem of modeling the long-term consequences of a healthcare intervention is to use Markov modeling or simulation [12].
31.4.1 Constructing a Markov Model In a Markov model (Fig. 31.7), a theoretical patient can exist in one of several mutually exclusive health states that have payoffs associated with them. At the end of a defined period (e.g. 1 year or 6 months) called a cycle, a patient can continue to exist in that state, or can move to another state. The likelihood that a patient
will change states is called a transition probability. There are two sorts of Markov model: a Markov-chain simulation, in which the transition probabilities remain constant across all cycles, and a Markov process, in which the transition probabilities can vary from cycle to cycle; for example, the baseline population mortality may increase with increasing patient age. To calculate transition probabilities from incidences reported in the literature (obtained from experimental, observational or meta-analytical data), the following formulas are used:

R = −(1/P) × ln(1 − I)
T = 1 − exp(−R × C)

where I is the incidence of the event of interest in the study population, P the period over which the data were collected, R the rate at which the event of interest occurs, T the transition probability and C the cycle length.

Fig. 31.7 A simple Markov model of mitral valve replacement, with alive and dead states and transitions for perioperative, valve-related and baseline mortality. Adapted from Rao et al. [17]

Some states are called absorbing states: once a patient enters such a state (for example, death), they cannot leave it. The simulation continues until the patient enters an absorbing state or other defined criteria are fulfilled. For example, one of the termination conditions could be that the simulation should terminate after 10 cycles; if each cycle was 1 year in duration, the time horizon would be said to be 10 years. The rewards that the hypothetical patients have accumulated during the simulation are averaged to calculate the payoffs and costs associated with the outcome of interest [3, 5, 7].
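The incidence-to-rate-to-probability conversion above translates directly into code; the five-year mortality figure in the example is an invented illustration.

import math

def transition_probability(incidence, period, cycle_length):
    """Convert an incidence observed over `period` years into a per-cycle
    transition probability via the constant event rate R."""
    rate = -math.log(1.0 - incidence) / period    # R = -(1/P) ln(1 - I)
    return 1.0 - math.exp(-rate * cycle_length)   # T = 1 - exp(-R x C)

# e.g. an assumed 30% five-year mortality converted to an annual probability:
print(f"{transition_probability(0.30, 5, 1):.4f}")  # ~0.0689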
31.4.2 Analyzing a Markov Model
The Markov simulation is run several thousand times, and the rewards that the hypothetical patients have accumulated during the simulations are averaged to obtain results for a hypothetical cohort of patients. This approach is often called microsimulation; it is particularly important for infectious diseases, where interaction between individual patients matters. A more computationally efficient method is to sample expected values for all model input parameters and calculate the payoffs for a whole patient cohort. In both cases, new values of the model input parameters are then resampled and the process is repeated several thousand times to perform probabilistic sensitivity analysis, similar to the way a decision tree is analyzed [3]. It is also possible to calculate the proportion of patients who were in each of the Markov states during each cycle. This information can be used to construct survival curves for patient cohorts following each intervention (Fig. 31.8).
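The cohort approach can be sketched for a two-state (alive/dead) Markov chain; the transition probability and annual utility below are illustrative assumptions, not figures from the chapter.

p_die = 0.03          # assumed annual transition probability: alive -> dead
utility_alive = 0.93  # assumed annual reward (QALYs) while alive
cycles = 10           # 10-year time horizon with 1-year cycles

alive = 1.0           # proportion of the cohort in the "alive" state
total_qalys = 0.0
survival_curve = []
for year in range(cycles):
    alive *= (1 - p_die)          # "dead" is an absorbing state
    total_qalys += alive * utility_alive
    survival_curve.append(alive)  # proportion alive each cycle (cf. Fig. 31.8)

print(f"Expected QALYs over {cycles} years: {total_qalys:.2f}")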
31.5 Critical Appraisal
When reading a published decision analysis, it is important to ensure that the author has dealt with all the issues addressed in Sections 31.3 and 31.4 in a comprehensive and explicit fashion. Several texts discuss
critical appraisal and quality assessment of decision analytical models in more depth. Philips' dimensions of quality [13] represent a comprehensive framework for assessing the quality of decision analytical models. Rigorous guidelines on the methodological quality of decision modeling and sensitivity analysis have also been published by the National Institute for Clinical Excellence [1] in the UK and by the US Panel on Cost-Effectiveness in Health and Medicine [9].
31.6 Limitations of Decision Analysis
The challenges faced by those seeking to use decision analysis can be thought of as relating either to populating the model or to structuring the model:
• Many of the challenges faced by those seeking to undertake decision analysis relate to the difficulty in acquiring estimates of model parameters. This is a particular problem for new diagnostic tests, procedures or treatments, where long-term data may be limited and interventions may not have been directly compared. It is important to ensure that the model is generalizable and avoids bias when estimating model input parameters; consequently, a robust search strategy for estimates of model parameters is essential. Particular care must be taken when evidence is synthesized or expert estimation is used to ensure that the methodology is robust and explicit [2, 4, 13].
• Decision analysis is a powerful tool that can be used to simplify or structure complex clinical problems in order to assist decision-makers. Often, however, clinical problems are so complex that they cannot be readily distilled into a simple list of outcomes, or the complexity of the problem makes obtaining the necessary information and analyzing the model exceedingly laborious [12]. Validating and exploring the uncertainty associated with model structure when there is limited empirical evidence about clinical effects and natural history can also be difficult [2].
While structuring and populating decision analytical models is challenging, decision-makers have always faced challenges relating to the synthesis and interpretation of clinical evidence. Arguably, the explicit structuring of clinical problems, synthesis and weighting of evidence inherent in decision analysis is an improvement on the implicit and opaque criteria previously
used by decision-makers [2, 4]. In the words of the twentieth-century statistician George Box, "all models are wrong… some are useful" [3].

Fig. 31.8 Survival curves constructed using the results of a Markov model; curves are shown for conventional MVR, all SAP techniques and complete SAP over a 10-year horizon. Adapted from Rao et al. [17]
31.7 Conclusions
In this chapter, we have outlined some of the fundamental principles of decision analysis. We have also discussed some of the more advanced techniques and methods that may be encountered in the literature. Decision analysis is a powerful tool that can be used to combine the best available information in order to aid clinicians in making rational decisions. It is most useful in complex clinical situations where there is uncertainty about model parameters or where many different outcomes can occur over different time scales. It can also be used as a tool to incorporate patient preferences for the desirability of different outcomes into the decision-making process [8]. Decision analytical modeling is not applied in many situations in which it could prove most useful, arguably because many within the medical profession, clinical research and research funding bodies are unfamiliar with, or even resistant to, decision analytical techniques [5]. However, the potential of decision analytical modeling is being explored by the Medical Research
Council, one of the largest funding bodies in the United Kingdom. It is already fundamental to the National Institute of Clinical and Healthcare Excellence, and National Coordinating Centre for Health Technology Assessment programs [5, 6]; given its increasingly frequent appearance in the medical literature [12], it is very likely that it will become more widely applied in clinical practice.
References
1. National Institute for Clinical Excellence (2004) Guide to the methods of technology appraisal (reference N0515). National Institute for Clinical Excellence, London
2. Brennan A, Kharroubi SA, O'Hagan A et al (2007) Calculating partial expected value of information in cost-effectiveness models. Med Decis Making 27:448–470
3. Briggs A, Sculpher M, Claxton K (2006) Decision modelling for health economic evaluation. Oxford University Press, Oxford
4. Claxton K, Cohen JT, Neumann PJ (2005) When is evidence sufficient? A framework for making use of all available information in medical decision making and for deciding whether more is needed. Health Aff 24:93–101
5. Claxton K, Eggington S, Ginnelly L et al (2005) A pilot study of value of information analysis to support research recommendations for the National Institute for Health and Clinical Excellence, CHE Research Paper 4. Centre for Health Economics, University of York, York
6. Claxton K, Ginnelly L, Sculpher M et al (2004) A pilot study on the use of decision theory and value of information analysis as part of the NHS Health Technology Assessment programme. Health Technol Assess 8:31
7. Drummond MF, Sculpher MJ, Torrance GW et al (2005) Methods for the economic evaluation of health care programmes, 3rd edn. Oxford University Press, Oxford
8. Friedland DJ, Go AS, Davoren JB et al (1998) Evidence-based medicine: a framework for clinical practice. Appleton & Lange, Stamford
9. Gold MR, Siegel JE, Russell LB et al (1996) Cost-effectiveness in health and medicine. Oxford University Press, New York
10. Henschke UK, Flehinger BJ (1967) Decision theory in cancer therapy. Cancer 20:1819–1826
11. Hutchings A, Raine R, Sanderson C et al (2006) A comparison of formal consensus methods used for developing clinical guidelines. J Health Serv Res Policy 11:218–224
12. Petitti DB (2000) Meta-analysis, decision analysis and cost-effectiveness analysis: methods for quantitative synthesis in medicine, 2nd edn. Oxford University Press, New York
13. Philips Z, Ginnelly L, Sculpher M et al (2004) A review of good practice in decision analytical modelling in health technology assessment. Health Technol Assess 8:1–158
14. Raiffa H (1968) Decision analysis: introductory lectures on choices under uncertainty. Addison-Wesley, Reading
15. Raiffa H, Schlaifer R (1959) Probability and statistics for business decisions. McGraw-Hill, New York
16. Rao C, Hart J, Chow A et al (2008) Does preservation of the mitral valve apparatus have an effect on long term survival and quality of life? A microsimulation study. J Cardiothorac Surg 3:17
17. Rao C, Aziz O, Panesar SS et al (2007) Cost effectiveness analysis of minimally invasive internal thoracic artery bypass versus percutaneous revascularisation for isolated lesions of the left anterior descending artery. BMJ 334:621
18. Schwartz WB, Gorry GA, Kassirer JP et al (1973) Decision analysis and clinical judgement. Am J Med 55:459–472
19. Sox HC Jr, Blatt MA, Higgins MC et al (1998) Medical decision making. Butterworth Heinemann, UK
20. Siddiqui A, Livingston E, Huerta S (2006) A comparison of open and laparoscopic Roux-en-Y gastric bypass surgery for morbid and super obesity: a decision-analysis model. Am J Surg 192:e1–e7
21. Spiegelhalter DJ, Abrahams KR, Myles JP (2003) Bayesian approaches to clinical trials and health care evaluation. Wiley, Chichester
22. Von Neumann J, Morgenstern O (1947) Theory of games and economic behavior. Wiley, New York
23. Weinstein MC, Fineberg HV (1980) Clinical decision analysis. Saunders, Philadelphia
Cost-Effectiveness Analysis
32
Christopher Rao and Thanos Athanasiou
Contents
Abbreviations ..... 411
32.1 Introduction ..... 411
32.2 What is Cost-Effectiveness Analysis? ..... 412
32.3 Perspective ..... 412
32.4 Measures of Effect ..... 412
32.4.1 Cost-Minimisation Analysis ..... 413
32.4.2 Cost-Effectiveness Analysis ..... 413
32.4.3 Cost-Benefit Analysis ..... 413
32.4.4 Cost-Utility Analysis ..... 413
32.5 The Quality-Adjusted Life Year ..... 413
32.6 Discounting ..... 414
32.7 Interpreting the Results of Economic Analysis ..... 414
32.7.1 The Incremental Cost-Effectiveness Ratio ..... 414
32.7.2 Ranking Cost-Effectiveness Ratios ..... 415
32.7.3 The Cost-Effectiveness Threshold ..... 415
32.7.4 The Willingness-to-Pay Threshold ..... 416
32.8 Handling Uncertainty ..... 416
32.8.1 Sensitivity Analysis ..... 416
32.8.2 Interpreting the Results of Probabilistic Sensitivity Analysis ..... 417
32.9 Assessing the Quality and Relevance of Cost-Effectiveness Analysis ..... 417
32.10 Limitations of Cost-Effectiveness Analysis ..... 419
32.11 Conclusions ..... 419
References ..... 420

Abbreviations
CET Cost-effectiveness threshold
HRQoL Health-related quality of life
ICER Incremental cost-effectiveness ratio
NHS National Health Service
NICE National Institute of Clinical Excellence
NMB Net monetary benefit
QALY Quality-adjusted life years
WTP Willingness-to-pay

Abstract Cost-effectiveness analysis is a fundamental aspect of the evaluation of health care interventions; however, it remains poorly understood by clinicians. This makes the interpretation and assessment of the quality of studies difficult. In this chapter, we undertake to explain, using examples from the literature, the fundamental methodology and techniques used in cost-effectiveness analysis. We seek to demonstrate and explain common graphical representations of the results of cost-effectiveness analysis. Finally, we will discuss the strengths and limitations of cost-effectiveness analysis.

C. Rao (✉) Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary's Hospital Campus, Praed Street, London W2 1NY, UK. e-mail: [email protected]

32.1 Introduction
In the developed world, ageing populations, increasing expectations of health care and the cost of modern medical practice have put pressure on the finite resources available for health care [10]. In the United States, health-care spending, as a proportion of gross domestic product, has increased from 5% in 1965 to 15% in 1995 and now exceeds $1 trillion [8, 10]. This is a pattern mirrored across the developed world. In the United Kingdom, spending has increased fivefold in real terms
between 1945 and 1996 to an estimated £42 billion, accounting for approximately 14.5% of total public expenditure [8, 10] or 5.4% of gross domestic product [18]. It is predicted that health-care spending will rise to 7.8% of gross domestic product by 2008 [18]. Rapidly increasing health-care expenses have undoubtedly resulted in a need to rationalise health-care spending and assess the cost-effectiveness of individual health-care interventions. Cost-effectiveness analysis (also referred to as economic analysis) is a new, evolving and expanding field of health research [8, 15]. It was first applied to the treatment of end-stage renal failure in the 1960s [20], and between 1987 and 1997 the number of publications indexed on Medline increased by more than 400% [15]. The term "cost-effective" is used by third-party payers such as governments and insurance companies to convey careful assessment of relative costs and benefits; by manufacturers to support marketing claims; and by clinicians and other patient advocates to justify resource investment [8]. Despite the increasingly widespread use of the term "cost-effective," it is often misunderstood and misused [15]. In this chapter, we aim to explain what the term cost-effective means. We will outline the fundamental principles of cost-effectiveness analysis and the different types of analysis. We will discuss the interpretation of cost-effectiveness data and the graphical representations of the results of cost-effectiveness analysis. Finally, we will discuss the strengths, limitations and applications of cost-effectiveness analysis in surgical research. It is not the aim of this chapter to provide the reader with detailed descriptions of the methodological tools used to perform cost-effectiveness analysis, as this lies outside the scope of this chapter and several existing texts do this well [4, 5].
32.2 What is Cost-Effectiveness Analysis?
Cost-effectiveness analysis must deal with both inputs (or costs) and outputs (or effects). The term cost-effective is often mistakenly applied in the medical literature to an intervention that has been demonstrated to be effective without any cost data, or to one that is cost-saving without consideration of its effectiveness. The frequent misuse of the term undoubtedly contributes to the widespread confusion about its meaning.
Cost-effectiveness studies can be classified according to the perspective from which costs are calculated and the way in which the effects of competing interventions are measured [15].
32.3 Perspective
Costs and effects are seen differently from different points of view. If we consider the cost of a surgical intervention to the United Kingdom National Health Service (NHS), we must consider the cost of pre-operative care, the procedure itself, the cost of outpatient follow-up, the costs incurred in primary health care and the costs of complications and recurrence. If we consider the cost to the patient of the same surgical procedure, all of the previously stated costs would be irrelevant, but we would need to consider costs such as the patient's loss of earnings and travel expenses. These two estimates of cost for the same procedure could be radically different. When costs relevant to a particular organisation or group are considered, the cost-effectiveness analysis is said to have been performed from the perspective of that particular organisation or group, for example, a patient perspective, a third-party payer perspective (either private, such as insurance companies, or public, such as government health programmes) or a societal perspective (all costs and consequences to all stakeholders within the borders of a country) [13]. Although a standard perspective or reference case has been advocated by some authors [8], as a result of practical and conceptual problems [5], there is currently no consensus on what perspective should be used in cost-effectiveness analysis. In the United Kingdom, the National Institute of Clinical and Healthcare Excellence (NICE) recommends the use of an NHS (third-party payer) perspective [1]; others, however, favour a societal perspective [8]. Ultimately, whatever perspective is adopted, it is important that it is explicitly stated, as interpretation of cost-effectiveness data is impossible unless we know from what perspective the analysis has been performed [5].
32.4 Measures of Effect Studies can be classified according to how the effects of alternative interventions are measured. In this section, we will discuss the most frequently used [15]:
• Cost-minimisation analysis
• Cost-effectiveness analysis
• Cost-benefit analysis
• Cost-utility analysis
32.4.1 Cost-Minimisation Analysis
In the simplest form of analysis, cost-minimisation analysis, it is assumed that all interventions are equally effective, and they are compared simply on the basis of cost. This method does not facilitate examination of the uncertainty associated with the relative effectiveness of interventions. As we can rarely be sure that two interventions are equally effective on every occasion in all patients, cost-minimisation analysis is no longer widely used [3].
413
32.4.2 Cost-Effectiveness Analysis

The term cost-effectiveness analysis is confusingly also commonly applied to a subset of economic analysis in which the effect can be any non-monetary parameter that relates to the effect of the interventions, for example, "cost per episode-free day" for asthma interventions [17] or "cost per case detected" for diagnostic tests [11]. This allows comparison of alternative interventions for the same disease within the same field, for example, coronary stenting and coronary artery bypass surgery for occlusive coronary artery disease. It is not, however, possible to compare interventions from different fields [5].

32.4.3 Cost-Benefit Analysis

In cost-benefit analysis, both the costs and effects are expressed in monetary units. This has significant advantages and disadvantages. Unlike other forms of analysis, it allows us to assess the absolute benefit of a programme without comparison with other interventions: if the costs are less than the monetary value of the effects, then the intervention is clearly cost-effective [5]. The other significant advantage of cost-benefit analysis is that it facilitates comparison of health-care interventions with other public spending, for example, civil engineering or infrastructure projects [13].

The disadvantages of cost-benefit analysis are related to the ethical and practical problems associated with valuing morbidity and mortality. Many authors object to cost-benefit analysis, arguing that valuing health in monetary terms implicitly favours health-care interventions for diseases of the affluent; others simply find valuing human life, however rationally it is done, distasteful. For these reasons, cost-benefit analysis is rarely used in health care [5, 13, 15].

32.4.4 Cost-Utility Analysis

Cost-effectiveness analysis uses natural, programme-specific measures of effect such as the episode-free day [17], whereas cost-utility analysis uses quality-adjusted measures of effect, for example, quality-adjusted life years (QALY), health-adjusted life years or disability-adjusted life years [13]. Many authors, particularly in the United States, either make no distinction between cost-effectiveness and cost-utility analysis [8] or argue that cost-utility analysis is a subtype of cost-effectiveness analysis [15]. Cost-utility analysis has several advantages: it allows us to consider both the mortality and the morbidity associated with an intervention, it allows us to consider morbidity from all causes when evaluating the effectiveness of an intervention, it facilitates comparison of cost-effectiveness between health-care disciplines, and it allows us to attach values to outcomes that are considered good or bad [5, 13]. As QALY are one of the most commonly used measures of effect in cost-utility analysis, we will discuss them in more detail in the next section.

32.5 The Quality-Adjusted Life Year

The concept of the QALY was first introduced in 1968 [12]; however, it was not until 1977 [21] that the term "QALY" became widely used. QALY account for the effect an intervention has on either the length of life or health-related quality of life (HRQoL) (quantified using a utility score) by multiplying the change in utility score by the change in the length of life, as follows:

ΔQALYs = ΔUtility of health state × ΔLength of life [15]

Utility scoring can be thought of as a method of quantifying the strength of a patient's preference for a particular health state or outcome. Conventionally, a utility of 1 is deemed to be equivalent to perfect health and 0 is deemed to be equivalent to death. Most health states are assigned utility values between 1 and 0, with health states considered worse than death assigned negative values. There are several methods for assigning utility values to different health states, such as interval scaling, the standard gamble and the time trade-off method. As they are all grounded in economic theory, they make the basic assumption that patients will behave rationally to maximise their personal satisfaction or utility. These methods are, however, time-consuming and often conceptually difficult for the patient. Alternatively, pre-scored multi-attribute health-status classification systems such as the EQ-5D system formulated by the EuroQol Group (www.euroqol.org), the quality of well-being questionnaire, the Health Utilities Index and the Short Form 6D (SF-6D) can be used. These consist of questionnaires with scoring systems that have been validated in large population groups [5, 8, 15]. The validity of QALY relies on the assumption that patients are risk neutral and are willing to trade years of life in a given health state for fewer years in a better health state. While this may not be the case in practice, QALYs probably represent a close enough representation of actual patient preference to justify their use in cost-effectiveness analysis [8]. Implicit in cost-utility analysis is the assumption that QALY are equally valuable no matter at what age and to whom they are assigned. While this may appear egalitarian, society may prefer to assign QALY to someone who is very ill rather than to someone who is comparatively well, or to someone who has been ill most of their life rather than to someone who has been well into old age [5].
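As a concrete illustration of the QALY formula, the short sketch below computes an incremental QALY gain in Python. The utility values and survival times are purely hypothetical, chosen for arithmetic clarity rather than taken from any study.

def qalys(utility, years):
    # QALYs accrued in a health state = utility weight x time in that state
    return utility * years

# Hypothetical comparison: without treatment, 10 years at a utility of 0.6;
# with treatment, 12 years at a utility of 0.8.
baseline = qalys(0.6, 10)    # 6.0 QALYs
treated = qalys(0.8, 12)     # 9.6 QALYs
print(f"Incremental QALYs gained: {treated - baseline:.1f}")  # 3.6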
32.6 Discounting

Most people would rather have $100 now than $100 in 10 years' time, even when the sum is adjusted for inflation. This concept is called positive time preference, and it occurs for a number of reasons: future financial gains are less valuable than current gains because society is becoming wealthier; individuals are risk averse, preferring a definite return now to a possible return in the future; and there is no opportunity to use or invest future financial gains in the meantime (the so-called opportunity cost) [5].
For these reasons, costs incurred in the future are devalued, or discounted, using the following formula:

X = Cy / (1 + r)^y

where X is the discounted future cost, Cy is the future cost incurred at year y, and r is the annual discount rate. Discounting effects is more controversial. While most people would generally rather have a year of perfect health now than a year of perfect health at the end of their life, health benefits are not as readily transferable as monetary benefits. The current convention is, however, to discount treatment effects. In the United Kingdom, the discount rate is currently set by the Treasury at 3.5% for all public-service projects, and NICE recommend using the limits of 0–6% in sensitivity analysis [1, 5]. The United States Panel on Cost-Effectiveness in Health and Medicine recommended a discount rate of 3% (0–5% for sensitivity analysis) [8], while the World Health Organisation recommends 3% (0–6% for sensitivity analysis) [19]. Many authors, however, continue to use 5% to facilitate comparison with the large body of published cost-effectiveness data based on discount rates of 5%. Most cost-effectiveness analyses do not account for inflation, as it is assumed that all costs will inflate at the same rate [5].
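The discounting formula can be made concrete with a brief sketch; the cost stream below is invented for the example, and the 3.5% rate follows the UK Treasury convention mentioned above.

def discounted(cost, year, rate=0.035):
    # X = Cy / (1 + r)^y
    return cost / (1 + rate) ** year

# A hypothetical stream of costs: 1,000 incurred in each of years 0-4,
# discounted at the UK public-sector rate of 3.5%
total = sum(discounted(1000, year) for year in range(5))
print(f"Discounted total cost: {total:.2f}")  # approximately 4,673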
32.7 Interpreting the Results of Economic Analysis

32.7.1 The Incremental Cost-Effectiveness Ratio

The incremental cost-effectiveness ratio (ICER) is the most commonly used method of summarising the results of cost-effectiveness analyses. It represents the ratio of the difference in costs to the difference in effectiveness between two interventions and is calculated as follows:

ICER = (Cx − Cy) / (Ex − Ey)

where Cx is the cost of intervention X, Cy is the cost of intervention Y, Ex is the effectiveness of intervention X and Ey is the effectiveness of intervention Y [15]. There are two main ways in which to interpret the ICER. It can be compared with the ICERs of other
established health-care interventions or it can be compared with a cost-effectiveness threshold (CET) [5]. We will discuss these methods in more depth in the following sections.
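A minimal sketch of the ICER calculation follows; the costs and effects are hypothetical figures chosen only to illustrate the arithmetic.

def icer(cost_x, cost_y, effect_x, effect_y):
    # ICER = (Cx - Cy) / (Ex - Ey)
    return (cost_x - cost_y) / (effect_x - effect_y)

# Hypothetical figures: intervention X costs 12,000 and yields 6.2 QALYs;
# comparator Y costs 8,000 and yields 5.7 QALYs.
print(f"ICER: {icer(12_000, 8_000, 6.2, 5.7):,.0f} per QALY")  # 8,000 per QALY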
32.7.2 Ranking Cost-Effectiveness Ratios

There was a trend in the literature of the 1980s and 1990s to rank the cost-effectiveness of interventions as a means of informing decision-makers and allowing readers to put reported cost-effectiveness ratios into perspective (Table 32.1). Cost-effectiveness ratios are now rarely ranked in this way because, in order to interpret such "league tables", there must be homogeneity between studies in key areas [5]:

• How the studies deal with expenditure and gains obtained over a period of time (discounting)
• Methods for estimating utility or effect
• The costing perspective
• Perhaps most importantly, the choice of comparison programme(s)

A 1993 review of 21 published cost-utility analyses demonstrated that there was little methodological homogeneity [6]. Despite efforts to promote homogeneity in study design [1, 8], there is still sufficient variation in study design to make the ranking of cost-effectiveness ratios problematic. Ranking cost-effectiveness ratios is also criticised because it does not account for the uncertainty associated with the ratios [5].
32.7.3 The Cost-Effectiveness Threshold

Inherent in the "league table" is the idea that, at some point as we go down the table, the interventions will cease to be cost-effective. The cost-effectiveness ratio at this point could be termed the CET. The CET represents the amount that the health-care provider is prepared to pay for an improvement of one QALY [5]. It reflects the provider's willingness to pay for health improvement, the budget size and the level of health-sector inflation [2].

Table 32.1 Cost-effectiveness league table. (Adapted from Drummond et al. [5])
Intervention                                                               ICER (£/QALY)
Cholesterol testing and diet therapy only (all adults, aged 40–69 years)   220
Pacemaker implantation                                                     1,100
Cholesterol testing and treatment                                          1,480
Kidney transplant                                                          4,710
Neurosurgical intervention for malignant intra-cranial tumours             107,780

Critics argue that the application of a universal CET fails to account for social priorities such as equity or disease burden; furthermore, it is argued that the amount that health-care providers are willing to pay for health improvement is not independent of the size of the programme [5]. There is also evidence that different CETs are applied in different areas of health care [2]. It has been argued that there is no place for the CET in cost-effectiveness analysis, as researchers should not state whether or not an intervention is cost-effective: their expertise is in research and not policy making [13]. The desire by researchers to put their results into context is, however, quite legitimate [5]. Furthermore, the distinction between researcher and policy-maker is often not so clearly demarcated, and the adoption of new technology is often driven by clinicians and not policy-makers. Despite these criticisms of the CET, it is a useful tool that is widely used and accepted in the published literature. A CET of US$50,000/QALY is often used in the published literature. A study of the decisions of the Australian Pharmaceutical Benefit Advisory Committee suggests that it is unlikely to accept interventions with an ICER in excess of AUS$76,000/QALY and unlikely to reject an intervention with an ICER of less than AUS$42,000/QALY. NICE loosely apply a CET of £30,000/QALY in cases where efficacy is proven and £20,000/QALY in cases where clinical effectiveness is more controversial [5].

When the incremental cost is plotted against the incremental effect, the resulting plot is called the cost-effectiveness plane [13]. The CET and ICER can be plotted on the cost-effectiveness plane (Fig. 32.1). The quadrants of the cost-effectiveness plane are often numbered from I to IV clockwise, starting in the top right-hand quadrant. If we plot the incremental cost against the incremental effect for an intervention and it lies in quadrant II, the intervention is cost-effective and said to be dominant, as it is both cheaper and more effective. If it lies in quadrant IV, it is said to be dominated, as the alternative intervention is both cheaper and more effective. If it is in quadrant I above the CET, then the intervention, despite being more effective, is said to be too expensive. Conversely, if it is below the CET, the greater effectiveness is thought to justify the extra costs, and the intervention is cost-effective [5, 13]. It is more problematic if the plot lies in quadrant III, as it has been suggested that the compensation that patients expect when they forego a more effective intervention is considerably more than they are prepared to pay for the same programme. Consequently, the CET would probably look more like the solid grey line in quadrant III [14].

Fig. 32.1 The cost-effectiveness plane

Fig. 32.2 Threshold analysis
32.7.4 The Willingness-to-Pay Threshold

The notion that an intervention is cost-effective if the ICER is less than the CET can be expressed as follows:

ΔC/ΔE < T

where ΔC is the incremental cost, ΔE is the incremental effect and T is the CET. This can be rearranged to:

TΔE − ΔC > 0

"TΔE − ΔC" is called the net monetary benefit (NMB). As it represents the amount a health-care provider is willing to pay for an increase in effectiveness of ΔE, minus the associated increase in costs, ΔC, an intervention can be said to be cost-effective if:

NMB = TΔE − ΔC > 0

As the willingness-to-pay (WTP) threshold (which is analogous to the CET) is generally unknown, the results of cost-effectiveness analyses can be presented graphically in the following way (Fig. 32.2). The advantage of presenting the results in the form of a linear function of the WTP, the NMB, is that it is easier to manipulate and interpret than a ratio of incremental costs and effects [5].
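The decision rule can be illustrated with a short sketch that evaluates the NMB of an intervention at several WTP thresholds; the incremental cost and effect below are hypothetical.

def nmb(delta_e, delta_c, threshold):
    # NMB = T x dE - dC; the intervention is cost-effective when NMB > 0
    return threshold * delta_e - delta_c

# Hypothetical increments: dE = 0.5 QALYs, dC = 4,000
for t in (5_000, 20_000, 30_000):
    verdict = "cost-effective" if nmb(0.5, 4_000, t) > 0 else "not cost-effective"
    print(f"T = {t}: NMB = {nmb(0.5, 4_000, t):,.0f} ({verdict})")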
32.8 Handling Uncertainty

Uncertainty associated with the true values of the parameters used to calculate costs and effects (often called parameter or second-order uncertainty) and individual patient variation (often called first-order uncertainty; heterogeneity when explained, or variability when unexplained) will result in some degree of uncertainty associated with the results of a cost-effectiveness analysis. In studies based on decision analytical models, there will also be uncertainty associated with the model structure (Chap. 31), while in experimental studies, there will be uncertainty associated with the study design. The explicit exploration of uncertainty is consequently fundamental to cost-effectiveness analysis [4, 5, 8, 13]. Different methods are commonly used to explore different sources of uncertainty. The effect of modelling assumptions and heterogeneity can be explored by adopting a reference case and conducting alternative analyses in which the effect of adopting alternative modelling assumptions is explored. Parameter uncertainty is explored by conducting sensitivity analysis.
32.8.1 Sensitivity Analysis

Sensitivity analysis can take the form of univariate (in which uncertainty associated with the value of one input
parameter is explored), bivariate (in which the combined uncertainty associated with two parameters is explored) or probabilistic sensitivity analysis (in which the combined uncertainty of all model parameters is explored) [13]. Univariate and bivariate sensitivity analyses now appear less frequently in the literature than probabilistic sensitivity analysis, which is recommended in several guidelines for economic analysis [1]. In probabilistic sensitivity analysis, each parameter is assigned a probability distribution that reflects the uncertainty associated with that parameter. The incremental cost and effect are then recalculated many times (often 1,000 or even 10,000 times), with each of the parameters randomly sampled from its probability distribution at every recalculation [8]. The ICER can then be calculated from the mean incremental cost and effect.
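The following sketch illustrates the mechanics of probabilistic sensitivity analysis. The gamma and beta distributions and their parameters are illustrative assumptions only, not recommendations for any particular model.

import random

random.seed(1)

def simulate(iterations=10_000):
    # Each iteration re-samples every uncertain parameter from its assigned
    # distribution and recalculates the incremental cost and effect.
    sims = []
    for _ in range(iterations):
        delta_c = random.gammavariate(16, 250)  # incremental cost, mean ~4,000
        delta_e = random.betavariate(5, 5)      # incremental QALYs, mean 0.5
        sims.append((delta_e, delta_c))
    return sims

sims = simulate()
mean_e = sum(e for e, _ in sims) / len(sims)
mean_c = sum(c for _, c in sims) / len(sims)
print(f"ICER from mean incremental cost and effect: {mean_c / mean_e:,.0f} per QALY")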
32.8.2 Interpreting the Results of Probabilistic Sensitivity Analysis

In the published literature, the results of each model recalculation from the probabilistic sensitivity analysis are often plotted on the cost-effectiveness plane (Fig. 32.3). This allows the reader to make rough approximations of the proportion of points that fall into each quadrant or lie below the cost-effectiveness threshold. The number and density of the plotted points, however, make it impossible for the reader to quantify the results of the sensitivity analysis. It is also conceptually difficult to quantify the uncertainty associated with the ICER, as points in quadrants II and IV, for example, may have the same ICER despite suggesting quite different things about an intervention's cost-effectiveness. This method is also problematic because the CET is rarely known. Figure 32.4 illustrates that the position of the CET on the plane affects the proportion of points that lie below the threshold, and consequently the likelihood that the intervention is cost-effective [4, 5].

Fig. 32.3 The cost-effectiveness plane showing the results of probabilistic sensitivity analysis. (Adapted from Rao et al. [16])

The willingness-to-pay (WTP) curve, or cost-effectiveness acceptability curve (Fig. 32.5), is a useful alternative to the cost-effectiveness plane for presenting the results of a probabilistic sensitivity analysis. For each WTP, the proportion of the model recalculations that are cost-effective at that threshold is plotted. The WTP curve is a more intuitive representation of the uncertainty associated with the cost-effectiveness of an intervention. It is usually interpreted in a Bayesian fashion; thus, for every WTP, the curve represents the probability that the intervention is the most cost-effective. The WTP curve can be plotted for several competing health-care interventions to illustrate to decision-makers how the cost-effectiveness and its associated uncertainty are affected by the amount they are willing to pay for health improvement (Fig. 32.6) [4, 5].
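Continuing from the simulated output of the previous sketch, a cost-effectiveness acceptability curve can be tabulated in a few lines; the threshold grid is arbitrary.

def acceptability_curve(sims, thresholds):
    # For each WTP threshold, the proportion of simulations in which the
    # intervention has a positive net monetary benefit (T x dE - dC > 0)
    return {
        t: sum(1 for e, c in sims if t * e - c > 0) / len(sims)
        for t in thresholds
    }

for wtp, p in acceptability_curve(sims, range(0, 40_001, 10_000)).items():
    print(f"WTP {wtp:>6}: probability cost-effective = {p:.2f}")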
Fig. 32.4 The cost-effectiveness plane showing the effect of different cost-effectiveness thresholds. (Adapted from Rao et al. [16])

Fig. 32.5 Willingness-to-pay curve. (Adapted from Rao et al. [16])

Fig. 32.6 Willingness-to-pay curve illustrating the cost-effectiveness of several interventions. (Adapted from Griffin et al. [9])

32.9 Assessing the Quality and Relevance of Cost-Effectiveness Analysis

The academic surgeon will confidently be able to critically appraise a scientific paper. When assessing the quality and relevance of cost-effectiveness studies, however, different elements of study design must be considered. In the following section, we briefly outline the important elements of a cost-effectiveness analysis, to allow the reader to identify the strengths and weaknesses of published studies. Should the reader require more information, several authors have published more detailed guidance on the critical appraisal of cost-effectiveness studies [1, 5].
• Introduction
– The decision-making context should be clear.
– The study should pose well-defined questions.
– The description of the comparators should be sufficient for the reader to understand exactly what interventions are compared.
• Methods
– All relevant alternatives should be considered.
– Studies should use the best quality, most relevant evidence of clinical effectiveness. If data from randomised controlled trials are used, their relevance in a "real-world" setting should be examined. Data from observational studies should be examined for sources of potential bias. When evidence has been synthesised, the search strategy should be explicit and the methodology robust.
– The costing perspective should be explicitly stated, and all relevant costs should be identified and accurately measured.
– The effects of the competing interventions should be valued credibly.
– Costs and effects should be discounted, and an incremental analysis should be performed.
– Sensitivity analysis should be performed. Ideally, this should be probabilistic, consider all main areas of uncertainty and include justification of the ranges and distributions used.
• Results
– The results should be presented using an appropriate summary index (e.g., an ICER expressed in $/QALY).
– Appropriate graphical methods should be used.
– It should be clear how sensitive the study results are to uncertainty.
• Discussion
– The presentation and discussion of the results should include the issues that are required to inform decision-makers, for example, budget impact analysis or value of information calculations (Chap. 33).
– The conclusions of the evaluation should be justified by the evidence presented.
– The generalisability of the study should be considered.
32.10 Limitations of Cost-Effectiveness Analysis

The adoption of a new health-care intervention is not purely an economic decision, and social, political, ethical and moral considerations are often also important. For example, a decision-maker who is concerned with alleviating health inequality may feel an intervention is important because it targets a disease associated with social deprivation, while a risk-averse decision-maker may be
more reluctant to adopt new or unproven technology [5, 15]. Cost-effectiveness analysis is criticised because it does not account for such social, political, ethical and moral considerations. Furthermore, it is suggested that cost-effectiveness analysis often does not account for all important economic considerations, failing, for example, to account for the effect of the total size of a programme on a decision-maker's threshold for accepting a health-care intervention [14]. The limitations of cost-effectiveness analysis may appear problematic. The results of cost-effectiveness analysis, however, are not intended to be applied in a mechanistic fashion to determine whether resources should be allocated to new health-care interventions, and even the most ardent advocate of cost-effectiveness analysis would concede that it is merely a tool for decision-makers and not a replacement for them [5, 8, 15]. In addition to the limitations discussed earlier, there are often practical problems in obtaining sufficient information to compare interventions, in particular in valuing health states and in obtaining detailed costing data for newer interventions, or for interventions where long-term follow-up costs and outcomes need to be considered [7, 15]. Cost-effectiveness analysis, however, has a robust analytical framework for explicitly exploring the effect of parameter uncertainty, which can be extended to assess the need for and value of further research (Chap. 33) [4].
32.11 Conclusions

In this chapter, we have discussed the methods, critical appraisal and limitations of cost-effectiveness analysis, as well as its importance and socio-political context. Despite the potential applications of cost-effectiveness analysis, it has historically not been applied in the situations where it would have the greatest impact on patient care. Cost-effectiveness analysis in health care is a relatively new field of research, and many of its concepts may be unfamiliar to, and poorly understood by, clinicians and researchers. There may also be some reluctance to accept that clinical practice should be influenced by external factors such as cost [7, 13, 15]. Although cost-effectiveness analysis is challenging for clinicians and researchers, there is considerable external pressure to rationalise health-resource allocation
[10, 18]. As the methods and concepts become more widely understood and accepted, the range of clinical situations where cost-effectiveness analysis is applied will increase. Finally, the potential population health benefits of rationalising health-resource allocation could arguably place an ethical responsibility on clinicians and researchers to consider these issues.
References

1. National Institute for Clinical Excellence (2004) Guide to the methods of technology appraisal (reference N0515). National Institute for Clinical Excellence, London
2. Appleby J, Devlin N, Parkin D (2007) NICE's cost effectiveness threshold. BMJ 335:358–359
3. Briggs AH, O'Brien BJ (2001) The death of cost-minimisation analysis? Health Econ 10:179–184
4. Briggs A, Sculpher M, Claxton K (2006) Decision modelling for health economic evaluation. Oxford University Press, Oxford
5. Drummond MF, Sculpher MJ, Torrance GW et al (2005) Methods for the economic evaluation of health care programmes, 3rd edn. Oxford University Press, Oxford
6. Drummond MF, Torrance GW, Mason JM (1993) Cost-effectiveness league tables: more harm than good? Soc Sci Med 37:33–40
7. Friedland DJ, Go AS, Davoren JB et al (1998) Evidence-based medicine: a framework for clinical practice. Appleton & Lange, Stamford
8. Gold MR, Siegel JE, Russell LB et al (1996) Cost-effectiveness in health and medicine. Oxford University Press, New York
9. Griffin SC, Barber JA, Manca A et al (2007) Cost effectiveness of clinically appropriate decisions on alternative treatments for angina pectoris: prospective observational study. BMJ 334:624
10. Ham C (2004) Health policy in Britain, 5th edn. Palgrave Macmillan, London
11. Hull R, Hirsh J, Sackett DL et al (1981) Cost-effectiveness of clinical diagnosis, venography and non-invasive testing in patients with symptomatic deep-vein thrombosis. N Engl J Med 304:1561–1567
12. Klarman H, Francis J, Rosenthal G (1968) Cost-effectiveness analysis applied to the treatment of chronic renal disease. Med Care 6:48–54
13. Muennig P (2002) Designing and conducting cost-effectiveness analysis in health and medicine. Jossey-Bass, San Francisco
14. O'Brien BJ, Gertsen K, Willan AR et al (2002) Is there a kink in consumers' threshold value for cost-effectiveness in health care? Health Econ 11:175–180
15. Petitti DB (2000) Meta-analysis, decision analysis and cost-effectiveness analysis: methods for quantitative synthesis in medicine, 2nd edn. Oxford University Press, New York
16. Rao C, Aziz O, Panesar SS et al (2007) Cost effectiveness analysis of minimally invasive internal thoracic artery bypass versus percutaneous revascularisation for isolated lesions of the left anterior descending artery. BMJ 334:621
17. Sculpher MJ, Buxton MJ (1993) The episode-free day as a composite measure of cost-effectiveness. PharmacoEconomics 4:345–352
18. Talbot-Smith A, Pollock AM (2006) The new NHS: a guide. Routledge, Abingdon
19. Tan Torres T, Baltussen RM, Adam T et al (2003) Making choices in health: WHO guide to cost-effectiveness analysis. World Health Organization, Geneva
20. Warner KE, Luce BR (1982) Cost-benefit and cost-effectiveness analysis in health care: principles, practice, and potential. Health Administration Press, Ann Arbor
21. Weinstein M, Stason W (1977) Foundations of cost-effectiveness analysis for health and medical practices. N Engl J Med 296:716–721
33 Value of Information Analysis

Christopher Rao and Thanos Athanasiou
Contents

Abbreviations
33.1 Introduction
33.2 Calculating the Expected Value of Perfect Information
33.3 Population Expected Value of Perfect Information
33.4 Expected Value of Partial Perfect Information
33.5 Expected Value of Sample Information
33.6 Challenges and Limitations of EVPI Analysis
33.7 Conclusions
References
Abbreviations

HTA     Health Technology Assessment Programme
NMB     Net monetary benefit
NICE    National Institute for Health and Clinical Excellence
EVPI    Expected value of perfect information
EVPPI   Expected value of partial perfect information
EVSI    Expected value of sample information
ENBS    Expected net benefit of sampling
PEVPI   Population expected value of perfect information
PEVPPI  Population expected value of partial perfect information
UK      United Kingdom
VOI     Value of information
WTP     Willingness-to-pay threshold
Abstract Despite its frequent use in engineering, industry and commerce, Value of Information (VOI) Analysis is a recent development in health care research. In this chapter, we will discuss the techniques and mathematical basis of VOI Analysis. We will discuss how VOI Analysis can be used to evaluate future research and how it has the potential to inform and refine the design of future studies. Finally, we will discuss the challenges and limitations of this emerging field of research.
C. Rao () Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London, W2 1NY, UK; e-mail: [email protected]

33.1 Introduction
In Chapter 32, we demonstrated that cost-effectiveness analysis can be used to assist decision-makers in determining whether potential benefits offered by health care interventions justify extra investment.
Decisions about resource allocation are rarely made in conditions of complete certainty [1], and in Chapter 31 we also discussed techniques that can be used to explore and quantify decision uncertainty. It has been argued that decision-makers should only consider the expected benefits, and not the uncertainty, associated with implementing a health care intervention, as failing to adopt an intervention simply because it does not fulfil conventional standards for statistical significance would result in an opportunity cost being incurred by patients [2]. Decision-makers, however, usually do not simply have to choose whether to accept or reject an intervention; they also need to determine whether further information is necessary to reduce the chance of making a wrong decision [3, 4]. Intuitively, the greater the uncertainty associated with a decision, the more valuable further information would be. Value of Information (VOI) analysis is an analytical framework that can be used to determine the cost associated with decision uncertainty and to quantify the value of further research. Despite being used in engineering, industry and commerce for some time [5–7], VOI analysis has only recently been applied to health care research [8]. It has been applied to a number of different health care interventions [9] and has been piloted as a means of setting research priorities in the United Kingdom (UK) by the National Institute for Health and Clinical Excellence (NICE) [4] and the National Institute for Health Research's Health Technology Assessment (HTA) Programme [10]. VOI analysis is founded on the premise that the cost of allocating resources to an inferior health care intervention can be thought of in terms of forgone health benefits and wasted resources. By adjusting these costs to reflect the probability of choosing the wrong intervention, the cost of the uncertainty associated with a decision can be calculated. If perfect information were available, there would be no uncertainty and therefore no cost associated with uncertainty; consequently, the cost associated with uncertainty is termed the Expected Value of Perfect Information (EVPI). When the EVPI is calculated for an entire patient population (the Population Expected Value of Perfect Information, PEVPI), it can be said to represent the maximum cost of undertaking future research [11]. In this chapter, we will discuss EVPI calculation and explore how VOI analysis can be extended to determine research priorities and future study design.
33.2 Calculating the Expected Value of Perfect Information

To calculate the EVPI, a probabilistic sensitivity analysis must be conducted using a decision analytical model (Chapters 31 and 32). Early methods of evaluating the EVPI associated with a decision rely on the incremental Net Monetary Benefit (NMB) (Chapter 32) being normally distributed [6, 8]. Many decision analytical models, however, have a non-linear structure, contain discontinuities generated by logical functions, and combine parameters from several different sources. It is consequently unlikely that the NMB will be normally distributed, or even parametric at all [1]. This requires us to calculate the EVPI directly from the simulated output of the model, as follows. If there are x alternative interventions and the model parameters θ are uncertain, the expected NMB of the optimum intervention given current information, after n model iterations, is maxx Eθ NMB(x, θ). If perfect information were available, θ would no longer be uncertain, and the decision-maker would choose the intervention with the highest NMB given the value of θ, maxx NMB(x, θ). However, as θ is uncertain, the expected NMB with perfect information is calculated by taking the maximum NMB in each model iteration and averaging over the n iterations, Eθ maxx NMB(x, θ). The EVPI is then calculated by subtracting the expected NMB of the optimum intervention given current information from the mean of the maximum NMB over the n model iterations:

EVPI = Eθ maxx NMB(x, θ) − maxx Eθ NMB(x, θ) [1]

For example, the results of 10 model iterations generated by a decision analytical model comparing interventions A, B and C at a Willingness-To-Pay (WTP) threshold of w are shown in Table 33.1.
Table 33.1 Example EVPI calculation
Iteration    NMB A   NMB B   NMB C   Optimal choice   Maximum NMB
1            10      11      10      B                11
2            12      11      10      A                12
3            11      15      8       B                15
4            13      11      9       A                13
5            11      13      7       B                13
6            13      12      9       A                13
7            10      9       11      C                11
8            11      15      10      B                15
9            13      11      10      A                13
10           12      11      11      A                12
Mean         11.6    11.9    9.5     –                12.8

Despite A having the highest NMB in most iterations, B is the most cost-effective intervention at a WTP of w, as it has the highest expected NMB. In this example, the EVPI is

EVPI = 12.8 − 11.9 = 0.9

This example illustrates the importance of including all comparators when calculating the EVPI. Intervention C is unlikely to be cost-effective, as it has the highest NMB in only one model iteration and has the lowest expected NMB after 10 iterations. However, by excluding it from the analysis (Table 33.2), the uncertainty associated with the decision, and consequently the EVPI, is reduced:

EVPI = 12.7 − 11.9 = 0.8

Table 33.2 Example EVPI calculation with intervention C excluded
Iteration    NMB A   NMB B   Optimal choice   Maximum NMB
1            10      11      B                11
2            12      11      A                12
3            11      15      B                15
4            13      11      A                13
5            11      13      B                13
6            13      12      A                13
7            10      9       A                10
8            11      15      B                15
9            13      11      A                13
10           12      11      A                12
Mean         11.6    11.9    –                12.7

This can make calculation of the EVPI difficult in practice, as usually several different interventions will need to be compared if the uncertainty is to be accurately estimated [12]. As the EVPI is a function of the NMB, it will vary according to the WTP threshold. Figure 33.1a, b shows the EVPI curve and the cost-effectiveness acceptability curve based on an economic evaluation of minimally invasive vein harvesting techniques [13]. Figure 33.1 demonstrates that the EVPI is greatest when the uncertainty associated with the decision is greatest; when there are only two alternative interventions, this occurs when the WTP is equal to the incremental cost-effectiveness ratio. Whilst the EVPI curve in Fig. 33.1b is relatively intuitive, EVPI curves can become quite complicated when several interventions are being compared (Fig. 33.2).
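Using the NMB values of Table 33.1, the EVPI calculation can be reproduced in a few lines; the sketch simply implements the expectation-of-maxima logic described above.

# NMB of interventions A, B and C in the 10 model iterations of Table 33.1
nmb = {
    "A": [10, 12, 11, 13, 11, 13, 10, 11, 13, 12],
    "B": [11, 11, 15, 11, 13, 12, 9, 15, 11, 11],
    "C": [10, 10, 8, 9, 7, 9, 11, 10, 10, 11],
}

def evpi(nmb):
    n = len(next(iter(nmb.values())))
    # Expected NMB with current information: the best intervention on average
    current = max(sum(values) / n for values in nmb.values())
    # Expected NMB with perfect information: the best intervention per iteration
    perfect = sum(max(values[i] for values in nmb.values()) for i in range(n)) / n
    return perfect - current

print(f"EVPI with A, B and C: {evpi(nmb):.1f}")  # 12.8 - 11.9 = 0.9
without_c = {k: v for k, v in nmb.items() if k != "C"}
print(f"EVPI excluding C: {evpi(without_c):.1f}")  # 12.7 - 11.9 = 0.8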
33.3 Population Expected Value of Perfect Information

The EVPI for a population, or PEVPI, can be said to represent the maximum cost of undertaking future research [11]. It has a similar relationship with the cost-effectiveness of an intervention and its associated uncertainty as the EVPI, except that it is scaled up to include the estimated number of patients affected by the policy decision and the likely lifetime of the technology [1]. Figure 33.3 shows the PEVPI for minimally invasive vein harvesting, assuming that approximately two-thirds of the 25,000 patients a year who undergo coronary artery bypass surgery in the UK will undergo minimally invasive vein harvesting, assuming a technology life span of 3 years, and applying an annual discount rate of 3.5%. Methodological research on the PEVPI would be useful, as its estimation is often more complicated than simply scaling up the EVPI. Estimation of the incidence and prevalence of the disease can be problematic, either because it is difficult to measure, because the disease is rare, or because it may vary over the time horizon of the analysis.
Fig. 33.1 Expected value of perfect information for minimally invasive vein harvesting. Using data from Rao et al. [13]. (a) Cost-effectiveness acceptability curve. (b) Expected value of perfect information curve

Fig. 33.2 Complex population expected value of perfect information curve. (Adapted from Claxton et al. [10])

Furthermore, predicting the lifetime of a technology and the emergence of competitor strategies is usually very difficult [14]. The population net benefit is probably also affected by health care technology implementation and dissemination strategies, for example whether a new health care intervention is phased in over a period of time or fully adopted in a short time frame. In addition to the trade-offs between investment in health care interventions and investment in research, which can currently be explored using VOI analysis, many authors are also investigating how implementation strategies can affect the population net benefit [15].
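A minimal sketch of the scaling-up calculation follows. The per-patient EVPI of $50 is a hypothetical figure, while the patient numbers, life span and discount rate follow the example in the text.

def pevpi(evpi_per_patient, patients_per_year, lifespan_years, rate=0.035):
    # Scale the per-patient EVPI to the annual cohort, discounting each
    # future year's cohort over the life span of the technology
    return sum(
        evpi_per_patient * patients_per_year / (1 + rate) ** year
        for year in range(lifespan_years)
    )

# Hypothetical per-patient EVPI of $50; roughly two-thirds of 25,000
# patients a year, over a 3-year technology life span, discounted at 3.5%
print(f"PEVPI: ${pevpi(50, 25_000 * 2 / 3, 3):,.0f}")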
33.4 Expected Value of Partial Perfect Information

Using the EVPI, we can estimate the maximum value of future research. The Expected Value of Partial Perfect Information (EVPPI) uses similar methodology to estimate the maximum value of future research that would reduce the uncertainty associated with particular model parameters [1]. In the UK, pilot studies have investigated whether these techniques could be used to focus investment on research that would most reduce the uncertainty associated with decisions to adopt health care interventions [4, 10]. For example, if EVPPI analysis shows that much of the uncertainty relates to the utility of different health states, then decision-makers could invest in qualitative research to value these states; if it shows that much of the uncertainty relates to the cost of an intervention, then decision-makers could undertake costing studies; if there is uncertainty associated with the relative efficacy of competing interventions, then experimental studies could be undertaken; and if there is uncertainty associated with the prevalence or incidence of a disease, epidemiological studies could be undertaken [3, 4, 10].

Calculation of the EVPPI relies on similar methodology to the calculation of the EVPI; however, an extra level of simulation is required, which can make it computationally demanding, depending on model complexity. A Monte Carlo simulation is performed with the parameter(s) of interest, θi, held constant and the other parameters, θc, sampled (the inner loop). New values for θi are then sampled (the outer loop), and the Monte Carlo simulation is repeated. This is repeated until we have sampled sufficiently from θi [1, 16]:

EVPPIθi = Eθi maxx Eθc|θi NMB(x, θi, θc) − maxx Eθi,θc NMB(x, θi, θc) [1]

Figure 33.3 shows the Population Expected Value of Partial Perfect Information (PEVPPI) for minimally invasive vein harvesting. It is important to note that the EVPPI attributable to the combined uncertainty of two or more model parameters is not the same as the sum of their individual EVPPI. For this reason, parameters that would be informed by the same type of research are often grouped together; for example, a costing study might reduce the uncertainty associated with all costs, and therefore all cost parameters could be grouped together for EVPPI calculation [4, 10]. It is also important to note that the relative EVPPI of different model parameters may change as the WTP changes. Cost parameters are often more important at lower WTP thresholds, and utility parameters are often more important at higher WTP thresholds [1]. Figure 33.4 shows how the relative EVPPI differs at different WTP thresholds in our example.
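The inner- and outer-loop structure can be illustrated schematically. The two-parameter model and its prior distributions below are entirely hypothetical, and the sketch assumes the parameter of interest is independent of the complementary parameter, so that conditional sampling reduces to marginal sampling.

import random

random.seed(1)

def evppi(n_outer=200, n_inner=200, wtp=20_000):
    # theta_i: parameter of interest (incremental QALYs)
    # theta_c: complementary parameter (incremental cost); both hypothetical
    overall = {"new": 0.0, "current": 0.0}
    outer_maxima = []
    for _ in range(n_outer):                      # outer loop: sample theta_i
        effect = random.betavariate(5, 5)
        inner = {"new": 0.0, "current": 0.0}
        for _ in range(n_inner):                  # inner loop: sample theta_c
            cost = random.gammavariate(16, 250)
            iteration_nmb = {"new": wtp * effect - cost, "current": 0.0}
            for option, value in iteration_nmb.items():
                inner[option] += value / n_inner
                overall[option] += value / (n_inner * n_outer)
        outer_maxima.append(max(inner.values()))
    # EVPPI = E_theta_i[ max_x E_theta_c NMB ] - max_x E[ NMB ]
    return sum(outer_maxima) / n_outer - max(overall.values())

print(f"EVPPI for the effectiveness parameter: {evppi():,.0f}")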
Fig. 33.3 Population expected value of partial perfect information for minimally invasive vein harvesting. Using data from Rao et al. [13]

Fig. 33.4 The effect of the willingness-to-pay threshold on the population expected value of partial perfect information for different groups of model parameters following minimally invasive vein harvesting. Using data from Rao et al. [13]
33.5 Expected Value of Sample Information

The EVPI (and EVPPI) can be used to assist decision-makers in determining in which clinical areas, and for which types of research, further information would be most valuable. As they represent estimates of the maximum value of conducting further research, the cost of conducting further research must not exceed the PEVPI (or PEVPPI). However, to establish whether further research is worthwhile, and to identify an efficient research design, we need to consider the marginal benefits (the Expected Value of Sample Information, EVSI) and the marginal costs of further research (the cost of sampling, CS) [16]. The difference between the EVSI and the CS is termed the Expected Net Benefit of Sampling (ENBS). Thus, for a sample size n on θi, which would provide a sample result D:

ENBSn = EVSIn − CSn, where
EVSIn = Eθc,θi|D maxx NMB(x, θi, θc) − maxx Eθc,θi|D NMB(x, θi, θc) [1]

The ENBS can be regarded as the societal payoff of further research. It has been piloted as a means of determining the optimum sample size, allocation of patients, appropriate follow-up and end points of clinical trials, and can be calculated for a range of study designs to maximise the value of further research [16]. The calculation of the EVSI, and therefore of the ENBS, is, however, exceedingly computationally intensive. Whilst it relies upon similar methodology to the EVPPI, it requires a further level of sampling. First, a simulation is performed in which D is sampled at each model iteration while θc and θi|D are held constant (the inner loop). Then θc is sampled (the second loop) and the inner loop is repeated. Finally, θi is sampled (the outer loop) and the inner two loops are repeated until we have sampled sufficiently from θi. This process is then repeated for multiple sample sizes, for each alternative study design being considered. Several methods exist to predict D given n, depending on the distribution of θi; these lie beyond the scope of this chapter, but they add further computational demands to the calculation of the EVSI. Finally, if D is not conjugate to θi, then there is no simple analytical solution for θi|D, and numerical methods are required to calculate it, making calculation of the EVSI even more computationally intensive [1, 16].
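The trade-off that the ENBS captures can be illustrated with a toy calculation. The EVSI values (which in practice would come from the nested simulation just described) and the trial costs below are invented for the example.

# Hypothetical EVSI estimates and sampling costs for a proposed trial,
# by candidate sample size
evsi = {100: 400_000, 200: 650_000, 400: 800_000, 800: 880_000}
fixed_cost, cost_per_patient = 150_000, 1_000

def enbs(n):
    # ENBS_n = EVSI_n - CS_n
    return evsi[n] - (fixed_cost + cost_per_patient * n)

for n in sorted(evsi):
    print(f"n = {n}: ENBS = {enbs(n):,}")
print(f"Optimal sample size of those considered: {max(evsi, key=enbs)}")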
33.6 Challenges and Limitations of EVPI Analysis

In the previous sections, we discussed the underlying methodology and many potential applications of VOI analysis; there are, however, several challenges and technical limitations associated with VOI analyses that need to be addressed. These can be thought of as problems specific to VOI analysis and general problems related to applying any decision analytical framework (Chap. 31). For example, synthesising evidence from a variety of sources, assessing potential generalisability and bias, and estimating parameter uncertainty can be problematic, particularly when there is limited information, when there is no direct comparison of interventions, or when the analysis relies upon expert opinion [3].
Whilst VOI analysis can be seen as a natural methodological development of the probabilistic sensitivity analysis recommended by NICE [14, 17], and can usually be undertaken by researchers who are able to perform economic analysis using decision analytical modelling [4], it does present additional challenges that may limit its use:

• Robust results of VOI analysis rely upon accurate characterisation of decision uncertainty, which is difficult for a number of reasons. Significantly, the results of VOI analysis have been shown to be very sensitive to the comparators included in the analysis. This has led some authors to suggest that the scope of analysis should be defined prior to analysis and should include all relevant alternative interventions. This can make both the systematic assembly and synthesis of data, and the analysis itself, problematic [4].
• Estimating the effective population that may benefit from additional evidence is problematic, as estimates need to be made of disease incidence and prevalence, the life span of competing interventions, the emergence of competitors and the effect of implementation strategies [14, 15]. It is also problematic to estimate the overall VOI based on VOI estimates for different patient subgroups [4].
• The computational burden of VOI analysis, particularly EVSI and ENBS calculation, can be significant, especially if patient-level simulation is performed or the decision model is complicated [3, 4]. If EVSI and ENBS are to be more widely applied in the published literature, more efficient computational methods will need to be developed. For example, developing methods to assess the stability of VOI estimates generated by probabilistic analysis could reduce the number of model iterations required [14]. Attempts have been made to establish relationships between model inputs and outputs using linear and Gaussian methods in order to reduce the computational burden of VOI analysis; however, neither of these approaches is completely satisfactory [4].
• Despite its firm foundation in statistical decision theory [18] and its early adoption by NICE [4] and the HTA programme [10], many of the bodies that fund research in the UK are unfamiliar with, or even sceptical about, VOI analysis and the decision analytical framework on which it is based [4]. In the USA, VOI methodology is even more poorly disseminated [3]. Currently, this may limit the use of VOI analysis as a tool for rationalising research funding.
33.7 Conclusions

The importance of research has traditionally been assessed informally, by loosely gauging the clinical burden of a particular disease, the predicted effect of a treatment and the ability of further research to measure this effect reliably. This is inadequate, as it makes formal evidence synthesis and a quantitative understanding of the available data and current uncertainty impossible; it is based on implicit weighting of the available evidence and is consequently highly vulnerable to potential bias [4]. VOI analysis offers an alternative framework for assessing the value of research. It allows a consistent and rational analytical framework to be applied to all aspects of health care investment, whether in research or in clinical practice [15]. This is particularly important given that research funding, like expenditure in health care, is limited. Furthermore, VOI analysis can also be applied to optimise the design of future research [3, 10, 16]. VOI analysis has not yet been widely applied to prioritise research [3, 9], and although recommended in NICE guidelines on health technology assessment, it is not currently a requirement in assessment reports [17]. Given that NICE and HTA pilot papers have suggested that VOI analysis is both a practical and potentially useful method of prioritising research investment [4, 10], and that the Medical Research Council (one of the largest research funding bodies in the UK) is funding a 5-year programme to conduct methodological and applied VOI research [4], the importance of VOI analysis to the academic surgeon should not be underestimated.
References

1. Briggs A, Sculpher M, Claxton K (2006) Decision modelling for health economic evaluation. Oxford University Press, Oxford
2. Claxton K (1999) The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ 18:341–364
3. Claxton K, Cohen JT, Neumann PJ (2005) When is evidence sufficient? A framework for making use of all available information in medical decision making and for deciding whether more is needed. Health Aff 24:93–101
4. Claxton K, Eggington S, Ginnelly L et al (2005) A pilot study of value of information analysis to support research recommendations for the National Institute for Health and Clinical Excellence. CHE Research Paper 4. Centre for Health Economics, University of York, York
5. Howard RA (1966) Information value theory. IEEE Trans Syst Sci Cybern SSC-2 1:22–26
6. Raiffa H, Schlaifer R (1959) Probability and statistics for business decisions. McGraw-Hill, New York
7. Thompson KM, Evans JS (1997) The value of improved national exposure information for perchloroethylene: a case study for dry cleaners. Risk Anal 17:253–271
8. Claxton K, Posnett J (1996) An economic approach to clinical trial design and research priority setting. Health Econ 5:513–524
9. Yokota F, Thompson KM (2004) Value of information literature analysis (VOILA): a review of applications in health risk management. Med Decis Mak 24:287–298
10. Claxton K, Ginnelly L, Sculpher M et al (2004) A pilot study on the use of decision theory and value of information analysis as part of the NHS Health Technology Assessment programme. Health Technol Assess 8:31
11. Drummond MF, Sculpher MJ, Torrance GW et al (2005) Methods for the economic evaluation of health care programmes, 3rd edn. Oxford University Press, Oxford
12. Colbourn TE, Asseburg C, Bojke L et al (2007) Preventive strategies for group B streptococcal and other bacterial infections in early infancy: cost effectiveness and value of information analyses. BMJ 335:655
13. Rao C, Aziz O, Deeba S et al (2007) Is minimally invasive harvesting of the great saphenous vein for coronary artery bypass surgery a cost-effective technique? J Thorac Cardiovasc Surg 135:809–815
14. Brennan A, Kharroubi SA, O'Hagan A et al (2007) Calculating partial expected value of information in cost-effectiveness models. Med Decis Mak 27:448–470
15. Fenwick E, Claxton K, Sculpher M (2005) The value of implementation and the value of information: combined and uneven development. CHE Research Paper 5. Centre for Health Economics, University of York, York
16. Ades AE, Lu G, Claxton K (2004) Expected value of sample information calculations in medical decision modeling. Med Decis Mak 24:207
17. National Institute for Clinical Excellence (2004) Guide to the methods of technology appraisal (reference N0515). National Institute for Clinical Excellence, London
18. Pratt JW, Raiffa H, Schlaifer R (1995) Statistical decision theory. MIT, Cambridge, MA
34 Methodological Framework for Evaluation and Prevention of Publication Bias in Surgical Studies

Danny Yakoub, Sukhmeet S. Panesar, and Thanos Athanasiou
Contents

34.1 Introduction on Publication Bias in Surgical Studies
34.2 Importance of Identification of Publication Bias in Meta-Analyses and Evidence-Based Medicine
34.3 Types of Publication Bias
34.3.1 Subject Selection Bias
34.3.2 Performance Bias
34.3.3 Attrition Bias
34.3.4 Detection Bias
34.3.5 Time Lead Bias
34.4 Methods to Detect Publication Bias
34.4.1 Graphical Methods
34.4.2 Numerical Methods
34.5 Prevention of Publication Bias
34.5.1 Trial Registries
34.5.2 Prospective Meta-Analyses
34.5.3 Searching the Grey Literature
34.6 Conclusions
References
References ........................................................................... 439
D. Yakoub () Department of Surgery, Staten Island University Hospital, 475 Seaview Avenue, Staten Island, New York, NY 10305, USA e-mail: [email protected]
34.1 Introduction on Publication Bias in Surgical Studies “Publication bias”, the term, has been originally introduced to describe the phenomenon of publication or non-publication of studies depending on the direction and statistical significance of results. This could be called outcome bias as well. This leads to the fact that the published literature is systematically unrepresentative of the population of completed studies [1, 2, 8]. Failure to publish has been considered a form of scientific misconduct by some commentators [2], as it leads to misinformation, particularly when it applies to literature reviews including meta-analysis in which this may lead to ongoing investigation with further wasting of resources, both human and material, and may lead to unnecessary risk for future research subjects. In addition, it is unethical to ask participants to contribute to
T. Athanasiou (eds.), Key Topics in Surgical Research and Methodology, DOI:10.1007/978-3-540-71915-1_34, © Springer-Verlag Berlin Heidelberg 2010
429
430
research without providing the yield they were expecting to result: a contribution to human knowledge [13]. Concern about publication bias started early when Withering in 1785 noted: “It would have been an easy task to have given select cases, whose successful treatment would have spoken strongly in favour of the medicine, and perhaps been flattering to my own reputation. But truth and science would condemn the procedure. I have therefore mentioned every case … proper and improper, successful or otherwise … After all, in spite of opinion, prejudice or error, time will fix the real value upon this discovery, and determine whether I have imposed upon myself and others, or contributed to the benefit of science and mankind” [19]. Albert Einstein was quoted in 1930 to say on academic freedom: “By academic freedom I understand the right to search for truth and to publish and teach what one holds to be true. This right implies also a duty: one must not conceal any part of what one has recognised to be true. It is evident that any restriction on academic freedom acts in such a way as to hamper the dissemination of knowledge among the people and thereby impedes national judgement and action”. Without publication in peer-reviewed, indexed biomedical journals, there can be no enduring dissemination of the results derived from the trial. Unfortunately, publication, and not research itself has been given a higher position in the academic award system, leading to the fact that funding for research often seeks satisfaction in publication of research product and fails to acknowledge the time and effort needed for writing up and disseminating results of properly conducted research projects. This is a fact that has led the publication not being directly bound to the performance of research. Subsequently, the natural separation between collecting and analyzing data and publication has turned into a dangerous disunion; that is, much of the results of research never get published probably owing to the subjectively so identified “importance” of study results. The “importance” of a study’s findings is inherent to the definition of publication bias; yet, importance itself is not easily defined. To some extent, results only exist once they are disseminated; therefore, importance has been roughly equated with newsworthiness; and newsworthy results mostly are those positive results supporting the efficacy of a new or experimental intervention, leading to easier acceptance of such results for publication. However, sometimes unexpected “negative results”, i.e. results nullifying benefit of a new or existing intervention over a standard one or no intervention at all, make newsworthy
D. Yakoub et al.
results leading to bias towards their publication though still less popular than positive results [9].
34.2 Importance of Identification of Publication Bias in Meta-Analyses and Evidence-Based Medicine

In a modern health system, clinical and healthcare decisions are more likely to be based on evidence than on individual anecdote. Doctors, unfortunately, are no better endowed with the judicial faculty than persons in other walks of life, and it is certainly not strange that distortions of fact, put out with apparent scientific accuracy, are everywhere in evidence. Publication bias may thereby mislead healthcare providers in one particular direction of care, and this is where testing for and identification of publication bias and study validity gain utmost importance. It is undoubted that publication of successful medical and surgical practice is more prevalent in the literature. It is rather natural that authors are more enthusiastic about publishing their successes than their failures. Reported studies seem to be written with a constructive motive, yet they can present a rather distorted statement of the actual facts. The value of certain surgical procedures is, for example, greatly exaggerated in the minds of less trained practitioners through the report of a large series of successful operations, which tends to increase the reputation of the writer rather than rendering the public more secure. A certain false security is thereby encouraged, to the end that operations of significance are no doubt performed by really incompetent persons [3, 8].

In the context of a systematic review, the validity of a study is the extent to which its design and conduct are likely to prevent systematic errors, or bias. An important issue that should not be confused with validity is precision. Precision is a measure of the likelihood of chance effects leading to random errors. It is reflected in the confidence interval around the estimate of effect from each study and in the weight given to the results of each study when an overall estimate of effect, or weighted average, is derived; more precise results are given more weight. Variation in validity can explain variation in the results of the studies included in a systematic review [6]. More rigorous studies may be more likely to yield results that are closer to the "truth". Quantitative analysis of
results from studies of variable validity can lead to “false positive” conclusions (erroneously concluding that an intervention is effective) if the less rigorous studies are biased towards overestimating an intervention’s effectiveness. It can also lead to “false negative” conclusions (erroneously concluding no effect) if the less rigorous studies are biased towards underestimating an intervention’s effect. It is therefore important to complete a systematic critical appraisal of all studies in a review, even if there is no variability in either the validity or the results of the included studies. For instance, the results may be consistent among studies, but all the studies may be flawed; in this case, the review’s conclusions would not be as strong as if a series of rigorous studies had yielded consistent results about an intervention’s effect.
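The weighting by precision described above is, in practice, usually an inverse-variance weighted average. The following is a minimal sketch under the assumption that each study supplies an effect estimate and its standard error; the function name and the 1.96 multiplier for a 95% confidence interval are illustrative choices, not taken from the chapter.

```python
# A minimal sketch of precision-based weighting: studies with smaller
# standard errors (higher precision) receive proportionally more weight.
import numpy as np

def inverse_variance_pooled(effects, std_errors):
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    # 95% confidence interval around the weighted average
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
```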
34.3 Types of Publication Bias

Publication bias can be due to characteristics of the research itself, the investigators, or the editors responsible for the decision to publish [10]. On the level of the researcher collecting studies for a meta-analysis or a systematic review, forms of bias include the following: language bias, the inclusion of studies published in one language only; availability bias, the selective inclusion of studies readily available to the researcher; cost bias, the inclusion of freely available studies only; and familiarity bias, the selective inclusion of studies from one’s own discipline [10, 13]. On the level of the research itself, i.e. the conduct of the individual studies, there are more important types of bias. When outcomes are collectively analysed, these biases invalidate the overall analysis because the individual study outcomes are distant from the truth. Such forms of bias include: selection bias, performance bias, attrition bias, detection bias and lead time bias. Editorial publication bias represents the inclination of editors to publish particular results to render their respective journals more readable and widespread.
34.3.1 Subject Selection Bias

Subject selection bias describes errors in the way that comparison groups are recruited. Using an appropriate method for blinding to treatment assignment is crucially
important in trial design. Those who are recruiting participants, and the participants themselves, should remain unaware of the next assignment in the sequence until after the decision about eligibility for the trial has been made. Then, after the assignment has been revealed, they should not be able to change the assignment or the decision about eligibility. Ideally, the process should be immune to any influence by the individuals making the allocation. This will be most securely achieved if an assignment sequence generated using true randomisation is administered by someone who is not in charge of recruiting subjects, such as someone based in a central trial office or pharmacy. If such central randomisation cannot be organised, then other precautions are required to prevent manipulation of the allocation process by those involved in recruitment [7, 15]. The process of concealing assignment until treatment has been allocated has sometimes been referred to as “randomisation blinding”. This term does not clearly distinguish concealed allocation from blinding of patients, providers, outcome evaluators and analysts, and it is unsatisfactory for three reasons. First, the reason for concealing the assignment schedule is to minimise selection bias. In contrast, blinding (used after the allocation of the intervention) reduces performance and detection biases. Second, from a practical point of view, concealing allocation up to the point of assignment is always possible, regardless of the study question, but blinding after allocation may be impossible, as in trials comparing surgical with medical treatment. Third, control of selection bias is relevant to the trial as a whole, and thus to all outcomes being compared. In contrast, control of detection bias is often outcome-specific and may be accomplished successfully for some outcomes in a study but not for others. Thus, blinding up to allocation and blinding after allocation address different sources of bias, are inherently different in their practicability and may apply to different components of a study. Empirical research has shown that the lack of adequate allocation concealment is associated with bias; indeed, concealment has been found to contribute more to preventing bias than other components of allocation, such as the generation of the allocation sequence (e.g. by computer, random number table or alternation). Thus, studies can be judged on their method of allocation concealment. Information should be presented that provides some confirmation that allocations were not known until, at the least, the point of allocation. Inadequate approaches to allocation concealment include alternation; the use of case record numbers, dates of birth or
day of the week; and any procedure that is entirely transparent before allocation, such as an open list of random numbers. When studies do not report any concealment approach, adequacy should be considered unclear. Examples include merely stating that a list or table was used; specifying only that sealed envelopes were used; and reporting an apparently adequate concealment scheme in combination with other information that leads the author to be suspicious [8, 13].
34.3.2 Performance Bias

Performance bias refers to systematic differences, other than the intervention under investigation, in the standard or type of care provided to the participants in the comparison groups. To protect against this, those providing and receiving care can be “blinded” so that they do not know the group to which the recipients of care have been allocated. Many authors have suggested that such blinding is important in protecting against bias. Studies have shown that contamination (provision of the intervention to the control group) and co-intervention (provision of unintended additional care to either comparison group) can affect study results. Furthermore, there is evidence that participants who have foreknowledge of their assignment status report more symptoms (recall bias), leading to skewed results. For these reasons, researchers may want to consider the use of blinding as a criterion for validity [5, 10]. Authors working on topics where blinding is likely to be important may want to develop specific criteria for judging the appropriateness of the method that was used for blinding. In some areas, it may be desirable to use the same criterion across reviews, in which case a collaborative review group (CRG) might want to agree on a standard approach for assessing blinding.
34.3.3 Attrition Bias

Attrition bias refers to systematic differences between the comparison groups in terms of loss of participants from the study. Other reports have referred to it as “exclusion bias”; it is called attrition bias here to prevent confusion with pre-allocation exclusion and inclusion criteria for recruiting subjects. Because of inadequacies in reporting how losses of participants (e.g. withdrawals, drop-outs and protocol deviations) are handled, authors should be cautious about implicit accounts of follow-up. The approach to handling lost subjects has great potential for biasing the results, and reporting inadequacies compound this problem. What is reported, or more frequently implied, in study reports about attrition after allocation has not been found to be consistently related to bias. Thus, authors should be cautious about using reported follow-up as a validity criterion, especially when it is only implied rather than explicitly reported. This is a general recommendation, however, and may not apply to certain topic areas that have higher quality reporting or where it is possible to obtain missing raw information from the authors.
34.3.4 Detection Bias

Detection bias refers to systematic differences between the comparison groups in terms of outcome assessment. Trials that blind the people assessing outcomes to the intervention allocation would be expected to be less prone to bias than trials that do not. This is clearly important in studies with subjective outcome measures, such as pain. However, at least two empirical studies have failed to correlate blinding of outcome assessment with study results; this may be due to inadequacies in the reporting of those studies. Bias due to the selective reporting of results is somewhat different from bias in outcome assessment. This source of bias may be important in areas where multiple outcome measures are used, such as evaluations of treatments for rheumatoid arthritis. Therefore, authors may want to consider the specification of predefined primary outcomes and analyses by the investigators as an indicator of validity. Alternatively, selective reporting of particular outcomes could be taken to suggest the need for better reporting and for efforts by authors to obtain the missing data [3, 9].
34.3.5 Lead Time Bias

Lead time bias refers to the invalid results obtained when comparing two groups of patients who underwent diagnostic studies of differing sensitivity in different periods, leading to misleading reports of longer survival when the apparent benefit merely reflects a higher detection rate of early-stage cases.
34.4 Methods to Detect Publication Bias

Assessment of publication bias can be attempted by either graphical or numerical methods (Fig. 34.1). All tests are complementary, with each contributing to a general indication of bias in the studies included within a systematic review [11, 17]. We detail some of these methods below.
34.4.1 Graphical Methods
34.4.1.1 Funnel Plot

The funnel plot is a scatter plot of the component studies in a meta-analysis, with the treatment effect on the horizontal axis and a weight, such as the standard error, inverse standard error, p value or sample size, on the vertical axis. Light and Pillemer, the originators of the funnel plot, explained: “If all studies come from a single hypothetical study population, this graph should look like a funnel, with the effect sizes homing in on the true underlying value as n increases… [If there is publication bias] there should be a bite out of the funnel”. The funnel plot has become a rather standard means of detecting publication bias, as it visually displays the direction of bias and so guides further refinement methods that adjust for it, e.g. trim and fill. It has the disadvantage, however, of being a rather subjective measure: the only way to assess the plot is visually, with no accurate numerical measure of the magnitude of bias or of the studies causing it. The causes of asymmetry in funnel plots are presented in Fig. 34.2.

34.4.1.2 Galbraith Plot

The Galbraith plot is very similar to the funnel plot except that it uses a standardised measure of effect. In addition, the horizontal and vertical axes are switched, providing a better means of assessing heterogeneity through the directionality of the symmetry axis. Galbraith plots are used as the graphical means of displaying regression analysis [12].
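A funnel plot of this kind is straightforward to draw. The sketch below is one possible rendering, assuming per-study effect estimates (e.g. log-odds ratios) and standard errors and using matplotlib; plotting the standard error on an inverted vertical axis, so that the most precise studies sit at the top, is just one of the weighting conventions mentioned above.

```python
# A minimal sketch of a funnel plot for visual inspection of asymmetry.
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(effects, std_errors):
    effects = np.asarray(effects, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    pooled = np.average(effects, weights=1.0 / std_errors ** 2)
    fig, ax = plt.subplots()
    ax.scatter(effects, std_errors)
    ax.axvline(pooled, linestyle="--")  # pooled effect as a reference line
    ax.invert_yaxis()                   # most precise studies at the top
    ax.set_xlabel("Treatment effect")
    ax.set_ylabel("Standard error")
    return fig
```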
Fig. 34.1 Methods of assessment of publication bias: graphical methods (funnel plot, Galbraith plot, ordered forest plot, normal quantile plot) and numerical methods, comprising effect assessment methods (rank correlation tests: Begg and Mazumdar, Schwarzer; regression tests: Egger, Harbord, Macaskill) and sensitivity analysis methods (simplified imputation methods: fail-safe N, trim-and-fill; selection modelling: weight models, sensitivity analysis approach)
Fig. 34.2 Causes of asymmetry in funnel plots: (1) biases (publication bias; location biases: language bias, citation bias, multiple publication bias); (2) poor methodological quality of smaller studies (poor methodological design, inadequate analysis, fraud); (3) true heterogeneity; (4) artefactual; (5) chance
34.4.1.3 Ordered Forest Plot

The ordered forest plot is a forest plot in which the standard error has been used as an ordering factor, so that trends in the effect can be viewed. It is useful only for a general inspection of the included studies for possible heterogeneity and bias; its overall performance and accuracy have not been properly investigated in the published literature.
34.4.1.4 Normal Quantile Plot

The normal quantile plot places the quantiles of the standard normal distribution on the x-axis against the normalised quantiles (z-scores) of the observed studies on the y-axis. This plotting method has been favoured by many authors because it allows an assessment of data normality: normally distributed data align in a straight line. When the included studies contain inter-study heterogeneity, the studies plot as multiple curves or lines, whereas U-shaped or inverted U-shaped regression lines indicate the presence of small study effects.
34.4.2 Numerical Methods

34.4.2.1 Effect Assessment Methods

The aim of these methods is to test the hypothesis of the absence of bias rather than its presence. Results come out as confidence intervals and p values. The drawback of these methods is that they cannot overcome the effects of study heterogeneity, which can lead to unreliable conclusions [14, 16].

Rank Correlation Tests

Begg and Mazumdar Method

In this method, it is assumed that the presence of bias in the included studies will induce a rank correlation between a standardised effect statistic and its variance or standard error. A low p value indicates that the data are unlikely to be compatible with the null hypothesis of no correlation, and hence suggests that small studies have affected the overall estimation of outcomes. The standardised effect is

t_i^* = \frac{\theta_i - \bar{\theta}}{\sqrt{u_i^*}},

where \bar{\theta} = \sum \theta_i u_i^{-1} / \sum u_i^{-1} is the usual fixed-effect estimate of the summary effect, and u_i^* = u_i - \left( \sum u_i^{-1} \right)^{-1} is the variance of (\theta_i - \bar{\theta}). The test is based on deriving Kendall’s rank correlation between t_i^* and u_i^*, i.e. on comparing the ranks of the two quantities.
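Computationally, the test reduces to Kendall’s rank correlation between the standardised effects and their variances. A minimal sketch, assuming per-study effect estimates and variances as inputs (the function name is illustrative):

```python
# A minimal sketch of the Begg and Mazumdar rank correlation test.
import numpy as np
from scipy.stats import kendalltau

def begg_test(effects, variances):
    effects = np.asarray(effects, dtype=float)
    u = np.asarray(variances, dtype=float)
    w = 1.0 / u                                   # fixed-effect weights
    theta_bar = np.sum(effects * w) / np.sum(w)   # pooled fixed-effect estimate
    u_star = u - 1.0 / np.sum(w)                  # variance of (theta_i - theta_bar)
    t_star = (effects - theta_bar) / np.sqrt(u_star)  # standardised effects
    tau, p_value = kendalltau(t_star, u_star)     # rank correlation and p value
    return tau, p_value
```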
Schwarzer Method

This is one of the more recently developed methods for the analysis of publication bias. It is similar to the Begg and Mazumdar method, but uses a cell count effect statistic and a variance estimate based on a non-central
hyper-geometric distribution. Evaluation of the performance of this method has shown that it appears to have better power than the original rank correlation test, and it seems to be more efficient, especially with datasets with rare events. Nevertheless, it is more difficult computationally and has not yet been integrated into any meta-analysis software.
Regression Tests

Egger Test

In this test, the z-score is regressed on the inverse standard error:

E[Z_i] = \beta_0 + \beta_1 \, \mathrm{prec}_i, \quad \text{where } Z_i = \theta_i / s_i \text{ and } \mathrm{prec}_i = 1 / s_i.

In this parameterisation, the slope \beta_1 estimates the underlying effect and the intercept \beta_0 captures asymmetry, so the hypothesis of a zero intercept (no asymmetry) is tested. A low p value makes the data unlikely to be compatible with this null hypothesis, indicating the presence of small study effects. The advantage of this method is that it has reasonable power; the disadvantage is a high incidence of false positives (type I error), especially in datasets with large effects [11].
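A minimal sketch of this regression, assuming per-study effects and standard errors and using statsmodels to obtain the t test on the intercept (the function name is ours, not the chapter’s):

```python
# A minimal sketch of Egger's regression test: regress the standardised
# effect (z-score) on precision and test the intercept against zero.
import numpy as np
import statsmodels.api as sm

def egger_test(effects, std_errors):
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    z = effects / se                 # standardised effects, Z_i
    precision = 1.0 / se             # prec_i
    X = sm.add_constant(precision)   # first column is the intercept term
    fit = sm.OLS(z, X).fit()
    intercept, p_value = fit.params[0], fit.pvalues[0]
    return intercept, p_value        # a small p value suggests asymmetry
```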
Harbord Test

This is an adapted form of the Egger test in which the hypothesis of a zero intercept is tested when Z/\sqrt{V} is regressed on \sqrt{V}, where Z is the score statistic and V its variance. It has similar power to Egger’s test but a lower type I error rate.

34.4.2.2 Sensitivity Analysis Methods

These methods were developed to assess the sensitivity of results to levels of bias that have been regarded as reasonable.

Simplified Imputation Methods

Fail-Safe N

This method calculates the number of studies with no effect that would be needed to make a significant result insignificant. Where that number exceeds a chosen tolerance level, publication bias would be unlikely to play a role. The disadvantages of this method concern its handling of precision and heterogeneity, the arbitrary choice of the tolerance level and the assumption of zero effect in the missing studies. In a further development of this method, the user specifies, firstly, the mean effect of the studies that are presumed missing and, secondly, a critical level of the meta-analysis association measure; the number of extra studies necessary to bring the current result to that critical level is then calculated.
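A minimal sketch in the spirit of Rosenthal’s version of the fail-safe N, assuming the per-study z statistics are available and that the unpublished studies average a zero effect; the one-sided critical value of 1.645 is an illustrative choice:

```python
# A minimal sketch of a fail-safe N: how many null studies would drag the
# Stouffer combined z statistic below the critical value?
import numpy as np

def fail_safe_n(z_scores, z_crit=1.645):
    z = np.asarray(z_scores, dtype=float)
    k = len(z)
    n_extra = (z.sum() / z_crit) ** 2 - k
    return max(0, int(np.ceil(n_extra)))
```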
The Trim and Fill Method

The trim and fill method is an iterative non-parametric method, based on the asymmetry of a funnel plot, that estimates an adjusted pooled effect. The idea behind correcting for publication bias with trim and fill is to fill in the sparse corner of the funnel plot with imputed treatment effect estimates and then to pool all studies, actual and imputed. If a smaller value favours the treatment, as it does for the log-odds ratio, the method imputes studies on the right side of the funnel plot. The key assumption is that the studies with the most extreme effect sizes are suppressed. Assuming studies with extreme effects on the left-hand side of the funnel plot are suppressed, the method works by estimating k0, the number of asymmetric studies on the right-hand side of the funnel, i.e. the number of studies that have no counterpart on the left-hand side of the funnel plot. The k0 asymmetric studies are then “trimmed” from the right-hand side of the funnel and a pooled estimate is calculated using the remaining “symmetrical” studies. The “trimmed” studies and their left-hand side counterparts (i.e. the “missing” studies) are replaced and a pooled estimate is calculated. This process is repeated until the estimates of the number of “missing” studies and of the pooled effect are stable. Duval and Tweedie describe two preferred estimators for k0, L0 and R0, given by:
L_0 = \frac{4 S_{rank} - n(n + 1)}{2n - 1}, \qquad R_0 = \gamma - 1,
where n is the number of studies in the meta-analysis, S_{rank} is the Wilcoxon statistic and \gamma is the length of the rightmost run of ranks when the studies are ranked by their
distance to the pooled effect. At each iteration, L0 and R0 are rounded up to the nearest integer and provide estimates of the number of missing studies, k0. In Stata, the “metatrim” command carries out the trim and fill method using L0 as the default estimator, although R0 can also be specified. Finally, the actual and imputed studies are pooled using a random effects model. The “trimmed” set of data is used to re-estimate the pooled effect, and the trimming process continues until no more studies can be “trimmed”. In addition, the observed and imputed studies are used to estimate a standard error for the effect size corresponding to what would have been seen had all the studies been identified (i.e. no publication bias). A drawback of this method is that, when there is significant between-study heterogeneity, it can under-estimate the true positive effect even when there is no publication bias. In contrast, when publication bias is present, the trim and fill method can give estimates that are less biased than the usual meta-analysis models [4, 13, 17].
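The iteration described above can be sketched as follows, assuming suppression on the left-hand side of the funnel and using the L0 estimator; for brevity, this sketch pools with a fixed-effect model throughout, whereas the full method pools the actual and imputed studies with a random effects model.

```python
# A minimal sketch of the trim and fill algorithm with the L0 estimator.
import numpy as np

def fixed_effect(eff, var):
    w = 1.0 / var
    return np.sum(w * eff) / np.sum(w)

def estimate_k0(eff, var, theta):
    # L0 estimator: based on the ranks of the absolute deviations from
    # the pooled effect, summed over the positive deviations.
    n = len(eff)
    dev = eff - theta
    ranks = np.argsort(np.argsort(np.abs(dev))) + 1
    s_rank = ranks[dev > 0].sum()
    l0 = (4.0 * s_rank - n * (n + 1)) / (2.0 * n - 1.0)
    return max(int(np.ceil(l0)), 0)  # rounded up, never negative

def trim_and_fill(eff, var, max_iter=20):
    eff, var = np.asarray(eff, dtype=float), np.asarray(var, dtype=float)
    order = np.argsort(eff)             # sort studies by effect size
    eff, var = eff[order], var[order]
    n, k0 = len(eff), 0
    for _ in range(max_iter):
        kept = slice(0, n - k0)         # trim the k0 most extreme studies
        theta = fixed_effect(eff[kept], var[kept])
        k0_new = estimate_k0(eff, var, theta)
        if k0_new == k0:
            break
        k0 = k0_new
    # fill: mirror the k0 trimmed studies about the final pooled estimate
    mirrored = 2.0 * theta - eff[n - k0:]
    pooled = fixed_effect(np.concatenate([eff, mirrored]),
                          np.concatenate([var, var[n - k0:]]))
    return pooled, k0
```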
Selection Modelling

The selection process is modelled by assigning a weight to the estimated effect from each study. The premise is that the probability that a result is selected for inclusion in a meta-analysis depends only on the p value. This assumption is not equivalent to the assumption made by funnel plot methods, namely that inclusion is based on the magnitude of treatment benefit, unless all studies have the same sample size. Selection methods estimate a pooled effect adjusted for selection bias. A random effects model is assumed for the study effects, and each study’s contribution to the likelihood is a weighted normal density. There are several selection models in the literature. These can be categorised by (i) the form of the weight function, which may be parametric or non-parametric; (ii) whether maximum likelihood or Bayesian estimation is used; and (iii) whether covariates are incorporated in the model.

Weight Models

These were developed by DerSimonian and Laird in 1986 to address the difficulty of integrating the results from studies of diverse natures [5], both in terms of design and of the methods employed, where some are carefully controlled randomised experiments
while others are less well controlled. Because of differences in sample sizes and patient populations, each study has a different level of sampling error. Thus, one problem in combining studies for integrative purposes is the assignment of weights that reflect the relative “value” of the information provided by each study. A more difficult issue in combining evidence is that one may be using incommensurable studies to answer the same question. The need for careful consideration of methods when drawing inferences from heterogeneous but logically related studies has been emphasised; in this setting, the use of a regression analysis to characterise differences in study outcomes may be more appropriate. In the weight model approach, it is assumed that there is a distribution of treatment effects, and the observed effects from the individual studies are used to estimate this distribution. The approach allows treatment effects to vary across studies and provides an objective method for weighting that can be made progressively more general by incorporating study characteristics into the analysis. Consider the problem of combining information from a series of k comparative clinical trials, where the data consist of the numbers of patients in the treatment and control groups, n_T and n_C, and the proportions of patients with some event in each of the groups, r_T and r_C. Letting i index the trials, we assume that the numbers of patients with the event in each of the study groups are independent binomial random variables with associated probabilities p_{Ti} and p_{Ci}, i = 1…k. The basic idea of the random effects approach is to parcel the observed treatment effect in each study, say y_i, into two additive components: the true treatment effect, \theta_i, and the sampling error, e_i. The variance of e_i is the sampling variance, s_i^2, usually calculated from the data of the ith observed sample. The true treatment effect associated with each trial will be influenced by several factors, including patient characteristics as well as the design and execution of the study. To explicitly account for the variation in the true effects, the model assumes \theta_i = \mu + \delta_i, where \theta_i is the true treatment effect in the ith study, \mu is the mean effect for a population of possible treatment evaluations and \delta_i is the deviation of the ith study’s effect from the population mean. We regard the trials considered as a sample from this population and use the observed effects to estimate \mu as well as the population variance, var(\delta) = \Delta^2. Here, \Delta^2 represents
both the degree to which treatment effects vary across experiments and the degree to which individual studies give biased assessments of treatment effects.
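The DerSimonian and Laird estimates can be computed directly from the study effects and their sampling variances. A minimal sketch, in which tau2 plays the role of the between-study variance Δ² above and the weights are the reciprocals of the total variances:

```python
# A minimal sketch of the DerSimonian and Laird random effects estimator.
import numpy as np

def dersimonian_laird(effects, variances):
    y = np.asarray(effects, dtype=float)
    s2 = np.asarray(variances, dtype=float)
    w = 1.0 / s2                               # fixed-effect weights
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)        # Cochran's Q statistic
    k = len(y)
    # method-of-moments estimate of the between-study variance
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)
    w_star = 1.0 / (s2 + tau2)                 # random effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return mu, se, tau2
```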
Sensitivity and Sub-Group Analysis Approach

Because there are different approaches to conducting a systematic review, the sensitivity of the analysis results to changes in the way the review was conducted should be examined. This provides authors with a means of testing how robust the results are relative to key decisions and assumptions that were made in the process of preparing the review [18, 20]. Generally, the types of decisions and assumptions that might be examined in sensitivity analyses include:
• modifying the inclusion criteria for the types of studies (e.g. using different methodological cut-points), participants, interventions or outcome measures
• including or excluding studies where there is some ambiguity as to whether they meet the inclusion criteria
• recalculating the results using a reasonable range of included data for studies where there may be some uncertainty about the results
• re-doing the analysis after trimming and filling, i.e. after imputation of a reasonable range of values for missing data
• reanalysing the data using different statistical approaches (e.g. using a random effects model instead of a fixed-effect model, or vice versa).
Sub-group analyses involve dividing the participant data into sub-groups in order to make comparisons between them. This approach may be used to investigate heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study. Nevertheless, results from multiple sub-group analyses may be misleading, as they are observational and are not based on randomised comparisons. If their findings are presented as definitive conclusions, there is clearly a risk of biased healthcare decisions for patients as well as of invalid recommendations for future research. Sub-group analysis may reveal qualitative interaction (the direction of effect is reversed, i.e. an intervention is beneficial in one sub-group while harmful in another) or quantitative interaction (the size of the effect varies but not its direction). Since different sub-groups are likely to contain different amounts of information, and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of their results. A simple approach to a significance test that can be used to investigate differences between two or more sub-groups is described by Deeks et al. [4]; it is based on the chi-squared heterogeneity statistics that appear in the bottom left-hand corner of the forest plots, and proceeds as follows. Suppose a chi-squared heterogeneity statistic, Q_{all}, is available for all of the trials, and that chi-squared heterogeneity statistics Q_1 up to Q_m are available for m sub-groups (such that every trial is in one and only one sub-group). Then the new statistic Q_{int} = Q_{all} − (Q_1 + ... + Q_m), compared with a chi-squared distribution with m − 1 degrees of freedom, tests for a difference among the sub-groups (a minimal code sketch of this test is given at the end of this section). A more flexible alternative to testing for differences between sub-groups is to use meta-regression techniques, in which residual heterogeneity (i.e. heterogeneity not explained by the sub-grouping) is allowed for [18, 20]. Meta-regression is an extension of sub-group analyses that allows the effects of multiple continuous, as well as categorical, characteristics to be investigated simultaneously (although this is rarely possible owing to inadequate numbers of trials); meta-regression should generally not be considered when there are fewer than ten trials in a meta-analysis. In meta-regression, the outcome variable is the effect estimate (e.g. a mean difference, a risk difference, a log-odds ratio or a log risk ratio). The explanatory variables are characteristics of the studies that might influence the size of the treatment effect; these are often called “potential effect modifiers” or covariates. One feature of meta-regression is that larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimates. Second, it is wise to allow for the residual heterogeneity among treatment effects not modelled by the explanatory variables; this gives rise to the term “random effects meta-regression”, since the extra variability is incorporated in the same way as in a random effects meta-analysis. The regression coefficient obtained from a meta-regression analysis describes how the outcome variable (the treatment effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between the treatment effect and the explanatory variable. If the treatment effect is a ratio measure, the log-transformed value of the treatment effect should always be used in the regression model.
Meta-regression can also be used to investigate differences for categorical explanatory variables, as is done in sub-group analyses. If there are m sub-groups, membership of a particular sub-group is indicated by using m − 1 dummy variables (which can only take values of 0 or 1) in the meta-regression model (as in standard linear regression modelling). The regression coefficients estimate how the treatment effect in each sub-group differs from a nominated reference sub-group, and the p value of each regression coefficient indicates whether this difference is statistically significant. The same cautions discussed for sub-group analyses apply to sensitivity analyses. In particular, since many sensitivity analyses involve between-study sub-group comparisons, their findings need to be interpreted very carefully [3, 17]. If the sensitivity analyses that are done do not materially change the results, this strengthens the confidence that can be placed in those results. If the results do change in a way that might lead to different conclusions, this indicates a need for greater caution in interpreting the results and drawing conclusions. Such differences might also enable authors to clarify the source of existing controversies about the effectiveness of an intervention, or lead them to hypothesise potentially important factors that might be related to the effectiveness of the intervention and warrant further investigation.
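The sub-group significance test of Deeks et al. described in this section amounts to only a few lines once the heterogeneity statistics have been extracted from the forest plots. A minimal sketch:

```python
# A minimal sketch of the sub-group difference (interaction) test:
# Q_int = Q_all - (Q_1 + ... + Q_m), referred to chi-squared on m - 1 df.
from scipy.stats import chi2

def subgroup_interaction_test(q_all, q_subgroups):
    q_int = q_all - sum(q_subgroups)
    df = len(q_subgroups) - 1
    p_value = chi2.sf(q_int, df)   # survival function, i.e. 1 - CDF
    return q_int, df, p_value
```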
34.5 Prevention of Publication Bias

In addition to the previously mentioned precautions during study design, recruitment and conduct, three main methods have been suggested as means of pre-empting publication bias: trial registries, prospective meta-analyses and searching of the grey literature.
34.5.1 Trial Registries

These are accessible online databases of trials that have been registered before their results become available, or even before their commencement [1, 13, 15]. Published reports have provided evidence of the benefit of trial registration in creating a framework capable of facilitating communication among researchers and consumers alike, i.e. both healthcare providers and patients can access
data on clinical trials and experimental therapies being conducted for a particular health condition. On the one hand, this may stimulate collaborative research by joining the efforts of researchers working on the same subject; on the other hand, it can prompt physicians to refer their patients, and patients to enrol themselves, in a particular trial. This helps, first, to prevent the duplication of trials of interventions that have already been shown to be ineffective and, secondly, to formulate an unbiased sampling and recruitment frame, leading to more efficient use of research funding. A final benefit of trial registration is the ability it provides to conduct research on the research process itself, in terms of recruitment, the running of trials and the tracking of the timeline and methodology during the conduct of a trial. It has been proposed that trials are best registered once they have been granted the necessary ethics committee approval. There are currently a large number of registries, links to which can be found on web sites such as TrialsCentral (http://www.trialscentral.org) and the current controlled trials meta-register (http://www.controlled-trials.com). In spite of the availability of these registries, there is no complete listing of all trials being conducted in a single source; researchers therefore still have to search multiple sources, with the possibility of missing unsearched or unregistered trials. Nevertheless, as the results of registered trials have not yet been reported, and as registries are as complete a source as is currently available, using studies collected from registries still serves to minimise publication bias in systematic reviews to the best standard identified at the present time.
34.5.2 Prospective Meta-Analyses

These are systematic reviews planned to be conducted in the future on studies, usually randomised controlled trials, that are still running, wherein the studies have been identified, evaluated and determined to be eligible as candidates for the meta-analysis before the results of any of those trials become known [1, 3, 5, 6, 13, 15]. The advantage of a prospective meta-analysis (PMA) over a traditional retrospective one is that the hypotheses for the review can be specified beforehand, in ignorance of the results of the trials. In addition, it allows the prospective application of selection criteria and analysis plans before they can be biased by trial results. Prospective
meta-analyses use individual-level data, as opposed to the group data reported in conventional published trials, which by its nature allows an analysis based on all randomised patients as well as the exploration of sub-groups. PMA is distinguished from other forms of prospective data collection, e.g. multi-centre trials, by the autonomy of the involved sites and the lack of a requirement for identical protocols across data collection sites, which in turn validates the collected data as a realistic reflection of the patient population being studied.
34.5.3 Searching the Grey Literature

Grey literature is defined as research output produced at all levels of government, academia, business and industry, in print and electronic formats, that is not controlled by commercial publishers [6, 16]. Thus, grey literature includes those studies that could be overlooked in the initial search for studies to include in a meta-analysis, whether because of the language of publication, duplicated studies, studies inaccessible to the researcher, unpublished trials or studies that were not sufficiently cited by other authors. Evidence of systematic differences between the results of published studies and those found in the grey literature has been well reported; the validity of meta-analyses based only on published, accessible studies may therefore be undermined by publication bias. A comprehensive search of the grey literature helps minimise this bias. Methods to search the grey literature include:
• searching the Cochrane Register of Controlled Trials and the Campbell Collaboration Social, Psychological, Educational and Criminological Trials Register (C2-SPECTR and C-PROT)
• searching research registries, e.g. BioMed Central, ClinicalTrials.gov and TrialsCentral
• searching electronic databases, e.g. MEDLINE, EMBASE and Google Scholar
• hand searching library journals
• searching conference proceedings
• contacting researchers for the retrieval of raw and/or missing data
• searching the Internet.
The study and inclusion of relevant data from these sources comprise a strong means of validating the source data used in meta-analyses, thus ensuring the veracity and applicability of their conclusions and helping to guide sound healthcare decision making.
34.6 Conclusions

• Blinding and randomisation should be performed whenever possible, with the strictest criteria, to ensure the high standard and validity of individual study results.
• Although there is a burgeoning literature on the statistical methods of meta-analysis, less has been published on the practical methods of carrying out such projects.
• In meta-analyses, like should be compared with like, bearing in mind that not all studies are of equal importance.
• Carry out detailed data checking and ensure the quality of randomisation and follow-up.
• Sources of heterogeneity and publication bias in a meta-analysis should always be investigated.
• Undertake sub-group analyses for important hypotheses about differences in effect.
• There are multiple graphical and statistical means of testing for publication bias, each with merits and drawbacks; they should therefore be used in a complementary fashion.
References
1. Antes G, Chalmers I (2003) Under-reporting of clinical trials is unethical. Lancet 361:978–979
2. Chalmers I (1990) Underreporting research is scientific misconduct. JAMA 263:1405–1408
3. Deeks JJ (1998) Systematic reviews of published evidence: miracles or minefields? Ann Oncol 9:703–709
4. Deeks JJ, Altman DG, Bradburn MJ (2001) Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG (eds) Systematic reviews in health care: meta-analysis in context. BMJ Publication Group, London
5. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7:177–188
6. Detsky AS, Naylor CD, O’Rourke K et al (1992) Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol 45:255–265
7. Dickersin K (1990) The existence of publication bias and risk factors for its occurrence. JAMA 263:1385–1389
8. Dickersin K (2005) Recognizing the problem, understanding its origins and scope, and preventing harm. In: Rothstein HR, Sutton AJ, Borenstein M (eds) Publication bias in meta-analysis: prevention, assessment and adjustments. Wiley, Chichester
9. Dickersin K, Min YI, Meinert CL (1992) Factors influencing publication of research results. Follow-up of applications submitted to two institutional review boards. JAMA 267:374–378
10. Easterbrook PJ, Berlin JA, Gopalan R et al (1991) Publication bias in clinical research. Lancet 337:867–872
11. Egger M, Smith GD (1995) Misleading meta-analysis. BMJ 311:753–754
12. Galbraith RF (1988) A note on graphical presentation of estimated odds ratios from several clinical trials. Stat Med 7:889–894
13. Hall R, de Antueno C, Webber A (2007) Publication bias in the medical literature: a review by a Canadian Research Ethics Board. Can J Anaesth 54:380–388
14. Ioannidis JP, Trikalinos TA (2007) The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ 176:1091–1096
15. Moher D, Pham B, Jones A et al (1998) Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352:609–613
16. Sterne JA, Gavaghan D, Egger M (2000) Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 53:1119–1129
17. Sutton AJ, Duval SJ, Tweedie RL et al (2000) Empirical assessment of effect of publication bias on meta-analyses. BMJ 320:1574–1577
18. Thompson SG, Sharp SJ (1999) Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 18:2693–2708
19. Withering W (1785) An account of the foxglove and some of its medical uses: with practical remarks on dropsy and other diseases. G. G. J. and J. Robinson, London
20. Yusuf S, Wittes J, Probstfield J et al (1991) Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 266:93–98
35 Graphs in Statistical Analysis

Akram R. G. Hanna, Christopher Rao, and Thanos Athanasiou
Contents

35.1 Introduction ............................................................ 442
35.1.1 Commonly Used Statistical Terms ........................... 442
35.2 Categorising Data and Data Handling ................. 442
35.2.1 Types of Data ........................................................... 442
35.2.2 Data-Handling Techniques ....................................... 443
35.2.3 Checking for Errors .................................................. 444
35.3 Describing and Summarising the Data ................ 444
35.3.1 Types of Average ...................................................... 445
35.3.2 Measures of Spread .................................................. 446
35.3.3 The “Shape” of the Data .......................................... 448
35.4 Displaying Data Graphically ................................. 449
35.4.1 Categorical Data ....................................................... 450
35.4.2 Continuous Data ....................................................... 450
35.4.3 Multiple Variables .................................................... 450
35.5 Probability Distributions ....................................... 451
35.5.1 An Introduction to Probability ................................. 451
35.5.2 Probability Distributions .......................................... 453
35.5.3 Discrete Probability Distributions ............................ 454
35.5.4 Continuous Distributions.......................................... 454
35.6 Transformations ..................................................... 458
35.7 Confidence Intervals .............................................. 460
35.7.1 Confidence Intervals for the Mean Using the Normal Distribution ................................. 460
35.7.2 Confidence Intervals for the Mean Using Student’s t-Distribution.................................. 461
35.7.3 Confidence Interval for the Proportion..................... 461
35.7.4 Bootstrapping ........................................................... 461
35.8 Hypothesis Testing.................................................. 462
35.8.1 Step 1: Defining the Null and Alternative Hypotheses ............................................................... 462
35.8.2 Step 2: Sampling the Data ........................................ 462
35.8.3 Step 3 & 4: Data Analysis ........................................ 463
35.8.4 Step 5: Interpreting the Results ................................ 463
35.8.5 Types of Statistical Tests .......................................... 464
35.9 Investigating the Relationship Between Two Variables ......................................................... 468
35.9.1 Correlation................................................................ 468
35.9.2 Spearman’s Rank Correlation Coefficient ................ 470
35.9.3 Univariate or Simple Linear Regression .................. 470
35.9.4 Generalised Linear Models ...................................... 474
35.9.5 Miscellaneous Problems in Regression Modelling .......................................... 474
35.10 Summary ................................................................. 475
References ........................................................................... 475
A. R. G. Hanna () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract This chapter is a brief introduction to the applications, strengths and weaknesses of the statistical methods commonly applied in surgical research. In this chapter, the terms and expressions commonly used in medical statistics are defined. The definition of data and methods used in data entry are discussed. Graphical methods for displaying different types of data are also described. The concepts of transformation, sampling and confidence intervals are discussed. Statistical tests commonly used in surgical research are discussed, including correlation, regression and hypothesis testing. Finally, other issues encountered in statistics, including interaction, confounding, jack-knifing and co-linearity, are described.
35.1 Introduction

Statistics is concerned with collecting, summarising, presenting and interpreting data [10] and arguably constitutes one of the cornerstones of modern surgical research. The use of statistical tools in surgical research to organise information, determine normal values, estimate the magnitude of associations and test hypotheses has become increasingly prevalent in recent years [6]. Furthermore, there is evidence that the complexity of the statistical tests used in the surgical literature is also increasing [6]. Many surgeons, however, struggle with common statistical concepts and techniques, which manifests in the poor quality of statistical analysis in the surgical literature [11, 14]. Unfamiliarity with common statistical techniques may also affect the ability of surgeons to translate academic research into clinical practice [11]. This chapter aims to demystify statistics for surgeons and to enable them to read and critically appraise the academic literature competently. Although it is not intended that this chapter be a comprehensive text on medical statistics, it is hoped that it will give surgeons a foundation from which to engage in research, to innovate and to challenge current practice. Often, surgeons struggle with statistical concepts because they are unfamiliar with statistical nomenclature and notation. In the following section, we explain the meanings of commonly used statistical terms. In subsequent sections, we discuss data handling, graphical tools, distributions and statistical tests in more detail.
35.1.1 Commonly Used Statistical Terms

• Population is a group of individuals or variables that must be studied to answer a research question. A clear definition of the study population is crucial to the quality of the completed research; the study population, however, is often hard to define.
• A sample is a small number of individuals or variables representing a population, picked and studied by the researcher to overcome the lack of time and resources that would be required to study the whole study population.
• Sample selection, or sampling, should ensure that the sample characteristics are representative of the population characteristics. There are two types of sampling: random, which aims to give everyone in
the population an equal chance of being selected, and non-random, which does not give everyone in the population an equal opportunity to participate in the study but instead systematically selects participants who are representative of the study population.
• A statistic is a number that summarises information sampled from a population, for example, a percentage, an average or a percentile. Statistics are always based on information from samples, not from entire populations.
• Misleading results can be caused by systematic favouritism, or bias, in the sampling or measurement process.
• Variables are quantities that vary from one individual to another (e.g., blood pressure). In contrast, parameters do not relate to actual measurements or attributes but are quantities that define a theoretical model.
35.2 Categorising Data and Data Handling

The actual measurements or observations made during a study are called data. In this section, we describe the different types of data and methods for handling data.
35.2.1 Types of Data

Data can be classified as either “qualitative” categorical data (e.g., gender) or “quantitative” numerical data (e.g., number of admissions or blood pressure). Numerical data can be further categorised as discrete data, which can take a finite number of values within a range (e.g., number of admissions), or continuous data, which can take an infinite number of values within a range (e.g., blood pressure). Descriptive numbers obtained from the arithmetical manipulation of primary observational data are called derived data. For example, population figures are primary data, but the population-per-square-mile is a derived quantity. Types of derived data include:
• A ratio (or quotient), which is a fraction that expresses the relative size of two numbers. For example, if for every 3 women there are 5 men, the ratio of women to men would be 3 to 5. Whether there are 3,000 women and 5,000 men, 750 women and 1,250 men or 150 women and 250 men, the ratio
will still be 3 to 5, as ratios are always expressed in the lowest, simplest terms. The ratio of “3 to 5” can also be expressed as “3:5” or “3/5”. In this example, the number 3 is called the numerator and the number 5 is called the denominator.
• A proportion is a type of ratio in which the numerator is part of the denominator. For example, if there are 3,000 women and 5,000 men, the proportion of women would be the number of women divided by the number of women added to the number of men (= 3,000/[3,000 + 5,000] = 3,000/8,000 = 3/8 = 0.375).
• In probability, the odds in favour of an event are the quantity p/(1 − p), where p is the probability of the event occurring. The odds against the same event are (1 − p)/p.
• A percentage is a type of proportion that is multiplied by 100 to give a number between 0 and 100.
• A rate is a type of ratio that reflects the occurrence of events over a specific unit of time. Rates are commonly used in longitudinal studies, where a long follow-up period may be required because a comparatively rare event is being investigated. Some of the subjects may be lost to follow-up, and others might start the study at a later date because of the long follow-up period. Consequently, follow-up times may vary at the end of the study. As those with a longer follow-up time are more likely to experience the event than those with a shorter follow-up, we consider the rate at which the event occurs per period of time. Each individual’s length of follow-up is usually defined as the time from when they enter the study until the time when the event occurs or the study draws to a conclusion; the total follow-up time is the sum of all the individuals’ follow-up times. The rate is referred to as an incidence rate when the event is a patient’s first presentation with a disease and as a mortality rate when the event is death. When the rate is very small, it is often multiplied by a convenience factor, such as 1,000, and re-expressed as the rate per 1,000 person-years of follow-up. When calculating the rate, no distinction is made between person-years of follow-up that occur in the same individual and those that occur in different individuals.
• The risk of an event is the total number of events divided by the number of individuals included in the study at the start of the investigation, with no allowance for the length of follow-up. Consequently, the risk of the event will be greater when individuals are followed for longer, since they will have more
opportunity to experience the event. In contrast, the rate of the event should remain relatively stable in these circumstances, as the rate takes account of the duration of follow-up.
• The relative rate (also referred to as the rate ratio or incidence rate ratio) is used when comparing the rate of disease in a group of individuals exposed to a factor of interest (rate exposed) with the rate in a group of individuals not exposed to the factor (rate unexposed). It is calculated as: relative rate = rate exposed / rate unexposed. The relative risk and odds ratio are calculated and interpreted in a similar way to the relative rate. A relative rate of 1 indicates that the rate of disease is the same in the two groups; a relative rate greater than 1 indicates that the rate is higher in those exposed to the factor than in those who are unexposed; and a relative rate less than 1 indicates that the rate is lower in the group exposed to the factor. The relative rate and the relative risk are often thought of as analogous; however, they will only be similar if the event (e.g., disease) is rare. When the event is not rare and individuals are followed for varying lengths of time, the rate, and therefore the relative rate, will not be affected by the different follow-up times. This is not the case for the relative risk, as the risk, and thus the relative risk, will change as individuals are followed for longer periods. Hence, the relative rate is always preferred when follow-up times vary between individuals in the study. The odds ratio is also often misinterpreted as being analogous to the relative risk; however, it will only give a similar value if the outcome is rare. Although the odds ratio is less easily interpreted than the relative risk, it is often used in case–control studies, when the relative risk cannot be estimated directly [1].
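These derived quantities are simple to compute from raw counts. A minimal sketch, assuming simple counts of events, subjects and person-years of follow-up (the function names are illustrative):

```python
# A minimal sketch of the derived quantities described above.
def risk(events, n_subjects):
    return events / n_subjects                 # no allowance for follow-up

def rate_per_1000(events, person_years):
    return 1000.0 * events / person_years      # rate per 1,000 person-years

def odds(p):
    return p / (1.0 - p)                       # odds in favour of an event

def relative_rate(events_exp, py_exp, events_unexp, py_unexp):
    return (events_exp / py_exp) / (events_unexp / py_unexp)
```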
35.2.2 Data-Handling Techniques

35.2.2.1 Statistical and Data-Handling Software Packages

The widespread availability of statistical software packages has significantly improved the accuracy and speed of data collection, storage and analysis. Furthermore, statistical software packages also make checking for errors easier.
Entering data directly into statistical software packages can be laborious and prone to error. It is often easier to enter and store data using spreadsheet or database software packages; however, these packages have limited statistical functionality. Alternatively, collected data can be entered into spreadsheet and database packages and then transferred to statistical packages for analysis using any of several file formats.
35.2.2.2 Data Coding

The allocation of numerical values to categorical data is called data coding. This makes counting, tabulating and analysing the data easier. Coded categorical data can be classified as:
• Single-coded variables, in which the data can only belong to one of two categories, for example, male or female, positive or negative, dead or alive. This makes the assignment of binary codes possible.
• Multi-coded variables, in which the data can belong to one of several categories. These can be handled in two ways: for example, each symptom a patient experiences can be given a different numerical code; alternatively, the presence or absence of each symptom can be assigned a binary code, as in the sketch below.
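A minimal sketch of both coding schemes, assuming the data are held in a pandas DataFrame with illustrative column names:

```python
# A minimal sketch of data coding: a binary code for a single-coded
# variable (sex) and presence/absence codes for a multi-coded variable.
import pandas as pd

df = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "symptoms": [["pain", "nausea"], ["pain"], []],
})
df["sex_code"] = (df["sex"] == "female").astype(int)  # 0 = male, 1 = female
for symptom in ["pain", "nausea"]:
    # one binary indicator column per symptom
    df[symptom] = df["symptoms"].apply(lambda s: int(symptom in s))
```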
35.2.3 Checking for Errors
Several techniques can be used to detect errors in data sets.
• Double entry can be used to check for the typing errors that are a common source of error when entering data. These can be detected by entering the data twice and checking for differences between the two data sets. Although this method is time-consuming, there is no guarantee that the same error might not be made twice.
• Impossible values are easily detected and can be used to assess the reliability of data sets. As categorical data can only take a limited number of values, the presence of other values indicates that part or all of a data set is unreliable. Similarly, the presence of impossible dates, such as 32/13/2031, also indicates that part or all of the data set is unreliable.
• Logical checks can also be applied to assess the reliability of a data set. For example, a patient’s date of
birth should correspond to their age, or the date of an event should be after the date of entry into the study.
• Missing data may also suggest that the whole or part of a data set is unreliable. They can occur because of errors in collecting or in entering the data. Data can be analysed with the missing data simply excluded. This is problematic, however, as the data may be systematically missing, so biasing the results of the analysis. For example, patients who are unhappy with their treatment may not attend follow-up, resulting in missing data; if these data are excluded from the analysis, then the effectiveness of the treatment will be overestimated.
• Outliers are observations that are distinct from, and potentially incompatible with, the main body of the data. These values may be genuine observations or may be due to a typing error, for example, the incorrect choice of units or placement of the decimal point. Such values should be checked before being included in or excluded from the analysis, as they may have a serious effect on the final results. If outliers are genuine observations, then the analysis can simply be repeated with and without inclusion of the value. If the results are similar, then the outlier does not have a great influence on the result. If the results change significantly, however, it is important to use appropriate methods to analyse the data, such as transformations and non-parametric tests. Outliers are most easily detected using graphical methods, such as histograms and scatter plots (Sect. 35.4). A sketch of some automated checks follows.
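A minimal sketch of such automated checks, assuming a pandas DataFrame with illustrative columns sex, date_of_birth, date_of_entry and age, the date columns already parsed as datetimes:

```python
# A minimal sketch of automated error checks on a data set.
import pandas as pd

def check_errors(df):
    problems = {}
    # impossible values: categorical data may only take the listed codes
    problems["bad_sex_code"] = ~df["sex"].isin(["male", "female"])
    # logical check: date of birth must precede the date of entry
    problems["dob_after_entry"] = df["date_of_birth"] >= df["date_of_entry"]
    # logical check: recorded age should agree with the dates
    years = (df["date_of_entry"] - df["date_of_birth"]).dt.days / 365.25
    problems["age_mismatch"] = (years - df["age"]).abs() > 1
    # missing data: flag any row with an absent value
    problems["missing"] = df.isna().any(axis=1)
    return pd.DataFrame(problems)  # one boolean column per check
```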
35.3 Describing and Summarising the Data

Any data can be described in terms of three key characteristics: the “central tendency” of the data; how “spread out” the data are; and the “shape” of the data, i.e. how evenly the data are distributed between the highest and lowest values. Summary statistics can be used to communicate this information numerically, for example:
• types of “average”, such as the mean, median or mode, which convey information about central tendency
• measures of “statistical dispersion”, such as the standard deviation, variance or range, which convey information about the spread of the data
• measures of “shape”, such as the skewness and kurtosis.
In the following section, we describe the descriptive terms and summary statistics most commonly encountered in the medical literature.
35.3.1 Types of Average

35.3.1.1 The Arithmetic Mean

When the terms “mean” or “average” are used in the medical literature without qualification, they usually refer to the arithmetic mean. The arithmetic mean is calculated by adding all the observed values and then dividing the sum by the number of observations in the set. In conventional statistical notation, a generalised set is said to contain n observations, each numbered from 1 to n as follows: x1, x2, x3, …, xn. If, for example, x represents patient height (cm), x1 will represent the height of the first individual, x2 the height of the second individual, and xi the height of the ith individual. Using this notation, the arithmetic mean can be written as:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + \cdots + x_n),

where Σ (pronounced sigma) means “the sum of”, and the sub- and superscripts on the Σ indicate that we sum the values from i = 1 to n. The sample arithmetic mean is usually written as x̄ (pronounced x-bar), while μ (pronounced mu) usually describes the population mean. This is consistent with the convention in statistical notation whereby quantities relating to a sample are represented with lowercase Roman letters and quantities relating to a population are represented with capital Roman letters or lowercase Greek letters (Table 35.1).

Table 35.1 Statistical notation

Quantity | Population | Sample
Mean | μ | x̄
Standard deviation | σ | s
Variance | σ² | s²
Proportion of elements | P | p
Correlation coefficient | ρ | r
Number of elements | N | n

35.3.1.2 The Geometric Mean

The arithmetic mean is an inappropriate measure of location if the data are heavily skewed (see Sect. 35.3.3), as extreme values will result in the arithmetic mean being unrepresentative of the central tendency of most of the data. If the data set is skewed to the right, we can transform the data into a more symmetrical data set by taking the logarithm of each value. To obtain a summary measure that has the same units as the original observations, we take the exponential of the arithmetic mean of the log values. We call this the geometric mean. In conventional statistical notation, it is expressed as follows:

\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n} = \exp\left[\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right],

where Π (pronounced pi) indicates that we take the product of all values of x from i = 1 to n, and exp is the exponential function.

35.3.1.3 The Harmonic Mean

The harmonic mean (sometimes called the subcontrary mean) is rarely used in the medical literature but is included in this section for completeness. It is the most appropriate mean when information on the central tendency of a rate is required. The harmonic mean of a sample is given by:

\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.

35.3.1.4 The Weighted Mean

The weighted mean is used when certain values in a data set are more important than others. For example, to determine the average time patients wait in an emergency medicine department nationally, the average time from each department needs to be weighted according to the number of patients who attend each department; otherwise, smaller departments will have a disproportionately large effect on the outcome of the study. Using standard statistical notation, the weighted mean of a sample is expressed as follows:

\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i},

where the values x1, x2, x3, …, xn have corresponding weights w1, w2, w3, …, wn.
35.3.1.5 The Median

The median divides the ordered values in a data set into two halves, with an equal number of values both above and below it. If the number of observations is odd, then the median is the (n + 1)/2th observation in the ordered data set. For example, if n = 11, then the median is the (11 + 1)/2 = 12/2 = 6th observation in the ordered set. If the number of observations is even, then logically, there is no median; however, in this case, we usually calculate it as the arithmetic mean of the two middle observations in the ordered set (i.e., the n/2th and the (n/2 + 1)th). For example, if n = 40, the median is the arithmetic mean of the 40/2 = 20th and the (40/2 + 1) = 21st observations in the ordered set. If the data set is symmetrically distributed, the median and the arithmetic mean are the same (Fig. 35.1a); however, the median will be less than the arithmetic mean if the data set is skewed to the right (Fig. 35.1b), and greater than the arithmetic mean if the data set is skewed to the left (Fig. 35.1c) (see Sect. 35.3.3 for a more detailed explanation of skewness). In a considerably skewed data set, the median may be a better measure of the central tendency than the arithmetic mean and is consequently often reported in the literature, especially in relation to survival data.
35.3.1.6 The Mode

The mode is the most commonly observed value in a data set. If the data are continuous, they can be grouped and the modal group reported. Some data sets are said to have no mode, if each value occurs with the same frequency; conversely, some data sets have more than one mode, if two or more distinct values occur with the same greatest frequency.
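To make the different averages concrete, the sketch below computes each of them for a small, made-up sample, using scipy’s gmean and hmean for the geometric and harmonic means; none of the numbers come from the chapter.

```python
import numpy as np
from collections import Counter
from scipy.stats import gmean, hmean

x = np.array([1.2, 1.5, 1.5, 2.0, 2.4, 3.1, 9.8])   # made-up, right-skewed sample
w = np.array([5, 3, 3, 2, 2, 1, 1])                  # hypothetical weights

print("arithmetic mean:", x.mean())
print("geometric mean: ", gmean(x))    # exp of the mean of the logs
print("harmonic mean:  ", hmean(x))    # n / sum(1/x)
print("weighted mean:  ", np.average(x, weights=w))
print("median:         ", np.median(x))

# Mode: the most frequently observed value(s).
counts = Counter(x.tolist())
top = max(counts.values())
print("mode(s):        ", [v for v, c in counts.items() if c == top])
```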
35.3.2 Measures of Spread

35.3.2.1 The Range

The range is the difference between the largest and smallest observations in the ordered data set. The range is easily calculated; however, it uses only two observations, and outliers may distort it, making the data appear more “spread out” than they actually are. To overcome this limitation, the ordered data set is divided into percentiles. For example, the lowest 5% of the data lie below the fifth percentile, whereas all of the data except the largest 5% lie below the 95th percentile. The median is the 50th percentile. The “inter-percentile” range, or the difference between two percentiles, is then reported instead of the range. The most commonly reported inter-percentile ranges in the medical literature are the interquartile range (the difference between the 75th and 25th centiles), the interdecile range (the difference between the 90th and 10th centiles) and the reference range, reference interval or normal range (the difference between the 97.5th and 2.5th centiles), which contains 95% of the data. The advantages of inter-percentile ranges are that, unlike the standard range, they are independent of sample size, appropriate for skewed data and unaffected by outliers. Inter-percentile ranges, however, are laborious to calculate by hand and are inappropriate for small sample sizes. Like the standard range, they also rely on only two pieces of data.
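As an illustration, the following sketch derives the range and the most common inter-percentile ranges from a made-up sample using numpy’s percentile function.

```python
import numpy as np

x = np.array([3.1, 4.7, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0, 19.5])  # made-up sample

q25, q50, q75 = np.percentile(x, [25, 50, 75])
p2_5, p97_5 = np.percentile(x, [2.5, 97.5])

print("range:              ", x.max() - x.min())
print("median (50th pct.): ", q50)
print("interquartile range:", q75 - q25)
print("95% reference range:", (p2_5, p97_5))
```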
[Fig. 35.1 Probability distributions, showing the relationship of the arithmetic mean, median and skewness of the data: (a) symmetrical, (b) skewed to the right, (c) skewed to the left]

35.3.2.2 The Variance

An alternative measure of the spread of the data is the extent to which each observation deviates from the arithmetic mean. This is called the variance. If the typical deviation from the arithmetic mean is larger, then the variability or spread of the data set will be greater. To calculate the variance, the arithmetic mean is taken of the squared differences between each observation and the arithmetic mean of the data set. The squared difference is used rather than the difference to ensure that the negative differences do not cancel out the positive differences. Consequently, the units of the variance are the square of the units of the original observations. For example, if the observed variable is weight measured in kilograms, the units of the variance are kg². The advantage of the variance is that it is algebraically defined and uses every observation in the data set; however, it is sensitive to outliers and consequently inappropriate for skewed data. Furthermore, as its units are the square of the units of the raw data, it can be difficult to interpret. In conventional statistical notation, the variance is expressed as follows:

v = s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}.
35.3.2.3 The Standard Deviation

The standard deviation represents the “average” deviation of the observations from the observed arithmetic mean. It is calculated by taking the square root of the variance and, consequently, has similar strengths and weaknesses; however, it uses the same units and scale as the raw data. In a sample of n observations, it is conventionally expressed as follows:

s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.

When the standard deviation is expressed as a percentage of the arithmetic mean, it is called the coefficient of variation. The coefficient of variation is a measure of spread that is independent of the units of measurement.

35.3.2.4 The Standard Error

If repeated samples of the same size are taken from a population to estimate a population parameter, it is unlikely that the estimates of the parameter would be the same in each sample. By quantifying the variability of these estimates, information on the precision of the estimate can be obtained. If the parameter of interest is the population mean, several repeated samples could be taken from the population, and the mean calculated in each sample. The mean of these estimates corresponds to the population mean. The variability of their distribution is measured by the standard deviation of the estimates; this is known as the standard error of the mean (SEM), or simply the standard error. If we know the population standard deviation, the SEM is given by:

SD_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.

In practice, however, there is usually only one sample available, and so the best estimate of the population mean is the sample mean. Furthermore, the population standard deviation is rarely known, and so the standard error is usually estimated using the sample standard deviation:

SE_{\bar{x}} = \frac{s}{\sqrt{n}}.
The standard deviation and the standard error appear similar; however, the standard deviation describes the variability of the data, whereas the standard error describes the precision of the sample mean. Consequently, the standard error should be used when estimates of the mean are of primary interest [4].
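The short sketch below makes the distinction concrete by computing both quantities for a made-up sample; scipy’s sem function is shown as a cross-check.

```python
import numpy as np
from scipy import stats

x = np.array([68.0, 71.5, 65.2, 70.1, 73.4, 66.8, 69.9, 72.0])  # made-up weights (kg)

s = x.std(ddof=1)          # sample standard deviation (n - 1 denominator)
sem = s / np.sqrt(len(x))  # standard error of the mean

print("variance:             ", x.var(ddof=1))
print("standard deviation:   ", s)
print("SEM (manual):         ", sem)
print("SEM (scipy):          ", stats.sem(x))
print("coeff. of variation %:", 100 * s / x.mean())
```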
35.3.3 The “Shape” of the Data

The choice of the most appropriate statistical method, for example, the most appropriate measure of central tendency, will often depend on the shape of the distribution. The distribution of the data is usually unimodal, in that it has a single “peak” (Fig. 35.1); however, it can have more than one peak, such as the bimodal distribution in Fig. 35.2. If sampled data are bimodal, the sample size may be inappropriately small, or the data may not have been sampled from a single homogeneous population. The shape of a unimodal distribution can be described in terms of its asymmetry or skewness and its “peakedness” or kurtosis.

[Fig. 35.2 Bimodal probability distribution]
35.3.3.1 The Skew of the Data

Consider the distributions in Fig. 35.1. The sample shown in Fig. 35.1a is normally distributed. The distribution tapers symmetrically to the x-axis on both sides; these tapered ends of the distribution are called “tails”. If the distribution is symmetrical, then the arithmetic mean is equal to the median. In an asymmetrical or skewed distribution, the mean is located farther along the long tail than is the median. Whether a distribution is positively or negatively skewed can be determined using the tails.
• In a positively skewed distribution (Fig. 35.1b), the right tail is longer. The mass of the distribution is concentrated on the left of the figure, and there are a few relatively high values. The distribution is said to be right-skewed. In such a distribution, the mean is greater than the median, and if the skew is quantified using a skewness coefficient, the coefficient is greater than zero.
• In a negatively skewed distribution (Fig. 35.1c), the left tail is longer. The mass of the distribution is concentrated on the right of the figure, and there are a few relatively low values. The distribution is said to be left-skewed. In such a distribution, the mean is lower than the median, and the skewness coefficient is less than zero.
35.3.3.2 The Kurtosis of the Data

The kurtosis is a measure of the “peakedness” of the probability distribution. Higher kurtosis means that more of the variance is due to infrequent extreme deviations, as in the more sharply peaked distribution in Fig. 35.3, as opposed to frequent, modestly sized deviations, as in the flatter distribution in Fig. 35.3. If the kurtosis is quantified using a kurtosis coefficient, normally distributed data have a kurtosis coefficient of 0, more “peaked” data have a positive kurtosis coefficient, whereas “flatter” distributions have a negative kurtosis coefficient.
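The sketch below estimates both shape coefficients from simulated data. Note that scipy’s kurtosis function returns the excess kurtosis by default, so normally distributed data score approximately 0, matching the convention used above; the simulated samples are illustrative only.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)
right_skewed = rng.lognormal(size=10_000)   # heavily right-skewed

print("normal:    skew=%.2f  kurtosis=%.2f" % (skew(normal), kurtosis(normal)))
print("lognormal: skew=%.2f  kurtosis=%.2f" % (skew(right_skewed), kurtosis(right_skewed)))
```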
35.4 Displaying Data Graphically

Frequency distributions are powerful diagrams for conveying information about the data. They relate each possible observation, range or category of observation to how frequently it occurs. They can be produced for categorical, continuous and discrete numerical data. If the frequency is replaced with a relative frequency or percentage frequency, the frequency distributions of two or more sets of observations can be compared. Outliers and trends are often easily identified when information is presented in a visual format. This may facilitate more appropriate data analysis and interpretation. The method used to display the data depends on the number of data sets, the number of variables and the type of data. In the following section, we describe some of the methods most commonly encountered in the literature [10].
[Fig. 35.3 Probability distributions with different kurtosis]
[Fig. 35.4 A bar chart]
35.4.1 Categorical Data

• In a bar chart, a separate horizontal or vertical bar is drawn for each category, its length being proportional to the frequency in that category. The bars are separated by small gaps to indicate that the data are not continuous (Fig. 35.4).
• In a pie chart, the “pie” is split into sections, one for each category, so that the area of each section is proportional to the frequency in that category (Fig. 35.5).
[Fig. 35.5 A pie chart]

35.4.2 Continuous Data

• A histogram is similar to a bar chart; however, there are no gaps between the bars, to indicate that the data set is continuous. The width of each bar of the histogram relates to a range of values for the variable, and these should be labelled clearly. The area of each bar is proportional to the frequency in that range. Therefore, if one of the groups covers a wider range than the others, its base will be wider and its height shorter to compensate (Fig. 35.6).
• The stem-and-leaf plot is a mixture of a diagram and a table; it looks similar to a histogram turned on its side and is effectively the data values written in increasing order of size. It is usually drawn with a vertical “stem”, consisting of the first digits of each observation, arranged in order. Protruding from this stem are the “leaves”, the final digit of each observation, written horizontally in numerical order (Fig. 35.7).
• In the dot plot, each observation is represented by one dot on a horizontal or vertical line. Often a summary measure of the data, such as the mean or median, is shown on the diagram. It is simple but impractical for large data sets (Fig. 35.8).
• The box or box-and-whisker plot consists of a vertical or horizontal rectangle, with the ends of the rectangle corresponding to the upper and lower quartiles of the data values. A line drawn through the rectangle corresponds to the median value. Whiskers, starting at the ends of the rectangle, usually indicate the minimum and maximum values but sometimes relate to particular percentiles, e.g., the 5th and 95th percentiles (Fig. 35.9).
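A minimal matplotlib sketch of three of these displays is given below; the category counts and the continuous sample are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
counts = {"a": 80, "b": 65, "c": 25, "d": 60, "e": 50}   # hypothetical category counts
values = rng.lognormal(mean=4.2, sigma=0.3, size=200)    # hypothetical continuous data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(list(counts.keys()), list(counts.values()))  # bar chart (categorical)
axes[0].set_title("Bar chart")
axes[1].hist(values, bins=15)                            # histogram (continuous)
axes[1].set_title("Histogram")
axes[2].boxplot(values)                                  # box-and-whisker plot
axes[2].set_title("Box plot")
plt.tight_layout()
plt.show()
```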
[Fig. 35.6 A histogram]
[Fig. 35.7 A stem-and-leaf plot]
[Fig. 35.8 A dot plot]

35.4.3 Multiple Variables

If more than one variable is categorical, then separate diagrams showing the distribution of the second variable can be drawn for each of the categories. Other plots suitable for such data include clustered or segmented bar or column charts (Figs. 35.10 and 35.11). If both variables are numerical, then the relationship between the two can be illustrated using a scatter diagram. This plots one variable against the other in a two-way diagram. The vertical axis is referred to as the y-axis and the horizontal axis as the x-axis (Fig. 35.12). Conventionally, the independent variable (the variable that is controlled or selected in experimental studies) is plotted on the x-axis, and the dependent variable (the variable that is measured in experimental studies) is plotted on the y-axis.
[Fig. 35.9 A box or box-and-whisker plot]
[Fig. 35.10 A clustered bar chart]
[Fig. 35.11 A segmented bar chart]
[Fig. 35.12 A scatter plot]

35.5 Probability Distributions

35.5.1 An Introduction to Probability

Probability measures the likelihood or chance that an event will occur. Probabilities are always numbers lying between 0 and 1. If the probability is equal to 0, then the event cannot occur. If it is equal to 1, then the event is certain to occur. The probability of the complementary event (the event not occurring) is 1 minus the probability of the event occurring. If the likelihood (or probability) of one event occurring does not affect the likelihood that another event will occur, then the two events are said to be independent. If two events cannot both occur, then they are said to be mutually exclusive; for example, if a coin is tossed once, the result cannot be both heads and tails.
Broadly, there are two ways in which probabilities can be interpreted:
• Frequentists use probabilities when dealing with random, well-defined experiments. As the experiment is repeated, the relative frequency of an outcome tends towards its probability.
• Bayesians, however, assign probabilities to any statement, based on the degree of belief in the statement given the available evidence.
This chapter focusses on the application of frequentist interpretations of probability. While Bayesian analysis and statistical inference are powerful and frequently applied in the medical literature, they lie outside the scope of this chapter and are described in more detail in Chap. 29. The mathematical notation used to manipulate probabilities can be confusing; in Table 35.2, we explain some of the commonly used notations and rules used to manipulate probabilities.
35.5.2 Probability Distributions

A random variable is a quantity that can take any one of a set of mutually exclusive values with a given probability. A probability distribution shows the probability that the random variable will take each possible value. Probability distributions, like empirical data sets, can be described mathematically in terms of measures of central tendency, dispersion and shape (Sect. 35.3). A probability distribution can be either discrete or continuous, according to the type of random variable. In this chapter, the probability distributions commonly encountered in the medical literature are described. The focus is on explaining the principles and applications of each distribution, as the widespread use of statistical software packages has rendered detailed knowledge of the underlying probability density functions largely redundant.
Table 35.2 Calculation of probabilities

P(A) — the probability of event A occurring.
P(Ā) — the probability of event A not occurring; P(Ā) = 1 − P(A).
P(A ∩ B) — the probability of both A and B occurring. If A and B are independent, then P(A ∩ B) = P(A)P(B).
P(A ∪ B) — the probability of A or B (or both) occurring; in general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
P(A|B) — the probability of A occurring given that B has occurred; P(A|B) = P(A ∩ B)/P(B). If A and B are independent, then P(A|B) = P(A).
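The rules in Table 35.2 are easily verified numerically; in the sketch below the two event probabilities are hypothetical and the events are assumed independent.

```python
# Hypothetical probabilities for two independent events A and B.
p_a, p_b = 0.3, 0.5

p_not_a = 1 - p_a                  # complement
p_a_and_b = p_a * p_b              # independence: P(A ∩ B) = P(A)P(B)
p_a_or_b = p_a + p_b - p_a_and_b   # general addition rule
p_a_given_b = p_a_and_b / p_b      # conditional probability

print(p_not_a, p_a_and_b, p_a_or_b, p_a_given_b)
# 0.7 0.15 0.65 0.3   (P(A|B) = P(A) under independence)
```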
35.5.3 Discrete Probability Distributions

Discrete probability distributions, such as the binomial and Poisson distributions, are frequently used to describe the probability that a random variable will take one of a finite set of mutually exclusive values. The probabilities for all possible values must sum to one.

35.5.3.1 The Binomial Distribution

The binomial distribution is often applied in the medical literature in circumstances in which only one of two outcomes can occur. For example, we may be interested in how many patients will be free from recurrence of a cancer after a novel chemotherapy regime. If we look at 20 patients undergoing treatment, the binomial random variable is the observed number of patients who are disease-free at maximum follow-up (“successes”). If the probability that each patient will not suffer recurrence is 0.1, the expected number of “successes”, E(S), will be given by:

E(S) = np = 20 \times 0.1 = 2,

where S is the number of “successes”, n is the sample size, p is the probability of success, and S ∼ B(n, p) (i.e., S is a binomially distributed random variable defined by n and p). The variance of S, V(S), is given by:

V(S) = np(1 - p) = 20 \times 0.1 \times (1 - 0.1) = 1.8.

When n is small, as in the above example, the distribution is skewed to the right if p < 0.5 (Fig. 35.13a) and to the left if p > 0.5 (Fig. 35.13b). As n increases, the binomial distribution becomes less skewed. If both the expected number of “successes”, np, and “failures”, n(1 − p), are greater than 5, then the distribution approximates to a normal distribution (see Sect. 35.5.4.1) (Fig. 35.13c).
35.5.3.2 The Poisson Distribution

The Poisson random variable is the count of the number of events that occur independently and randomly in space or (more frequently) time. For example, the number of surgical referrals per day typically follows the Poisson distribution. We can use our knowledge of the Poisson distribution to calculate the probability of a certain number of referrals on any particular day. The Poisson distribution is described by the parameter λ (pronounced lambda), which gives not only the mean number of occurrences in a specified time but also the variance. The Poisson distribution is right-skewed if λ is small (Fig. 35.14a) but becomes more symmetrical as λ increases, when it approximates to a normal distribution (Fig. 35.14b).
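The sketch below reproduces the chapter’s binomial example with scipy and adds a Poisson calculation; the referral rate of 10 per day is a hypothetical figure, not one from the text.

```python
from scipy.stats import binom, poisson

# Binomial: 20 patients, each disease-free with probability 0.1.
n, p = 20, 0.1
print("E(S) =", binom.mean(n, p))   # 2.0, as in the text
print("V(S) =", binom.var(n, p))    # 1.8, as in the text
print("P(exactly 2 disease-free)  =", binom.pmf(2, n, p))
print("P(at least 5 disease-free) =", binom.sf(4, n, p))

# Poisson: a hypothetical mean of 10 surgical referrals per day.
lam = 10
print("P(exactly 10 referrals)   =", poisson.pmf(10, lam))
print("P(more than 15 referrals) =", poisson.sf(15, lam))
```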
[Fig. 35.13 Binomial distributions: (a) n = 20, p = 0.1; (b) n = 20, p = 0.9; (c) n = 100, p = 0.1]
[Fig. 35.14 Poisson distributions: (a) λ = 1; (b) λ = 10]

35.5.4 Continuous Distributions

If the observed random variable is continuous, the number of possible values is infinitely large. Consequently, the probability density function, like the empirical frequency distribution, represents the relative frequency with which a value will be observed. The area under the probability density function between any two possible values is equal to the probability that an observation will fall between these values. Consequently, the area under the probability density function between the minimum possible value, x0, and
any other value, xi, will equal the probability that an observation, x, will be less than this value, P(x < xi). This is called the cumulative probability density function. For example, Fig. 35.15a shows a probability density function on the left-hand axis, accompanied by the associated cumulative probability density function on the right-hand axis. The solid vertical line at 0 encloses 50% of the density function, whereas the dashed vertical line at 1.96 encloses 97.5% of the density function (these numbers have special significance, as we will explain in Sect. 35.7). As the probability that x will take a value between the maximum and minimum values of the probability density function is equal to 1, the total area under the probability density function will also equal 1. Consequently, we can calculate the probability that x will be greater than 1.96 by subtracting the probability that x will be less than 1.96 from 1 (1 − 0.975 = 0.025).
35.5.4.1 The Normal Distribution

The normal distribution, also known as the Gaussian distribution (Fig. 35.15a), is one of the most important distributions in medical statistics, because many observed phenomena in both the physical and behavioural sciences can be approximated by a normal distribution. The normal distribution is completely described by two parameters, the mean, μ, and the variance, σ². The distribution is shifted to the right as the mean increases and to the left as the mean decreases (Fig. 35.15b), and it is flattened as the variance increases (Fig. 35.15c). Importantly, the normal distribution is always symmetrical. The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1, for which the cumulative probability density has been calculated for every value of x. It can be used to calculate the area under any normal distribution, N(μa, σa), between any two values, x_ia and x_(i+1)a: subtracting μa from x_ia and x_(i+1)a and dividing the results by σa gives the corresponding values on the standard normal scale, for which the cumulative probability densities can be obtained. Subtracting the cumulative probability density for (x_ia − μa)/σa from that for (x_(i+1)a − μa)/σa then gives the area under the probability density function of N(μa, σa) between x_ia and x_(i+1)a.

[Fig. 35.15 Normal distributions: (a) the standard normal distribution (μ = 0, σ = 1) with its cumulative probability function; (b) normal distributions with different means; (c) normal distributions with different variances (σ = 1 and σ = √2)]
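The standardisation procedure just described is a one-liner with scipy; in the sketch below, the blood-pressure distribution and its limits are hypothetical values chosen for illustration.

```python
from scipy.stats import norm

# Probability that a value from N(mu, sigma) lies between two limits,
# via standardisation to the standard normal distribution.
mu, sigma = 120.0, 15.0          # hypothetical systolic BP distribution (mmHg)
lo, hi = 110.0, 140.0

z_lo = (lo - mu) / sigma
z_hi = (hi - mu) / sigma
print("P(110 < x < 140) =", norm.cdf(z_hi) - norm.cdf(z_lo))

# The familiar 1.96 cut-off: 97.5% of the density lies below it.
print(norm.cdf(1.96))    # ~0.975
print(norm.ppf(0.975))   # ~1.96
```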
35.5.4.2 Student’s t-Distribution

The parameter that characterises Student’s t-distribution (or simply the t-distribution) is the degrees of freedom, v, which is related to the sample size. Student’s t-distribution is used to calculate confidence intervals, test hypotheses and estimate the mean of a normally distributed population when the sample size is small. It is more spread out, with longer tails, than the standard normal distribution. Its shape approaches normality as the degrees of freedom increase; when v is equal to 30, the distribution is almost the same as the normal distribution [7].

35.5.4.3 The Chi-Squared Distribution

The Chi-squared (χ²) distribution (Fig. 35.16) is used to analyse categorical data. The chi-squared distribution, like the t-distribution, is defined in terms of degrees of freedom. It is skewed to the right, taking only positive values. However, as the degrees of freedom increase, the probability density function becomes more symmetrical, approaching normality [5].

[Fig. 35.16 A Chi-squared distribution]

35.5.4.4 The F-Distribution

The F-distribution, also known as Snedecor’s F-distribution or the Fisher–Snedecor distribution, is used when comparing two variances, or more than two means using the analysis of variance (ANOVA) method (see Sect. 35.9.3.1) [4].

35.5.4.5 The Lognormal Distribution

The lognormal distribution (Fig. 35.17) is the probability distribution of a random variable whose log (to either base 10 or base e) follows the normal distribution. It is highly skewed to the right. Many variables in medicine follow a lognormal distribution. If a data set has a lognormal distribution, the geometric mean is a more appropriate measure of central tendency than the arithmetic mean.

[Fig. 35.17 A lognormal distribution]
35.6 Transformations

Many commonly used statistical tests require the data to be normally distributed, the variance to be constant, or assume a linear relationship between variables. Sampled data, however, frequently do not comply with these requirements, and consequently it is often necessary to transform the available data to satisfy the assumptions underlying the proposed analytical methods. The original data are converted, or transformed, by applying the same mathematical transformation to each observation. The transformed data are then checked, using graphical or numerical statistical methods, to ensure that they satisfy the assumptions of the planned statistical test. After the analysis has been performed, the data can be transformed back to the original scale. The following transformations are most frequently encountered in the medical literature:
• The logarithmic transformation (z = log y) is frequently used in medicine because of its logical interpretation and because many variables have right-skewed distributions (Fig. 35.18).

[Fig. 35.18 The logarithmic transformation]
• The square root transformation (z = √y) has properties similar to those of the log transformation, although the results are more complicated to interpret after they have been back-transformed. In addition to its normalising and linearising abilities, it is effective at stabilising variance if the variance increases with increasing values of y. The square root transformation is often used if y is the count of a rare event occurring in time or space, described by a Poisson variable.
• The reciprocal transformation (z = 1/y) also has properties similar to those of the log transformation. In addition to its normalising and linearising abilities, it is more effective at stabilising variance than the log transformation if the variance increases very markedly with increasing values of y.
[Fig. 35.19 The square transformation]
• The square transformation (z = y²) achieves the reverse of the log transformation (Fig. 35.19).
• The logit (logistic) transformation (z = ln(p/(1 − p))) is used most often to transform a set of proportions or to linearise a sigmoid curve (Fig. 35.20). We cannot take the logit transformation if either p = 0 or p = 1, because the corresponding logit values are −∞ and +∞. One solution is to take p as 1/(2n) instead of 0, and as 1 − 1/(2n) instead of 1, where n is the sample size.
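A short numpy sketch of these transformations follows; the right-skewed sample is simulated, and the adjustment of p = 0 and p = 1 follows the 1/(2n) device described above.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.lognormal(mean=2.0, sigma=0.8, size=1000)   # simulated right-skewed data

z_log = np.log(y)      # logarithmic transformation
z_sqrt = np.sqrt(y)    # square root transformation
z_recip = 1.0 / y      # reciprocal transformation

# Logit transformation of proportions, nudging 0 and 1 as suggested above.
n = 50
p = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
p_adj = np.clip(p, 1 / (2 * n), 1 - 1 / (2 * n))
z_logit = np.log(p_adj / (1 - p_adj))
print(z_logit)
```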
[Fig. 35.20 The logit (logistic) transformation]

35.7 Confidence Intervals

Confidence intervals are widely used to quantify the precision of an estimated quantity, as they are considered by many researchers to be more intuitive and easier to interpret than the standard error. Generally, a 95% confidence interval can be considered to enclose the estimated quantity with 95% certainty. Wide confidence intervals indicate that the estimate is imprecise, whereas narrow intervals suggest that the estimate is precise. As confidence intervals are a function of the standard error, larger samples from less-dispersed populations will result in narrower confidence intervals.
35.7.1 Confidence Intervals for the Mean Using the Normal Distribution

As we demonstrated in Sect. 35.5.4, 2.5% of the total area under the probability density function of a normal distribution is enclosed by one tail and a vertical line intersecting the x-axis 1.96 standard deviations from the mean (Fig. 35.15a). Using the symmetrical property of the normal distribution, we know that 2.5% of the total area under the curve will also be enclosed by the other tail and a similar vertical line, intersecting the x-axis 1.96 standard deviations from the mean on the other side. As 2.5% of the area under the probability density function is enclosed under each tail by vertical lines 1.96 standard deviations from the estimated population mean, we can conclude that 95% of the area under the probability density function is bounded by the two vertical lines. If x is randomly sampled several times, the range of values between the vertical lines would contain the true population mean on 95% of occasions. This range is known as the 95% confidence interval for the mean. We usually interpret this confidence interval as the range of values within which we are 95% confident that the true population mean lies. It may help to interpret the confidence interval in this way, as it is conceptually easier to understand, though it is not entirely correct, as the population mean has a fixed value.
35.7.2 Confidence Intervals for the Mean Using Student’s t-Distribution

The distribution of the mean of any population, when estimated using the sample mean, is normally distributed, provided the population has a finite mean and variance. This is called the central limit theorem. However, if the sample size is not sufficiently large (approximately less than 20), the central limit theorem cannot be applied, as the distribution of the sample mean may not conform to a normal distribution. Furthermore, even with larger sample sizes, the sample standard deviation may not be a reliable estimate of the population standard deviation because of sampling variation. This problem is solved using Student’s t-distribution. Theoretically, the use of Student’s t-distribution is only valid if the population is normally distributed; however, it is robust and widely used, except when the population is extremely non-normal. Using Student’s distribution, the 95% confidence interval for the mean is:

\bar{x} \pm t_{0.05}(SEM) = \bar{x} \pm t_{0.05}\left(\frac{s}{\sqrt{n}}\right),

where t_{0.05} is the percentile of the t-distribution with n − 1 degrees of freedom that gives a two-tailed probability of 0.05. Using the t-distribution rather than the normal distribution gives wider confidence intervals. As the sample size increases, however, the difference between the two distributions becomes smaller. Consequently, the t-distribution is usually used to calculate confidence intervals in both small and large samples, as it returns robust estimates of the confidence intervals in small samples and returns estimates similar to those of the normal distribution in large samples.

35.7.3 Confidence Interval for the Proportion

The sampling distribution of a proportion follows a binomial distribution; however, if the sample size is sufficiently large (such that np and n(1 − p) are each greater than 5), the sampling distribution of the proportion is approximately normal. The mean proportion, p, of r events in a sample of size n is given by:

p = r/n.

Then, assuming a normal distribution, the standard error of p is given by:

SE_p = \sqrt{\frac{p(1 - p)}{n}}.

The 95% confidence interval for the proportion is then estimated by:

p \pm 1.96\sqrt{\frac{p(1 - p)}{n}}.
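Both interval estimates are easily computed; in the sketch below the sample of eight observations and the 30-out-of-100 proportion are made up for illustration.

```python
import numpy as np
from scipy.stats import t

# 95% CI for a mean using the t-distribution (made-up sample).
x = np.array([5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3])
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t_crit = t.ppf(0.975, df=n - 1)      # two-tailed 5% cut-off
sem = s / np.sqrt(n)
print("mean CI:", (xbar - t_crit * sem, xbar + t_crit * sem))

# 95% CI for a proportion (normal approximation): r events in n subjects.
r, n2 = 30, 100
p = r / n2
se_p = np.sqrt(p * (1 - p) / n2)
print("proportion CI:", (p - 1.96 * se_p, p + 1.96 * se_p))
```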
35.7.4 Bootstrapping

Bootstrapping is a computer-simulation procedure that allows estimation of population characteristics, such as the mean, the standard deviation and the associated confidence intervals. A large number of random samples are derived from the original sample, each of the same size as the original sample, by sampling with replacement. Every sample provides an estimate of the required population parameter, and the simulated estimates are used to estimate the required population parameters and their confidence intervals. The advantage of bootstrapping over the previously discussed methods is that no assumptions are made about the population parameters being estimated or even about the underlying population distribution. Some assumptions are still made, however; for example, that the individual observations in the sampled data set are independent. Furthermore, critics of bootstrapping methods argue that they often underestimate the uncertainty associated with population parameter estimates.
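A minimal percentile-bootstrap sketch is shown below; the skewed sample is simulated, and 10,000 resamples is an arbitrary but common choice.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(mean=1.0, sigma=0.6, size=40)   # made-up skewed sample

B = 10_000
# Resample with replacement, B times, each resample the size of the original.
boot_means = rng.choice(x, size=(B, x.size), replace=True).mean(axis=1)

# Percentile bootstrap 95% confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print("bootstrap 95% CI for the mean:", (lo, hi))
```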
35.8 Hypothesis Testing

Statistical hypothesis testing utilises several statistical techniques to facilitate decision-making using experimental data. The process of hypothesis testing can be divided into several stages, which are covered in more detail later in this section:
1. Define the null and alternative hypotheses
2. Collect sample data
3. Calculate the test statistic specific to the null hypothesis
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the results
35.8.1 Step 1: Defining the Null and Alternative Hypotheses

Defining the null hypothesis (commonly referred to in statistical notation as H0) is the first stage of hypothesis testing. Usually, the null hypothesis is that there is no difference between two populations. For example, if we are interested in comparing the rate of oesophageal cancer in British men and women, the null hypothesis would be:
H0: Rates of oesophageal cancer in the UK are the same in men and women.
We then define the alternative hypothesis (commonly referred to in statistical notation as H1), which is true if the null hypothesis is false. For example, the alternative hypothesis would be:
H1: Rates of oesophageal cancer in the UK are different for men and women.
Once the null hypothesis has been defined, appropriate data can be sampled and appropriate statistical hypothesis tests performed, enabling us to determine whether we have enough evidence to reject, or fail to reject, the null hypothesis.
35.8.2 Step 2: Sampling the Data

It is important that an appropriate amount of data is sampled from the population to ensure that the null hypothesis is not rejected when it is actually true (a type I error) or accepted when it is actually false (a type II error). In conventional statistical notation, the maximum probability of a type I error occurring is α (pronounced alpha), and the maximum probability of a type II error occurring is β (pronounced beta). The probability that a study will not make a type II error, and so will reject a false null hypothesis, is referred to as the study’s power (equal to 1 − β). Detailed knowledge of the actual equations has been rendered largely redundant by computer statistical packages; however, it is important to appreciate that power depends on three factors:
• The power is greater if the significance level (equal to α) is larger. This is because, as the probability of a type I error, α, increases, the probability of a type II error, β, decreases. Confusingly, a significance level of α = 0.05 is often referred to in the literature as 95% statistical significance.
• Sample characteristics affect the power of a study. For example, as the sample size increases, the power increases.
• The power of a test is greater for larger differences between populations. A hypothesis test consequently has a greater chance of detecting a large effect than a small one.
It is essential that power calculations are performed before sampling is commenced, as a study that is not sufficiently powered to reject the null hypothesis would be a
waste of time and resources and may be unethical in some circumstances. As the anticipated treatment effect is fixed, and the arbitrary standards of 80% statistical power and 5% significance are generally applied in the medical literature, researchers must ensure that the sample size is large enough for the study to be sufficiently powered before sampling is commenced.
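As an illustration of such a calculation, the sketch below uses the power module from statsmodels to find the sample size per group for an unpaired t-test; the assumed standardised effect size of 0.5 is hypothetical and would come from prior data or a pilot study.

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for an unpaired t-test, assuming a standardised
# effect size (Cohen's d) of 0.5, 5% significance and 80% power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print("patients needed per group:", round(n_per_group))   # approximately 64
```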
35.8.3 Steps 3 and 4: Data Analysis

The most appropriate method of analysing the sampled data will depend on a number of factors, such as the study design, the type of data, the distribution of the data and the number of groups. Statistical tests can be broadly classified according to whether they handle numerical or categorical data, and as parametric or non-parametric. Parametric tests are hypothesis tests that make assumptions about the distributions that the data follow. Often data do not conform to the assumptions that underlie these methods; in these cases, non-parametric tests are used. Non-parametric tests make no assumptions about the distribution of the data, which are instead analysed using ranks describing the position of each observation in the ordered data set. Non-parametric tests are very useful when the sample size is small or when the data are measured on a categorical scale. They have less power to detect a real effect than the equivalent parametric tests, however, if all the assumptions underlying the parametric test are satisfied. Furthermore, they serve as significance tests and often do not provide estimates of the effects of interest. The most appropriate analytical methods are summarised in Fig. 35.21 and discussed in more detail in Sect. 35.8.5 [3].
[Fig. 35.21 Flow diagram showing the choice of appropriate hypothesis tests]

35.8.4 Step 5: Interpreting the Results

35.8.4.1 p-Values

The statistical significance tests described in this section calculate a p-value, which represents the probability of
obtaining the observed result, given that the null hypothesis is true. If p is larger than the significance level (α), then the null hypothesis cannot be rejected, as the chance of the observed result occurring if the null hypothesis were true is greater than the chance of making a type I error that we are prepared to accept (α). This understanding of the p-value is very important, as it is frequently misinterpreted in several ways:
• It is not the probability that the null hypothesis is true, nor is 1 − p the probability that the alternative hypothesis is true.
• It is not the probability of falsely rejecting the null hypothesis, nor the probability that a finding occurred merely by chance. This is subtly different from the real meaning, which is that p is the probability of obtaining the observed result if the null hypothesis is true; the result may not have occurred merely by chance, but may still be consistent with the null hypothesis.
• It is not the probability that a replicated experiment would not yield the same conclusion.
• The significance level of the test is not determined by the p-value; the significance level should be determined before the data are sampled [12].
35.8.4.2 Relationship Between Hypothesis Tests and Confidence Intervals

The primary aim of a hypothesis test is to make a decision and provide an exact p-value. A confidence interval merely quantifies the effect of interest (e.g., the difference in means) and enables us to assess the clinical implications of the results. However, in some situations, an intervention may clearly be more effective in some respects, while there are concerns about other aspects of its efficacy. Alternatively, the new intervention may be no more effective clinically than the existing control but may have other advantages, such as being less costly. In these situations, the aim of the clinical trial is to demonstrate that the efficacy of the new treatment is similar (in an equivalence trial) or not substantially worse (in a non-inferiority trial) than that of the control, rather than trying to highlight better outcomes in the treatment group. The hypothesis testing procedure used in the usual superiority trial, which tests the null hypothesis, is irrelevant here, as a non-significant result does not imply non-inferiority or equivalence, and even if a statistically significant effect is detected, it may not be of clinical importance. In these situations, the problem of assessing equivalence and non-inferiority is approached by determining whether the confidence interval for the effect of interest (e.g., the difference in means between two treatment groups) lies wholly or partly within a predefined equivalence range of values that have no clinical importance. If the whole of the confidence interval for the effect of interest lies within the equivalence range, then this suggests that the two drugs are equivalent, even if the upper and lower limits of the confidence interval suggest there is benefit of one treatment over the other. In a non-inferiority trial, the aim is to show that the new drug is not substantially worse than the standard one; consequently, the new treatment is considered not inferior if the lower limit of the appropriate confidence interval does not fall below the lower limit of the equivalence range [1].

35.8.5 Types of Statistical Tests

35.8.5.1 Numerical Data: One Group
When a sample has been collected for a single group of numerical observations, it is often necessary to determine whether an estimate of a population parameter, e.g., the mean, takes a particular value. For example, a researcher may want to compare the mean complication rate after coronary surgery with a known population mean. The following tests are available to compare estimated population parameters with a known or assumed value:
• The parametric one-sample t-test assumes that the variable is normally distributed with a given (usually unknown) variance. A reasonable sample size should be used, so that the assumption of normality can be checked. Here, the interest lies in whether the population mean, μ, differs from some hypothesised value, μ1. We use a test statistic based on the difference between the sample mean, x̄, and μ1. Assuming that we do not know the population variance, this test statistic, often referred to as t, follows the t-distribution. If we do know the population variance, or the sample size is very large, then an alternative test (often called a z-test), based on the normal distribution, can be used. The 95% confidence interval provides a range of values within which we are 95% certain that the true population mean lies. If the 95% confidence interval does not include the hypothesised value for the mean, μ1, we reject the null hypothesis at the 5% level; if, however, the confidence interval includes μ1, then we fail to reject the null hypothesis at that level. The t-test is relatively robust to some degree of non-normality; however, extreme skewness may compromise its validity. In this case, the data can either be transformed so that the variable is normally distributed, or we can use a non-parametric test such as the sign test or the Wilcoxon signed ranks test.
• The sign test is a simple non-parametric test based on the median of the distribution. Here, there is some hypothesised value, λ, for the population median. If the sample concerned comes from this population, then approximately half of the values in this sample should be greater than λ and half should be less than λ. Although the sign test is able to determine whether there is a difference between the estimated population median and λ, it is not able to determine the size of this difference.
• The Wilcoxon signed ranks test is more powerful than the sign test, as it takes into account not only the signs of the differences but also their magnitude. The individual difference is calculated for each pair of results and classed as either positive or negative. The differences are then placed in order of size, ignoring their signs, and ranked accordingly. The smallest difference thus gets the rank 1, the second smallest the rank 2, and so on, up to the largest difference, which is assigned the rank n′, if there are n′ non-zero differences. If two or more of the differences are the same, they each receive the mean of the ranks those values would have received had they not been tied. Under the null hypothesis of no difference, the sums of the ranks relating to the positive and negative differences should be the same.
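Both one-group tests are available in scipy; the sketch below applies them to a made-up sample against a hypothetical reference value of 5.0.

```python
import numpy as np
from scipy.stats import ttest_1samp, wilcoxon

x = np.array([5.4, 6.1, 4.9, 5.8, 6.3, 5.2, 5.9, 6.0])  # made-up observations
mu1 = 5.0                                               # hypothesised mean/median

t_stat, p_val = ttest_1samp(x, popmean=mu1)              # two-tailed p-value
print("one-sample t-test:     t=%.2f p=%.3f" % (t_stat, p_val))

# Wilcoxon signed ranks test on the differences from the hypothesised value.
w_stat, p_w = wilcoxon(x - mu1)
print("Wilcoxon signed ranks: W=%.1f p=%.3f" % (w_stat, p_w))
```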
35.8.5.2 Numerical Data: Two Related Groups

Individual observations in two different sample groups can be related in one of two ways. When a temporal relationship between an independent and a dependent variable is being investigated, e.g., in a cross-over clinical trial, each variable is measured twice in every patient, once while undergoing the new treatment and once while undergoing the control regime. Alternatively, the observations in each sample may be from different individuals but linked to each other in some way; for example, patients in a case–control study may be individually matched or paired to patients in a control group. It is important to take account of the dependence between the two samples when analysing the data; otherwise, the advantages of pairing are lost. This is achieved by considering the differences in the values for each pair, so reducing the two samples to a single sample of differences. The following tests are commonly used in this circumstance:
• The paired t-test assumes that the differences between individual pairs of data are normally distributed. If the two sets of measurements are the same, then the mean of the differences between each pair of measurements is expected to be zero. The test statistic therefore simplifies to a one-sample t-test on the differences, where the hypothesised value for the mean difference is zero.
• If the differences do not follow a normal distribution, the data must either be transformed, or we must use a non-parametric test, such as the sign test or the Wilcoxon signed ranks test, to assess whether the differences are centred on zero.
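A paired analysis in scipy might look as follows; the before/after blood-pressure values are invented for illustration.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

before = np.array([142, 150, 138, 160, 155, 148, 152, 145])  # made-up values
after = np.array([135, 145, 136, 152, 150, 144, 147, 140])

print("paired t-test:        ", ttest_rel(before, after))
print("Wilcoxon signed ranks:", wilcoxon(before, after))
```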
35.8.5.3 Numerical Data: Two Unrelated Groups

Often it is necessary to compare two groups where there is no relationship between the individual observations in the two groups, e.g., when determining whether a treatment group differs from a control group in a randomised controlled trial. In this circumstance, the following tests can be used:
• The unpaired (two-sample) t-test assumes that, in the population, the observed variable is normally distributed in each group and that the variances of the two groups are the same. In addition, the sample sizes must be sufficient to test the assumptions of normality and equal variances. The null hypothesis is that the population means in the two groups are the same. The test statistic, often referred to as t, is based on the difference in the means, follows the t-distribution, and will be centred on 0 if the groups are equivalent. The upper and lower limits of the confidence interval can be used to assess whether the difference between the two mean values is clinically important: if the upper or lower limit is close to zero, this indicates that the true difference may be very small and clinically meaningless, even if the test is statistically significant. When the sample sizes are reasonably large, the t-test is fairly robust to departures from normality; however, it is less robust to unequal variances. If there are concerns that the assumptions are not satisfied, then either the data can be transformed to achieve approximate normality or equal variances, an adaptation of the t-test called Welch’s t-test can be used, or a non-parametric test such as the Wilcoxon two-sample rank sum test can be applied.
• The Wilcoxon two-sample rank sum test (or the Mann–Whitney U test) is the non-parametric equivalent of the unpaired t-test. It makes no distributional assumptions. The variables in both groups are ranked together. The test is then based on the sum of the ranks in each group; these should be comparable, after allowing for differences in sample size, if there is no difference between the groups.
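The corresponding scipy calls are shown below on made-up treatment and control outcomes, including the Welch variant for unequal variances.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

treatment = np.array([6.1, 5.8, 7.2, 6.5, 6.9, 5.5, 6.3, 7.0])  # made-up outcomes
control = np.array([5.2, 4.9, 5.8, 5.1, 5.6, 4.7, 5.4, 5.9])

print("unpaired t-test:", ttest_ind(treatment, control))
print("Welch's t-test: ", ttest_ind(treatment, control, equal_var=False))
print("Mann-Whitney U: ", mannwhitneyu(treatment, control))
```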
35.8.5.4 Numerical Data: More Than Two Groups

When an observed variable from more than two populations is compared, the statistical tests previously discussed are vulnerable to a high type I error rate, because of the large number of comparisons. This may lead to incorrect conclusions. Consequently, it is better to perform a single test to determine whether the estimates of the population parameters differ. In this circumstance, the following tests can be used:
• One-way analysis of variance (ANOVA) assumes that the observed variable is normally distributed, that the variance in each group is the same, and that the groups are defined by a single factor, e.g., different treatments. The sample size must be sufficient to test these assumptions. One-way ANOVA separates the total variability in the data into that which can be attributed to differences between the individuals from the different groups (the inter-group variation) and the random variation between the individuals within each group (the intra-group variation). These are measured using variances, hence the name ANOVA. Under the null hypothesis that the group arithmetic means are the same, the inter-group variance will be similar to the intra-group variance. If, however, there are differences between the groups, then the inter-group variance will be larger than the intra-group variance. The test is based on the ratio of these two variances [8]. The unpaired t-test and ANOVA give equivalent results when there are only two groups of individuals. Although ANOVA is relatively robust to moderate departures from normality, it is not robust to unequal variances; therefore, the underlying assumptions must be checked before carrying out the analysis. If the assumptions are not satisfied, the data can be transformed, or the non-parametric equivalent of one-way ANOVA, the Kruskal–Wallis test, can be used.
• The Kruskal–Wallis test is considered to be an extension of the Wilcoxon rank sum test to more than two groups. Under the null hypothesis of no differences in the distributions between the groups, the sums of the ranks in each of the groups should be comparable after allowing for any differences in sample size. It is important to stress that both one-way ANOVA and its non-parametric equivalent can only be used when the differences between groups relate to a single factor and the groups are independent.
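Both tests take one array per group in scipy, as sketched below with three invented treatment groups.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

# Made-up outcomes for three treatment groups.
g1 = np.array([4.2, 4.8, 5.1, 4.5, 4.9])
g2 = np.array([5.6, 5.9, 6.2, 5.4, 6.0])
g3 = np.array([4.9, 5.2, 5.0, 5.5, 5.1])

print("one-way ANOVA: ", f_oneway(g1, g2, g3))
print("Kruskal-Wallis:", kruskal(g1, g2, g3))
```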
35.8.5.5 Categorical Data: One Group

In a single sample, each observation may either “possess” a characteristic of interest or not. A useful summary of the data is provided by the proportion of observations that possess the characteristic. If we wish to establish whether the proportion takes a particular pre-determined value, the following tests can be used:
• The aim of the z-test is to determine whether the proportion of patients in a sample with a particular characteristic, p, takes a particular value, p1. If r individuals in our sample of size n, selected from the population of interest, have the characteristic, the estimated proportion with the characteristic is p = r/n. The number of individuals with the characteristic follows the binomial distribution, but this can be approximated by the normal distribution, providing np and n(1 − p) are each greater than 5. If we assume that p is normally distributed, with mean p and variance p(1 − p)/n, we can determine whether p1 falls within the resulting 95% confidence interval. As the test statistic is based on p, it also follows the normal distribution.
• The sign test may be applied to a proportion if the response of interest can be expressed as a preference, although this formulation of the problem may appear different from the applications of the sign test mentioned earlier. For example, the sign test could be applied in a cross-over trial where patients may prefer either treatment A or treatment B. If there is no preference overall, the proportion of patients who prefer A will be 0.5.
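A hand-rolled version of this z-test follows; the counts and the hypothesised proportion of 0.5 are illustrative values only.

```python
import numpy as np
from scipy.stats import norm

r, n = 64, 100    # made-up: 64 of 100 patients have the characteristic
p0 = 0.5          # hypothesised proportion

p_hat = r / n
se0 = np.sqrt(p0 * (1 - p0) / n)   # standard error under the null hypothesis
z = (p_hat - p0) / se0
p_value = 2 * norm.sf(abs(z))      # two-tailed p-value
print("z = %.2f, p = %.4f" % (z, p_value))
```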
35.8.5.6 Categorical Data: Two Independent Groups
If there are two independent groups of individuals (e.g., men with or without bowel cancer), the aim may be to show whether the proportion of individuals with a particular characteristic (e.g., a strong smoking history) is the same in both groups. The following tests can be performed to show an association between membership of a particular group and the characteristic of interest:
• The Chi-squared (χ²) test analyses how many of the observed samples have or do not have a particular characteristic. The frequencies are entered into a contingency table. Table 35.3 shows a generalised contingency table with the observed frequencies, the four marginal totals (the frequency in a specific row or column, e.g., a + b) and the overall total, n. The frequency that would be expected in each of the four cells of the table if H0 were true (the expected frequencies) can then be calculated. The assumption in this situation is that the chosen samples of sizes n1 and n2 are from two independent groups of individuals. The Chi-squared test aims to evaluate whether the proportion of individuals who possess the particular characteristic is the same in both groups. Each individual is represented only once in the study. The rows (and columns) of the table are mutually exclusive, as each individual can belong in only one row and only one column. If the proportions with the characteristic in the two groups are equal, and the overall proportion of individuals with the characteristic is given by p = (a + b)/n, we would expect n1p of those with the characteristic to be in Group 1 and n2p to be in Group 2 (Table 35.3). A large discrepancy between the observed (O) and the corresponding expected (E) frequencies is an indication that the proportions in the two groups differ [5].
• The Chi-squared test requires the expected frequency in each of the four cells to be at least five. If E < 5 in any cell, Fisher's exact test is used. Fisher's exact test does not rely on the approximation to the Chi-squared distribution. This is a laborious test made possible by statistical software packages (a sketch follows below).
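A minimal sketch of both tests on an invented 2 × 2 table (the frequencies a–d are chosen for illustration only):

import numpy as np
from scipy import stats

table = np.array([[30, 15],    # a, b: characteristic present in Group 1, Group 2
                  [20, 35]])   # c, d: characteristic absent

chi2, p, dof, expected = stats.chi2_contingency(table)
if (expected < 5).any():                        # the rule of thumb described above
    odds_ratio, p = stats.fisher_exact(table)   # exact test for small expected counts
print(f"p = {p:.3f}")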
35.8.5.7 Categorical Data: Two Related Groups
Individuals may be matched or measured twice in different circumstances (e.g., before and after treatment). When the aim is to assess whether the proportions with a particular characteristic are the same in the two related groups, McNemar's test can be used:
• McNemar's test is a non-parametric test used to evaluate whether two related or dependent groups have different characteristics. Every individual is classified according to whether the characteristic is present in both circumstances, in one circumstance only or in neither, as shown in Table 35.4. Those individuals who agree in the two circumstances are ignored, and the test concentrates on the discordant pairs. The null hypothesis tested is that B = C. If B ≠ C, then there is a difference in the number of individuals with the characteristic of interest in each group, and consequently an association (a software sketch follows Table 35.4).
Table 35.4 McNemar’s Test Table 35.3 – Chi squared table Characteristic Group 1 Group 2
Total
Circumstance 1 Present Absent
Total
Present
A
B
a+b
Circumstance 2
Absent
C
D
C+d
Present
A
B
A+B
Total
n1 = a + c
n2 = b + d
n=a+b+c+d
Absent
C
D
C+D
p2 = b/n2
p = a + b/n
Total
A+C
B+D
M=A+B+ C+D
Proportion with p1 = a/n1 characteristic
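A sketch of McNemar's test, here using the statsmodels library on an invented paired table laid out as in Table 35.4:

from statsmodels.stats.contingency_tables import mcnemar

# Invented counts: A (present in both), B and C (the discordant pairs), D (absent in both)
table = [[20, 12],
         [5, 40]]

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs B and C
print(result.statistic, result.pvalue)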
35.8.5.8 Categorical Data: More Than Two Categories
Individual observations can often be classified by two factors. For example, one factor may represent the degree of anaemia (mild, moderate or severe) and the other factor may represent blood group (A, B, O and AB). The aim is to establish whether the two factors are associated. Are individuals of a particular blood group likely to be more severely anaemic? The Chi-squared test can also be applied to this example by using a larger contingency table.
• Large contingency table Chi-squared tests require that the data are presented in a contingency table with r rows and c columns (Table 35.5). Every individual is represented once and can belong in only one row and one column. The null hypothesis in this situation is that there is no association between the two factors. The expected frequency in each cell of the contingency table is calculated under the assumption that the null hypothesis is true. This type of test statistic focuses on the discrepancy between the observed and expected frequencies in every cell of the table. If the overall discrepancy is large, then it is unlikely that the null hypothesis is true. If more than 20% of the expected frequencies are less than 5, two or more rows or columns of the contingency table must be combined appropriately, and the expected frequencies of this reduced table recalculated. This process should be repeated until the expected value is greater than 5 in 80% of the cells. If the table is reduced to a 2 × 2 table and the expected frequencies are still less than 5, Fisher's exact test can be used [5].

Table 35.5 Large (r × c) contingency table for Chi-squared tests

          Column 1   Column 2   Column 3   …   Column c   Total
Row 1     f11        f12        f13        …   f1c        R1
Row 2     f21        f22        f23        …   f2c        R2
Row 3     f31        f32        f33        …   f3c        R3
…         …          …          …          …   …          …
Row r     fr1        fr2        fr3        …   frc        Rr
Total     C1         C2         C3         …   Cc         n

In this section, we have described some basic statistical tests that can be used to test the association of categorical and numerical variables in sampled data from one or more populations. In the remainder of this chapter and in other chapters of this book, we will discuss statistical tests that can be used to investigate more complex associations between observed variables within studied populations (Fig. 35.22).
35.9 Investigating the Relationship Between Two Variables

35.9.1 Correlation
In statistics, the term correlation is used to describe the strength and direction of a linear relationship between two variables, unlike in colloquial speech, where it is often used to describe any association between two variables [4]. To investigate a possible linear relationship between two variables, x and y, we first need to plot them against each other on a scatter plot (Fig. 35.23). A linear relationship between x and y can be said to exist if a straight line drawn through the midst of the points provides an appropriate approximation of the observed relationship (Fig. 35.23a–g). The more closely the points are clustered around the line, the "stronger" the correlation is said to be (Fig. 35.23a–d). If y increases as x increases, the correlation is said to be positive; if y decreases as x increases, the correlation is said to be negative (Fig. 35.23h–n). Correlation analysis quantifies the degree of association between two variables using a correlation coefficient.
Fig. 35.22 Flow diagram showing the choice of other statistical tests (further analysis: regression – simple, multiple and logistic modelling; correlation – Pearson's and Spearman's coefficients; longitudinal studies – repeated measures, time series, survival analysis; assessing evidence – evidence-based medicine, systematic reviews and meta-analysis; additional topics – diagnostic tools: sensitivity and specificity, agreement (kappa), Bayesian methods)
Fig. 35.23 Scatter plots and associated correlation coefficients
A number of different coefficients are used in different circumstances; however, the most commonly encountered in the medical literature is the Pearson product-moment correlation coefficient. The true value of the Pearson correlation in the population is estimated in the sample by r, where

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]

r has no units and its magnitude ranges from 0 to 1. A value of 0 indicates there is no correlation, whereas a value close to 1 suggests there is a strong correlation between x and y. If r is positive, then y increases as x increases; this is called positive correlation. If r is negative, then y decreases as x increases; this is called negative correlation (Fig. 35.23a–n). Correlation analysis using the Pearson coefficient can be misleading if there is a non-linear relationship between the two variables (Fig. 35.23o–t), if the data comprise subgroups of individuals (Fig. 35.23u), if there are one or more outliers or if the data include more than one observation on each individual. For this reason, the Pearson correlation coefficient should only be calculated after the data have been plotted [1].
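A minimal sketch of the Pearson coefficient in Python, with invented data (the variables would, as stressed above, be plotted first):

from scipy import stats

x = [1.0, 2.1, 2.9, 4.2, 5.1, 6.0]
y = [2.3, 3.9, 6.1, 8.0, 10.2, 12.1]

r, p_value = stats.pearsonr(x, y)  # r estimates the population correlation
print(f"r = {r:.2f}, p = {p_value:.4f}")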
35.9.2 Spearman’s Rank Correlation Coefficient The Spearman’s rank correlation coefficient a nonparametric equivalent to Pearson’s correlation coefficient and is used when: • At least one of the variables, x or y, is measured on an ordinal scale • Either x nor y is non-normally distributed • The sample size is small • We require a measure of the association between two variables when their relationship is non-linear. To estimate the population value in this case, values of x, as well as y, are arranged in ascending order, assigning successive ranks (n1, n2, n3,…., nx) to them. The Spearman’s test has the same properties and hypotheses as Pearson’s test, replacing r by rs, except that rs provides only a measure of association (not necessarily linear) between x and y.
35.9.3 Univariate or Simple Linear Regression
Simple linear regression, an extension of correlation analysis, can be used not only to test the strength of a linear relationship between two quantities of interest but also to construct a simple formula that predicts the value of a quantity of interest when the related variable takes a given value. To perform linear regression, we assume that the sample is selected at random from the population of interest, that there is no more than one pair of observations from each individual, that the dependent variable is continuous, that the observations are independent and that both variables are normally distributed. In the following section, we will explain how a linear regression analysis is performed [1].
35.9.3.1 Performing Simple Linear Regression Analysis
1. Always plot the data before performing regression analysis. Fig. 35.24a–d shows "Anscombe's Quartet" [2]. All four plots have x coordinates with the same arithmetic mean and variance, y coordinates with the same arithmetic mean and variance, the same Pearson correlation coefficient and the same linear regression line. Fig. 35.24a shows two variables that appear to be correlated and that conform to the assumption of normality. Despite having identical statistical properties to the data set plotted in Fig. 35.24a, linear regression cannot be applied to the data sets in Fig. 35.24b–d. Fig. 35.24b shows an obvious relationship between the two variables; however, it is not linear, and the Pearson correlation coefficient is not relevant. There is a perfect linear relationship in Fig. 35.24c; however, one outlier exerts enough influence to lower the correlation coefficient from 1 to 0.81. Furthermore, because of the influence of the outlier, the linear regression line no longer accurately predicts the data. Finally, in Fig. 35.24d, an outlier in another data set is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear. Although it is obvious on visual inspection that linear regression is inappropriate for the data sets shown in Fig. 35.24b–d, this would not be apparent if the data sets were not plotted, as they have identical summary statistical properties.
2. The next step is to fit a linear regression line to the data. The value of y (Yi) for a given value of x (Xi) is described in terms of two regression coefficients, the intercept (a) and the slope (b) (Fig. 35.25). An error term (εi) is also incorporated to account for measurement error and the effect of other variables:

Yi = a + bXi + εi

A line is then fitted to the data by the method of ordinary least squares using a statistical software package. This method finds the line that minimises the sum of the squares of the errors, Σεi², summed over i = 1, …, n (Fig. 35.26).
In simple linear regression, it is assumed that the errors are normally distributed and have a variance independent of the value of y. In statistics, constant variance in a sequence of random variables, such as the errors, is referred to as homoscedasticity. P-values can be calculated to test the null hypothesis that there is no relationship between the variables,
which would result in the slope being zero. Additionally, the standard errors and confidence intervals for all of the regression coefficients, including the slope, can be calculated.
3. To deal with a failure of the data to satisfy the assumptions of linearity, normality or constant variance of the errors (also referred to as residuals), x or y can be transformed and a new regression line calculated. It is not always possible to find a satisfactory transformation. The independence of the observations and a linear relationship between x and y are the most important factors when calculating the regression line. Although a line can still be constructed when the normality and/or constant variance assumptions are in doubt, estimates of the standard errors, confidence intervals or p-values may be unreliable.
4. An outlier (an observation that is inconsistent with most of the values in the data set) may be an influential point if its exclusion results in one of the regression coefficients changing. Outliers can usually be detected by looking at the scatter diagram or the residual plots, although formal statistical techniques can also be used. For both outliers and influential points, the model is fitted with and without the outlying observation. Outliers and influential points should not routinely be discarded but investigated further, because they may have clinical significance and their omission may affect the conclusions.
5. As in ANOVA, the total sum of squares (SST), or variance, is split into two or more components. In particular, we are interested in how much of the variation in y (SST) is due to variation in x (the regression variation, SSREG) and how much is due to measurement error or other factors (the residual or unexplained variation, SSE). The sums of squares are related to the square of Pearson's correlation coefficient, R², in the following way:

R² = SSREG/SST = 1 − SSE/SST

This means that 1 − R² can be interpreted as the proportion of the variance that can be attributed to measurement error or variation in other variables, rather than variation in x [1, 8].
6. The regression line can be used to predict values of y for predefined values of x within the observed range. It should never be used for this purpose outside of the observed range. These steps are sketched in code below.
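The steps above can be illustrated with a minimal sketch using SciPy's linregress on invented data; the slope, its standard error, R² and the p-value for the null hypothesis of zero slope are all returned:

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)         # explanatory variable (invented)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])  # dependent variable (invented)

fit = stats.linregress(x, y)  # ordinary least squares for a single predictor
print(f"slope b = {fit.slope:.2f} +/- {fit.stderr:.2f}")
print(f"intercept a = {fit.intercept:.2f}")
print(f"R^2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.4f}")  # H0: slope = 0

y_pred = fit.intercept + fit.slope * 5.5  # predict only within the observed range of x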
Fig. 35.24 Scatter plots showing Anscombe's quartet
Fig. 35.25 Linear regression: the fitted line y = a + bx, with intercept a and slope b, relating the explanatory variable (x) to the dependent variable (y)
Fig. 35.26 The method of ordinary least squares

35.9.3.2 Multiple Linear Regression
Multiple linear regression, an extension of simple linear regression, is a form of multivariable analysis, as it can be used to study the joint effect of several explanatory variables, x1, x2,…, xk, on a response variable, y. It is misleading to refer to the explanatory variables as independent, as they may be related. The multiple linear regression equation for a value of y, Yi, in a sample of n individuals, when the explanatory variables x1, x2,…, xk take the values Xi1, Xi2,…, Xik, is:

Yi = a + b1Xi1 + b2Xi2 + … + bkXik + εi,

where a is a constant term, the intercept, and b1, b2,…, bk are constant terms, the partial regression coefficients. b1 represents the amount by which Y increases on average if x1 is increased by one unit while x2, x3,…, xk are held constant. If there is a relationship between x1 and x2, x3,…, xk, b1 differs from the estimate of the regression coefficient obtained by regressing y on x1 alone, because the latter approach does not adjust for the other explanatory variables. When performing multiple linear regression, the assumption of linearity should be tested in a similar way to simple linear regression, by plotting y against each explanatory variable while the other explanatory variables are held at constant values. A multiple linear regression analysis should not be performed unless the number of observations, n, is at least ten times the number of explanatory variables, k. Overfitting results in the degrees of freedom of the parameters exceeding the information content of the data, leading to arbitrariness in the final (fitted) model parameters and affecting
the ability of the model to generalise beyond the fitting data. Most computer packages have automatic procedures for selecting variables, e.g., stepwise selection. These are particularly useful when many of the explanatory variables are related. A particular problem arises when pairs of explanatory variables are extremely highly correlated, often termed collinearity. Analysis of covariance (ANCOVA) is an extension of ANOVA to several variables. It tests whether certain factors have an effect on the response variable, y, after removing the variance associated with other explanatory variables (covariates) x1, x2,…, xk. The inclusion of covariates, without overfitting the model, can increase statistical power because it accounts for some of the variability. ANCOVA, like ANOVA, assumes that the errors are normally distributed and homoscedastic and that the relationship between the response variable and the explanatory variables is linear. If a binary variable is included to denote membership of a particular group, ANCOVA can be used to investigate the effect of a treatment or risk factor on the response variable [1, 8].
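A hedged sketch of a multiple linear regression in Python using the statsmodels package, with invented clinical variables (age and BMI as explanatory variables for blood pressure); the summary lists the partial regression coefficients with their standard errors and p-values:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100                                # n is well above ten times k = 2 predictors
age = rng.uniform(40, 80, n)
bmi = rng.uniform(18, 35, n)
bp = 90 + 0.5 * age + 0.8 * bmi + rng.normal(0, 5, n)  # simulated response

X = sm.add_constant(np.column_stack([age, bmi]))  # adds the intercept term a
model = sm.OLS(bp, X).fit()                       # ordinary least squares fit
print(model.summary())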
35.9.3.3 Nonlinear Regression
Nonlinear regression is a form of regression analysis in which observational data are modelled by a function that is a nonlinear combination of the model parameters and one or more explanatory variables. The data are fitted by a method of successive approximations, using least squares or other methods. The choice of an appropriate statistical model will depend on the outcome of interest. For example, if the dependent variable is a continuous numerical variable, linear regression is used to identify factors associated with this variable. If the outcome is binary, however, logistic regression may be a more appropriate choice of model (Table 35.6). In this section, we will discuss some of the most commonly encountered nonlinear models in the medical literature [13]:

Table 35.6 Dependent variables and GLM modelling

Type of outcome                                         Type of GLM commonly used
Continuous numerical                                    Simple or multiple linear
Binary incidence of disease in a longitudinal study
  (patients followed for equal periods of time)         Logistic
Binary outcome in a cross-sectional study               Logistic
Unmatched case–control study                            Logistic
Matched case–control study                              Conditional logistic
Categorical outcome with more than two categories       Multinomial or ordinal logistic regression
Event rate or count                                     Poisson
Time to event                                           Exponential, Weibull or Gompertz models

• Logistic regression (sometimes called the logistic model or logit model) is a generalised linear model used for binomial regression. It is used to explore the relationship of a binary outcome of interest with one or more explanatory variables. The data are fitted to a logistic curve. This allows the influence of an explanatory variable on the response variable to be explored by evaluating the probability that a particular outcome will occur. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body-mass index (a sketch follows this list).
• Multinomial (polychotomous) and ordinal logistic regression are extensions of logistic regression. They are used when we have a categorical dependent variable with more than two categories. When the dependent variable is nominal (e.g., the patient has one of several haemorrhagic disorders: haemophilia, thrombocytopenia or Factor X deficiency), we use multinomial logistic regression. When the dependent variable is ordinal or ranked, we use ordinal logistic regression. As these methods are complicated, the categories can be combined in some appropriate way to create a new binary outcome variable, allowing the usual two-category logistic regression analysis to be performed. However, this approach may be wasteful of information and can introduce bias.
• Conditional logistic regression is used when observations are matched (as in a matched case–control study) and the aim is to adjust for possible confounding factors. In these situations, the methods mentioned earlier are inefficient and lack power, as neither acknowledges that cases and controls are linked to each other. Conditional logistic regression allows the researcher to compare cases to controls in the same matched "set" or "pair" of observations.
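A minimal logistic regression sketch, again using statsmodels with simulated data (a binary event predicted from age and body-mass index; all values invented):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(30, 80, n)
bmi = rng.uniform(18, 40, n)
linpred = -10 + 0.08 * age + 0.15 * bmi                           # simulated linear predictor
event = (rng.random(n) < 1 / (1 + np.exp(-linpred))).astype(int)  # binary outcome

X = sm.add_constant(np.column_stack([age, bmi]))
model = sm.Logit(event, X).fit(disp=False)  # fits the logistic curve by maximum likelihood
print(np.exp(model.params))                 # exponentiated coefficients are odds ratios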
• Poisson regression is used to analyse the rate of an event when individuals have different follow-up times (a sketch follows this list). Logistic regression, on the other hand, is concerned only with whether or not the event occurred, and is usually only used when individuals have the same follow-up period. In Poisson regression, the rate of the event among individuals with the same explanatory variables (e.g., age or gender) is assumed to be constant over the follow-up period. Poisson regression is generally used to establish which variables influence the rate at which the event occurs, and has a similar form to the logistic regression model, with a linear combination of explanatory variables.
• When a group of individuals is followed from a natural "starting point" (for example, when they receive a specific treatment) until the time that the person develops an endpoint of interest (for example, death), an alternative approach known as survival analysis can be used. In contrast to Poisson regression, it does not assume that the rate at which the event occurs is constant over time.
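A sketch of a Poisson regression with unequal follow-up, handled via an offset term (all data simulated for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
age = rng.uniform(40, 80, n)
followup = rng.uniform(1, 5, n)  # years of follow-up differ between individuals
events = rng.poisson(np.exp(-4 + 0.05 * age) * followup)  # simulated event counts

X = sm.add_constant(age)
model = sm.GLM(events, X, family=sm.families.Poisson(),
               offset=np.log(followup)).fit()  # offset adjusts for unequal follow-up times
print(np.exp(model.params))                    # exponentiated coefficients are rate ratios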
35.9.4 Generalised Linear Models
Generalised linear models (GLMs) are generalisations of ordinary least squares regression that unify other statistical models such as linear regression, logistic regression and Poisson regression. A GLM relates the random distribution of the measured variable of the experiment (the distribution function) to the systematic (non-random) portion of the experiment (the linear predictor) through a function called the link function. This allows a general algorithm to be developed to fit the model [9]. The generalised linear model can be expressed in the form:

g(Y) = a + b1x1 + b2x2 + … + bkxk,

where Y is the estimated value of the predicted, mean or expected value of the dependent variable, which follows a known probability distribution (e.g., normal, binomial or Poisson). g(Y), called the link function (e.g., the identity link), is a transformation of Y which produces a linear relationship with x1,…, xk, the predictor or explanatory variables; b1,…, bk are estimated
regression coefficients that relate to these explanatory variables, and a is a constant term. For any GLM, the likelihood of the model (L) is the probability that the observed results would be obtained had the regression coefficients taken specified values. We estimate the coefficients of the model by selecting the values for the regression coefficients that maximise L, i.e., those values that are most likely to have produced our observed results. This process is called maximum likelihood estimation (MLE).
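In software, the GLM framework makes the distribution function, link function and MLE explicit. A sketch with statsmodels (simulated binary data; the Binomial family's default link is the logit, so this reproduces logistic regression):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = (rng.random(100) < 1 / (1 + np.exp(-(x - 5)))).astype(int)  # simulated outcome

X = sm.add_constant(x)
# The family fixes the outcome distribution; its default link g is the canonical one.
model = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(model.params)  # maximum likelihood estimates of a and b1
print(model.llf)     # the maximised log-likelihood, log(L)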
35.9.5 Miscellaneous Problems in Regression Modelling
In addition to selecting the correct methodology, correct data analysis requires that the following common pitfalls in statistical analysis are considered:
• Statistical interaction between two explanatory variables in a regression analysis occurs when the relationship between one of the explanatory variables and the dependent variable changes as the other explanatory variable changes. This occurs because the two explanatory variables do not act independently on the dependent variable. Assessment of interaction between explanatory variables is easily performed using statistical software packages. If the package does not provide this facility, an interaction term may be created manually by including the product of the two explanatory variables as an additional explanatory variable.
• A confounding variable is an explanatory variable that is related to both the dependent variable and to one or more of the explanatory variables in the statistical model. Any regression model that considers the effect of one of the explanatory variables on the dependent variable without including the confounder may misrepresent the true role of the explanatory variable. If confounding factors are not dealt with in regression analysis, a false association between explanatory variables and the dependent variable may artificially be created, or a true association could be hidden. This problem is rarely found in randomised controlled trials if the sample size is sufficiently large, as patients are randomly allocated to treatment groups, and therefore all covariates, both confounders and other explanatory variables, should
be evenly distributed in the different treatment groups. The problem of confounding is particularly concerning, however, when treatments are compared in non-randomised clinical cohort studies. In this type of study, the characteristics of the individuals may be unevenly distributed in the different treatment groups. Although multivariable regression models and subgroup analysis can be used to adjust for differences between the treatment groups, this is possible only if the researcher is aware of the confounding factors and the sample size is sufficiently large.
• Jack-knifing is a way of estimating parameters and providing confidence intervals in an unbiased manner (sketched after this list). A single observation is removed from the sample, and the remaining observations are used to estimate the model parameters. This process is repeated for each observation in the sample, and the resulting estimates of each parameter are averaged. Because a score derived in this way is generated from many different data sets, it can be validated on the complete data set without taking subsamples.
• When two explanatory variables are highly correlated, it may be difficult to evaluate their individual effects in a multivariable regression model. As a result, while each variable may be significantly associated with the dependent variable in a univariate model (when there is a single explanatory variable), neither may be significantly associated with it when both explanatory variables are included in a multivariable model. This is termed collinearity. It can be detected by examining the correlation coefficients between each pair of explanatory variables (commonly displayed in a correlation matrix) or by inspecting the standard errors of the regression coefficients in the multivariable model. If there is significant collinearity, the standard errors of these coefficients will be substantially larger than those in the separate univariate models. If collinearity is present, only one of the variables should be included in the model [1, 3, 10].
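The jack-knife procedure described above is simple enough to sketch directly in a few lines of Python (the data are invented; the estimator can be any statistic of interest):

import numpy as np

def jackknife(data, estimator):
    # Leave-one-out estimates of a statistic and its standard error
    data = np.asarray(data, dtype=float)
    n = len(data)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    estimate = loo.mean()  # averaged leave-one-out estimate
    se = np.sqrt((n - 1) / n * ((loo - estimate) ** 2).sum())
    return estimate, se

values = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2]  # invented observations
print(jackknife(values, np.mean))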
35.10 Summary
In this chapter, the applications, strengths and weaknesses of the statistical methods commonly applied in surgical research were briefly introduced. Statistical terms, graphical methods, summary statistics, hypothesis testing, correlation and regression were discussed. It is our hope that this chapter will provide a foundation not only for some of the more advanced topics discussed elsewhere in this book but also for the interpretation and application of statistical techniques in clinical practice and surgical research.
References
1. Altman DG (1991) Practical statistics for medical research. Chapman and Hall, London
2. Anscombe FJ (1973) Graphs in statistical analysis. Am Stat 27:17–21
3. Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
4. Everitt BS (2003) The Cambridge dictionary of statistics. Cambridge University Press, Cambridge
5. Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing. Wiley, New York
6. Kurichi JE, Sonnad SS (2006) Statistical methods in the surgical literature. J Am Coll Surg 202:476–484
7. Lange KL, Little RJA, Taylor JMG (1989) Robust statistical modelling using the t-distribution. J Am Stat Assoc 84:881–896
8. Lindman HR (1974) Analysis of variance in complex experimental designs. WH Freeman, San Francisco
9. McCullagh P, Nelder J (1989) Generalized linear models. Chapman and Hall, London
10. Moses LE (1986) Think and explain with statistics. Addison-Wesley, Reading
11. Robinson PM, Menakuru S, Reed MW et al (2009) Description and reporting of surgical data – scope for improvement? Surgeon 7:6–9
12. Schervish MJ (1996) P values: what they are and what they are not. Am Stat 50:203–206
13. Seber GAF, Wild CJ (1989) Nonlinear regression. Wiley, New York
14. Walter CJ, Dumville JC, Hewitt CE et al (2007) The quality of trials in operative surgery. Ann Surg 246:1104–1109
Questionnaires, Surveys, Scales in Surgical Research: Concepts and Methodology
36
Mohammed Shamim Rahman, Sana Usman, Oliver Warren, and Thanos Athanasiou
M. S. Rahman, The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary's Hospital Campus, Praed Street, London W2 1NY, UK; e-mail: [email protected]

Abstract  Appropriate data gathering is the key to a successful study, and the methods employed will invariably influence the results obtained and the conclusions eventually drawn. This chapter works through different data-gathering techniques, namely surveys, questionnaires and scales, discussing the concepts behind the development of such tools as well as their relative merits and drawbacks. Forming an easy-to-understand overview of the main methodologies in use today, the chapter guides readers on the development of population-based research questions in order to develop their own research tools, as well as helping them employ established ones.

Contents
36.1 Introduction
36.2 Surveys
36.2.1 Stages of Building a Survey
36.3 Types of Surveys
36.4 Data Collection Methods in Surveys
36.5 Survey Advantages and Disadvantages
36.6 Error in Survey Research
36.7 Questionnaires
36.7.1 Questionnaire Design
36.7.2 Distribution
36.7.3 Response and Non-Response
36.7.4 Advantages and Disadvantages of Questionnaires
36.8 Scales
36.8.1 Attitude Measurement Scales
36.8.2 Semantic Differential Scales
36.8.3 Other Scaling Techniques
36.9 Validation
36.10 Factor Analysis
36.11 Structural Equation Modelling
36.12 Research Approval
36.13 Future Research
References
Further Reading

36.1 Introduction
The use of rigorous qualitative research methods has been on the rise in health services planning and health policy research [34]. The ability of these methods to adequately capture and evaluate group and individual attitude, belief, mood and behaviour has been playing an increasing role in the assessment of patients utilising the health services. Social science and health-related research aims to investigate different aspects of health, which can be broadly divided into the following domains:
1. Emotional health – life satisfaction, self-esteem
2. Psychological health – anxiety, depression
3. Physical health – physical health, physical functioning
4. Social health – social support, networks and role
Shortell [32], commenting on the increasing use of qualitative research tools as part of health services research, stated that the growth "is consistent with the developments in the social and policy sciences at large,
reflecting the need for more in-depth understanding of naturalistic settings, the importance of understanding context, and the complexity of implementing social change". Indeed, as appreciation grows of the impact of context and of the varied effects of similar health policies and interventions on different individuals, so does the need to accurately record these unique effects on those involved in the receipt of healthcare. While populations involved in surgical research and surgical interventions have previously been assessed using survival rates, success rates and complication rates, it is becoming increasingly important and pertinent to assess their impact through patient satisfaction, subsequent changes to quality of life and attitudes to surgical intervention. This chapter discusses the methods that can be used to record and measure these effects, which were previously used only within the realms of the social sciences but are now increasingly applicable in the assessment of patient care.
36.2 Surveys
"A collection of standardised information from a specific population usually but not necessarily by means of a questionnaire or interview" [29]. The origins of the survey can be traced to Victorian Britain, when poverty was a key issue in society [2, 26]. Surveys are a method of collecting information from a sample of the population of interest [3]. The survey has subsequently been developed as a research strategy rather than a method or technique. It uses carefully standardised questions applied to a carefully chosen set of people. Units of sampling can be individuals, departments within an organisation or whole organisations.
36.2.1 Stages of Building a Survey
Constructing a survey is a multi-staged process, and each stage must be considered prior to embarking on the project. This ensures that the resource requirements and allocations for each stage are fully considered in advance. An adaptation of Robson's suggested template for the design of a survey is presented in Fig. 36.1 [29].
Fig. 36.1 Adaptation of the suggested design template for developing a survey (stages: designing the survey; questionnaire construction and piloting; determining the population and the sample to be selected; briefing the interviewers, if interview-based; the fieldwork phase; editing and coding; computer entry and editing)
Some of the above-mentioned factors are further developed in the following section.
36.2.1.1 Design Work
Preliminary analyses must first indicate that a survey is the most appropriate way of addressing the research question (Box 1).
Box 1 Preliminary Analysis Questions
• Who do you ask? Whole population by general census vs. distinct groups (representative sample)
• How do you ask? Interviews, postal, phone, Internet
• What do you ask? Should be related to the research issue
• What resources do you need? Ensure adequate resources (e.g. staff, technology, financial)

First, the general purpose of the study needs to be translated into a more specific aim. For example, the
general purpose may be "the measurement of people's satisfaction with their hospital's surgical department". However, this is too vague for designing the survey, so a more specific question needs to be identified, such as "how satisfied are patients with the results of their varicose vein operations?" Next, the aspects important to answering the overall question should be identified. Thus, for a busy surgical department in which performance is the issue under investigation, the following questions may be posed:
1. What procedures are on offer?
2. What are the waiting times?
3. Do waiting times affect satisfaction?
4. Does each surgeon have a different satisfaction rate?
Finally, which methods of data collection are the most appropriate to extract relevant data? For example, this could entail face-to-face interviews when asking patients about post-operative pain, postal questionnaires or even diaries of experience following surgery. The question being asked, the sample population being asked and the financial constraints of the study group will influence the method being used.
36.2.1.2 Construction of Data Collection Tools
Whether the questionnaire is intended for interview work, postal or Internet consumption, pilot work is vital to ensuring that appropriate questions are asked in the most appropriate manner, so as to ensure the relevance of results and reduce potential bias. The aim is to find and overcome any difficulties prior to deployment of the actual questionnaire. One method is the use of focus groups [18] and structured interviews prior to conducting the survey proper, enabling the development of a survey with valuable and meaningful issues for the subjects and the identification of the language commonly used regarding the topic in question [25].
36.2.1.3 Determining the Population and the Sample to be Selected
This is usually indicated by the research question. Sampling of the population (see Sect. 36.2.1.4) and the required sample size depend on the resources available, while ensuring that the sample is representative of the whole population.
36.2.1.4 Sampling
Sampling involves techniques used to identify a population subset on which research can be conducted. This is distinct from gathering information from a census, whereby information is gathered from all members of the population of interest. Sampling of the population initially involves establishing the sample unit. This depends on the question being asked by the survey and can relate to individuals, organisations or geographical areas. In some instances, multiple levels of sampling hierarchy are required, such that individuals, organisations and geographical areas are all used. Sampling can be broadly divided into probability samples, where each respondent has a known probability of selection, and non-probability samples, where sample units have not been selected at random, essentially implying that some units have a greater probability of being selected than others.
Probability Sampling
Simple Random Sampling
This is the most basic form of probability sample and involves the selection of the required number of respondents at random from a list of the population (the sampling frame). Each sample unit has an equal chance of being selected. The respondents can be selected at random using either a "lottery method", random-number tables (found in statistical textbooks) or computer-generated random numbers. This process helps to eliminate human selection bias.
Systematic Sampling
This is a variation on the simple random sampling system, in which a starting point in the sampling frame is chosen at random, and then individual sample units are selected for every nth person. In order to remain representative of the population, the list must be organised in a way unrelated to the subject of the survey. Both simple and systematic sampling require a full list of the population, which may not be available in certain circumstances. It is important to ensure that sample units are in no way ordered when random numbers are applied for them to be selected, as this may introduce an element of selection bias.
Stratified Random Sampling
The population is stratified into mutually exclusive groups (strata) in which members share a particular characteristic (e.g. sex). Random sampling within each stratum is then undertaken (either by simple random sampling or systematic sampling), and proportionate sampling can be utilised to better reflect the general population by selecting proportionate numbers from each stratum. For instance, if there are equal numbers of males and females in the population, then equal numbers are sampled from each group; however, if 80% of the population is male and 20% female, then the male sample is four times larger than the female sample. Disproportionate sampling does not follow equal weighting and allows oversampling of a small but important stratum. This can be of use when studying rare cases of disease, in order to choose more examples of them.
Multi-Stage Cluster Sampling
In cluster sampling, the population is first divided into primary sampling units (the first stage of the sampling procedure) and not into individual units. These aggregations of units are known as clusters, each of which contains individuals with a range of characteristics. A sub-population within each cluster is then sampled randomly. This is useful when the population is widely dispersed and large. An example of cluster sampling is to sample from a given number of hospitals and then to survey all the surgical patients within each hospital at a given point in time. Multi-stage sampling is an extension of cluster sampling, which involves selecting the sample in stages. An example of this would be random sampling of a hospital, random sampling of a specialty within the hospital and then choosing a sample of patients from within the selected specialty. This method allows interviewers to be far more concentrated than would otherwise be the case if a simple random or stratified sample was selected. Both cluster and multi-stage sampling can use stratification (the probability sampling schemes above are sketched in code at the end of this section).
Non-Probability Sampling
These techniques are also known as purposive samples and are used when it is not possible to specify the probability that any unit will be included in the sample. These techniques are commonly used in small-scale surveys because they are easy to set up and are acceptable when there is no need for statistical generalisation to
any population beyond the sample surveyed, or when "piloting" a survey.
1. Quota sampling – interviewers are given targets for surveying. The sample is intended to represent a cross-section of the population. Clearly, this is subject to interviewer and responder bias. Dimensional sampling is an extension of quota sampling in which a particular group is sampled as an additional dimension to the study question posed.
2. Self-selection can be utilised through posters, telephone numbers to contact or questionnaires that members of the population can pick up. This neglects the views of the apathetic.
3. Convenience sampling involves choosing the nearest and most convenient persons to act as respondents. This produces a biased and unrepresentative sample. Snowball sampling is in essence an extension of convenience sampling, but makes use of the individuals contacted to introduce other individuals who fit the selection criteria. There is clearly no random selection here, but it can be of use in order to gain information about groups of people who do not belong to a particular sampling frame.
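The three probability sampling schemes described earlier can be sketched in a few lines of Python with NumPy (the sampling frame and the stratum proportions are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
population = np.arange(1000)  # a sampling frame of 1,000 identifiers

# Simple random sampling: every unit has an equal chance of selection
simple = rng.choice(population, size=50, replace=False)

# Systematic sampling: a random starting point, then every 20th unit (1000/50)
start = rng.integers(0, 20)
systematic = population[start::20]

# Stratified (proportionate) sampling, e.g., by sex (invented 80/20 split)
sex = rng.choice(["M", "F"], size=1000, p=[0.8, 0.2])
stratified = np.concatenate([
    rng.choice(population[sex == s],
               size=round(50 * (sex == s).mean()),  # proportionate allocation
               replace=False)
    for s in ["M", "F"]
])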
36.2.1.5 Sample Size and Statistical Power
It should be noted that it is, in fact, absolute sample size rather than relative sample size that increases the likelihood of a study's precision. Indeed, Bryman [5] stated that a sample of 1,000 individuals in the UK has as much validity as a national probability sample of 1,000 individuals in the USA, despite the latter country possessing a far larger population. The power calculation is the statistical technique used to determine the sample size for a given study [3]. This enables the study to recruit the correct number of sampling units in order to produce a statistically significant result for a difference between groups of given sizes (i.e. the ability to detect a true difference between populations, rather than one simply due to chance). If the sample size is too low, the statistical power will be low and the results questionable. However, a study designer will not want to over-recruit sample units, as this is a time-consuming and expensive process, though the larger the sample size, the greater the study's relevance to the general population. Calculations for obtaining sample size are easily found in statistical textbooks.
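As an illustration, a power calculation for comparing two means with an unpaired t-test can be sketched with the statsmodels package (the effect size, significance level and power below are assumed values, not prescriptions):

from statsmodels.stats.power import tt_ind_solve_power

n_per_group = tt_ind_solve_power(effect_size=0.5,  # assumed standardised difference (Cohen's d)
                                 alpha=0.05,       # significance level
                                 power=0.8,        # desired power of 80%
                                 ratio=1.0)        # equal group sizes
print(round(n_per_group))  # roughly 64 patients per group under these assumptions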
36.2.1.6 Selecting the Population Sample
Resources are available to identify the population pool from which the sample population can be extracted. In the UK, these include the Office for National Statistics (electoral register and postcode address files), Government or Local Government records, telephone or health records (which may be confidential) or, more recently, Internet sources.
36.2.1.7 Sampling Error
Sampling error arises because the sample eventually selected may not be fully representative of the population from which it has been extracted. It demonstrates the amount by which the sample can be expected to differ from the underlying population for the particular variable under investigation. According to the NHS Health Survey Advice Centre [27], the factors determining the level of sampling error are the following: for a characteristic, the proportion of people who possess the characteristic; for a numerical variable, the distribution of the variable in the population; the sample design; and the sample size. Probability sampling does not and cannot eliminate sampling error, as an element of it will always exist. It is important, nevertheless, to attempt to reduce it as much as possible in order to further enhance the applicability of the results of any sample-based research. Probability sampling does allow the study group to make use of tests of statistical significance, permitting the user to draw conclusions about the population from which the sample was selected.
36.3 Types of Surveys
A number of different types of survey exist, each suitable for different elements of studying populations. They vary in population selection, temporal setting and the parameters that the surveys are attempting to measure or describe:
1. Social surveys
2. Cross-sectional (descriptive) surveys
3. Longitudinal (analytical) surveys
Each survey is unique in the manner by which it defines its data collection methods and the time frame in which
this occurs. This allows each survey type to achieve distinct goals and provide a particular type of analysis. Thus, the question being asked will largely determine which form of survey will be appropriate for extracting the relevant data from the population.

Cross-sectional surveys
This is a form of descriptive survey of a defined, random cross-section of the population at one particular point in time. Groups selected can be samples of a population or may consist of specific subsets of a single population (e.g. departments and types of patients). These surveys are effective at measuring opinions or attitudes and are relatively easy to perform. When retrospective, sample members (respondents) are usually questioned about past and current behaviours, attitudes and events. This type of survey enables the user to identify particular characteristics of a population and possible relationships within them. In particular, they can be used to estimate the prevalence (but not the incidence) of the disease burden within a population. They can estimate population parameters and can be used to generate hypotheses about cause and effect associations; however, they cannot be used to suggest the direction of the cause–effect relationship. For example, they may demonstrate that a population of patients who have undergone varicose vein surgery are a happier population in terms of their health; however, it is not possible to determine whether the varicose vein surgery has made the population happy, or whether the happier population elected more often for varicose vein surgery. Such findings can lead to the development of hypotheses, which can in turn be tested by analytical surveys. Cross-sectional surveys are, on the whole, economical in terms of time and resources. The retrospective nature of these surveys opens up the possibility of recall bias, which increases with the length of time that sample members are asked to think back over. Data collected can become outdated if the situation changes rapidly.

Longitudinal studies
Longitudinal studies involve the collection of data at several different points in time. They can broadly be divided into:
1. Cohort designs
2. Panel designs
3. Trend designs
Such studies are valuable for investigating the impact of new interventions and trends in behaviour or attitudes, and can take the form of a series of cross-sectional surveys. The nature of the data allows incidence rates to be calculated in exposed and unexposed groups, in addition to allowing the direction of any association between the variables to be alluded to. If the study needs to define a sample population with a common characteristic (such as age or the presence of a particular disease), they will be known as a cohort, and if data are collected from inception, forward in time, it will be referred to as a prospective longitudinal study.

Cohort designs
These studies sample a cohort of the population who have undergone a shared common event (typically, birth) and are surveyed at selected intervals of time following the event. The sample groups are randomly selected each time.

Panel designs
This type of longitudinal survey samples a cross-section of the defined population and follows them up at more than one point in time, with changes in their variable status recorded over this defined time period. This continues until the study end-point is reached or until the sample size dwindles. These differ from cohort designs, as the composition of the group is the same each time.

Trend designs
A group is selected from the population that has undergone a shared experience for a limited time. At each data collection point, these surveys select a new sample population, preferably at random, allowing for wider sampling of the general population over the study time period, e.g., surveying the race and sex of those entering medical school, repeated each year in order to establish a trend of entry. This type of survey is more commonly seen in the market research and polling sectors in order to engage a broader opinion. It allows epidemiologists to identify sample members with differing levels of exposure to a variable and to calculate incidence rates based on this. It also confers the ability to survey a dynamic population.
Difficulties of longitudinal surveys
On the whole, these surveys are more expensive than their alternative counterparts in terms of resources, time and manpower. Much administrative support is required in order to continually update, code and re-enter data in the light of sample population changes (e.g. deaths and tracing members), and greater computing skill is needed in order to merge the data collected over differing time periods. Though there is a theoretical advantage, it is not always possible to demonstrate a causal relationship, for a number of reasons: e.g., when there is a long time from exposure to the onset of disease, when causation is multifactorial or when confounding variables exist. There is also an argument that members of a sample population can become conditioned to the study, learning the responses that they believe are expected, remembering and repeating old answers or even becoming sensitised to a topic (the Hawthorne effect). These studies require careful definition of the groups under investigation, careful selection of variables for measurement and sufficiently frequent points of data collection in order to reduce levels of recall bias. They also depend upon high response rates over the time course of the study. Bias can enter such a study design through selection bias, if random sampling techniques are not utilised, or more commonly through sample attrition (e.g. loss of sample members through geographical mobility or refusals to participate over time). The timing of repeated data collection points is therefore crucially important and should have a clear rationale, e.g., when changes within the population are anticipated following an intervention.

Cohort studies
A cohort is a sample population who share a unifying characteristic. This population can be studied retrospectively at one point in time (retrospective cross-sectional cohort study) or over a period of time (prospective longitudinal cohort study), as discussed earlier. When longitudinal in nature, cohort studies need to maintain high response rates at each wave of the study in order to avoid sample bias. Cohorts can suffer from the "cohort effect", whereby each cohort will experience unique historic conditions (e.g. social deprivation, war and economic boom), which will affect the
attitudes, beliefs and observations documented during the study time period.
36.4 Data Collection Methods in Surveys
Data have traditionally been collected through the use of either interviews or self-completed questionnaires. The choice of which to use depends largely on the overall question being asked, the population to which it is being asked and the monetary and time constraints placed upon the study group. Interviews can be face-to-face or, more recently, via other methods such as telephone, video or Internet links. These are conducted either as one-to-one interviews, where an interviewer poses the questions to the respondent in order to extract data, or as part of a larger focus group. Interviewers will adopt either an open or closed structure to their questioning, influencing the nature of the data collected. The more open the interview format, the less structured the data, requiring greater qualitative analysis. Quantitative analyses can be performed easily on information gathered in response to closed questions. Questionnaires are usually self-completed and differ mainly in terms of their distribution, the commonest being the postal self-completion questionnaire, although the Internet is increasingly becoming a very popular method used to target audiences. Questionnaires appeal to larger studies requiring large sample sizes and essentially substitute the interviewer with a written form of data collection tool. The manner in which a questionnaire is constructed is vital to ensuring that relevant data are acquired from the participant. Similar rules to the interview format apply here as well, and the development of questionnaires is further discussed in the next section.
36.5 Survey Advantages and Disadvantages
The use of surveys as a method to engage and capture the attitudes, beliefs and behaviours of a sample population has advantages over other forms of research in use in the healthcare domain, discussed in the subsequent paragraph. Particular methods of data collection and their relative merits and drawbacks are then further discussed in detail later in this chapter.

Advantages of surveys
Surveys are well suited to descriptive studies, e.g., investigating the number of people in a given population possessing a particular attribute or opinion. If all respondents have the same standardised questions put to them, and the surveys are transparent in the way that data are collected and reported, they can achieve a high degree of reliability. As Hakim said in 1987, "the methods and procedures used can be made visible and accessible to other parties, so that implementation as well as the overall research design can be assessed" [15]. They form a relatively simple and straightforward approach to the study of attitudes, values, beliefs and motives and may be adapted to collect information from almost any human population.

Disadvantages of surveys
There remains a problem of "internal validity", i.e., not obtaining vital information about the respondents and their thoughts and feelings. This occurs when the statements concerning attitudes are not correctly assessed in the pilot work to gauge the respondents' opinions, and can be improved using scaling. "External validity" is also a problem if the sample is faulty (e.g. the response rate is not high), making it difficult to generalise the findings and extrapolate them to the general population. Surveys can seek to generalise member responses beyond what they actually provide, resulting in a lack of relation between attitude and behaviour [16]. Surveys are, as discussed earlier, able to determine the correlation between variables; however, it can sometimes be difficult to establish causation. The data are affected by the characteristics of the respondents, e.g., their memory, knowledge or experience, and respondents may not necessarily report their beliefs or attitudes accurately, suffering from "social desirability bias" whereby people respond in a manner which they believe shows them in a good light.
36.6 Error in Survey Research

Bryman [5] suggests that error in survey research is broadly the result of four main contributing factors (Fig. 36.2).
Fig. 36.2 Four sources of error in survey research: sampling error, sampling-related error, data collection error and data processing error (adapted from reference [5])
Sampling error results from bias in the selection of individuals for the study; methods to reduce it include random probability sampling. Sampling-related error arises when significant differences between the selected sample and the population result from poor sampling techniques or from a sampling frame that is not comprehensive; it may also arise from non-response to survey data collection tools. Data collection error results from problems in the research process itself, whether the incorrect development of the questionnaires, scales or interviews used to extract data from the sample units or the poor application of these tools. Data processing error arises from inappropriate handling of the data once they have been collected, at any stage of data management from coding to statistical analysis. These areas of error will intrinsically affect the validity of the results and conclusions drawn from a study and hence their applicability to the general population. Recognising the factors that contribute to error at each stage can help reduce it and can give the administrators of the study tangible error levels, which they may or may not be willing to accept.
36.7 Questionnaires

Questionnaires are a data collection technique commonly used in different types of survey. They are used widely in social science research, form a powerful tool for collecting data about attitudes, beliefs and knowledge, and can generate data suitable for quantitative analysis.
Questionnaires can be devised as structured or semi-structured documents, or indeed a mixture of both; unstructured questionnaires usually take the form of an interview. Each has advantages over the other. Highly structured questionnaires allow the collection of unambiguous data that can easily be collated, counted and analysed, but they do not always provide comprehensive response choices for the respondent. They assume that particular wordings and orderings of questions are interpreted in the same way by all those reading them, which may not be the case; they also assume a certain amount of general knowledge and are poor at probing deeply into attitudes, behaviours or social processes. Highly structured forms do, however, prove more economical to analyse and allow large samples to be included.
36.7.1 Questionnaire Design

Developing a questionnaire from scratch can be a time-consuming and expensive process. The questionnaire must ask the right questions of the right people in the right way in order to produce the answer to the question posed. The design of a questionnaire can be broken down into four broad stages:

1. Planning
2. Layout
3. Question form
4. Piloting
Distribution is then an important feature to consider. While questionnaires can be completed face-to-face in interview-style administration, postal and, more recently, Internet- and email-based questionnaires reap significant rewards. Questionnaires are subject to a number of problems in recouping data, and non-response plays a major part in this; there are methods to account for it and ways to reduce the rate of non-response.

Planning

Many questionnaires have already been developed, tested and used previously. As a number of these questions and whole questionnaires are available, it is reasonable to utilise this valuable resource, avoiding the
need for re-validation of such questions. These should be collated along with any scales that are to be used as part of the questionnaire. Where they do not cover the whole scope of the study, additional questions should be developed to address the specific areas concerned, and all questions should be related back to the overall study aims. Question banks comprise commonly used, pre-validated survey questions that can be drawn upon in developing an individual questionnaire. The Centre for Applied Social Surveys hosts a bank of questions commonly used in major UK surveys (http://qb.soc.surrey.ac.uk), as does the British Office for National Statistics (http://www.statistics.gov.uk/about/data/harmonisation/default.asp), which displays a set of online harmonised questions from its national social surveys. Strategies should be developed prior to use of the questionnaire to address issues of quality control. There may be poor compliance, missing data or even inaccurate data in returned questionnaires, and it is vital to establish early on how the study will handle these should they occur.

Questionnaire layout

The questionnaire should be laid out clearly and printed professionally, all of which aids response rates. Lower case should be used wherever appropriate, and the word "confidential" should appear somewhere at the beginning. Each questionnaire should display an identification number of some sort to aid response analysis later on, if respondents need to be identified. The study title and a short introduction to the questions should also be displayed early in the questionnaire, so that respondents know exactly why they are participating in the study and what it is for. Instructions and a thank you should be clearly included, and any labelling identifying different sections should be made obvious. Should filter questions be used, these too should be highlighted clearly to avoid confusion. It should be noted that the layout of the questionnaire itself can introduce bias; even the colour of the paper used to produce the questionnaire can influence respondents. Market research companies alter the colours of products in order to influence the psychology of the consumer, and this applies to the respondents of a questionnaire also. For instance, green is associated with a "healthy lifestyle", yellow with "optimism", red with "physical stimulation" and blue with "freshness" [3].
These steps are all vital to ensuring a high response rate: if the questionnaire is too cluttered or poorly labelled, the respondent is less likely to take his or her time to answer the vital questions.

Question form

This refers to the use of open or closed questions. The more open questions are used, the less structured the questionnaire becomes. Open questions are useful when the replies are unknown, unpredictable or particularly complex, whereas closed questions only allow answers from a fixed set of responses. Open questions are particularly recommended when developing questionnaires; however, the actual answers given by respondents can be degraded through the process of coding. Closed questions clearly offer the advantage of being relatively easy and cheap to analyse and collate, but they may force respondents into particular replies.

Question items

This refers to the actual questions used to assess the respondent in terms of the overall question being asked. They are imperfect indices of attitudes or behaviour, as each seeks to capture only part of the respondent's feelings towards a situation; collating the individual items can reveal the overall attitude or behaviour that a respondent exhibits. Responses are affected by the wording used, the interviewer's agenda (leading to interviewer bias) and the respondent's desire to fit in with social norms (social desirability bias), all of which can lead to measurement error. Techniques to reduce these biases include the use of scales (discussed later) and testing each question item for validity against set measures.

Question order

The ordering of question items within a questionnaire can affect the way the questionnaire is completed. Indeed, a poor order may mean that fewer questionnaires are completed at all, with respondents put off by the initial question items. Thus, it is vital to employ a strategy that ensures important questions are answered and that probing and personal questions are handled tactfully. Funnelling involves starting with broad questions and eventually narrowing down to more specific ones. Questionnaires can utilise filter questions, whereby
certain answers allow a respondent to move to a different section, while others continue on. Computer-administered questionnaires allow more complex filtering techniques to be used, which might be too confusing for self-administered questionnaires where no help is available. The specific order of questions is important; generally, the questionnaire should start with basic questions before moving to complex ones. It is also wise to avoid asking socio-demographic questions right at the start (unless for filtering purposes), as the respondent may find these questions objectionable, losing rapport and the desire to complete the remaining questionnaire. It is suggested that the most important questions be asked nearer the start, so that if the questionnaire is returned incomplete, at least the most vital data have been obtained. As a rule of thumb, behaviour questions should be asked before attitude questions so as not to bias the subsequent answer; e.g., if someone is asked whether they believe smoking is bad for one's health and is then asked whether they smoke, they are much less likely to confirm that they do. General questions should be asked before specific questions about attitude or behaviour, as answers to specific questions can influence the answers to subsequent general ones. Thus, questions about "life in general" should be asked before questions about "health" specifically; otherwise, when asked about "life in general" after "health", respondents may discount "health" as a factor, thinking that they have already answered about that aspect of life. Signposting is an important feature of the questionnaire, enabling the reader to understand the context of the questions being asked and what they relate to. This is particularly important when using scales within questionnaires, so that the correct scale is applied to the correct question, and in self-administered questionnaires, where the study group cannot intervene if questions are not understood or are filled in incorrectly. Being clear and precise reduces the risk of inappropriate answers and incomplete questionnaires.

Question wording

Simple words should be used wherever possible in order to reduce inter-user variation in understanding. This helps to ensure a more universal understanding of the
question being asked. It is important to avoid ambiguity; negatives, and certainly double negatives, should never be used. Questions should be kept short, and jargon should be avoided, using lay terms wherever possible. Leading questions such as "you don't have difficulty with …?" should be avoided, as they can push respondents in particular directions. Loaded questions should also be avoided; these usually take the form of questions in which an opinion is already expressed, e.g., "would you prefer to have your operation in the safety of a hi-tech academic surgical unit?", and can clearly affect the respondent's answer. If a choice of answers is given, a balanced choice should be offered: "very good" should be balanced by a "very bad", and it is usual for a "middle ground" answer to be offered. Questions can be "framed" by attaching a fact to them, and the way in which this is worded can alter a respondent's choice. Framing effects should therefore be avoided, as they can result in measurement bias. An illustration of a framing effect would be to ask whether someone would consider surgery given the 90% survival rate or, conversely, the 10% mortality rate. Each of these statistics will influence the way a respondent chooses; though both are accurate, each paints a slightly different picture [24]. Sensitive or embarrassing questions are better left towards the end of the questionnaire, as they can diminish the rapport gained throughout the preceding questions; in this way, if completion is threatened, the remainder of the questionnaire can still be utilised. Questions regarding attitude are affected by social desirability bias, with responders trying to fit into what they regard as "social norms". Respondents may also never have actually thought about the topic being asked and may therefore make a snap judgement that does not truly reflect their feelings on the subject. Such questions should be posed in a manner that closely reflects daily colloquial speech in order to obtain the most natural answers [26]. When asking about facts, it is important to ask only respondents who are likely to possess the relevant information. It has been noted that people generally tend to overstate and understate particular facts, namely height and weight, respectively. Social desirability bias once again plays a role, while a person's "selective memory" in recalling events or recognising facts can lead to recall bias, both skewing responses. Questions are often posed regarding frequency of behaviour. This is an instance in which loading questions
can be of benefit in order to gain a more truthful response: allowing the respondent to believe that a particular behaviour is commonplace makes them more likely to admit to traits that they perceive to be undesirable.

Piloting

The questionnaire should be piloted among a small group of people in order to identify problems prior to widespread dissemination. Discussions with a target group will aid in identifying poorly worded questions or misleading statements.
36.7.2 Distribution

The choice of how to distribute the survey will be influenced by the overall study aims, the funds allocated for data collection and the time constraints on the fieldwork phase. Traditionally, surveys have collected data through interviews, focus groups or questionnaires. Questionnaires are usually self-completed and distributed by post; postal methods have their own advantages and disadvantages, discussed in more detail in the subsequent section. Data collection by telephone is also a viable alternative in the form of interviews. Increasingly, the Internet has played a major role in the widespread dissemination of information.
36.7.3 Response and Non-Response

Questionnaires may not be completed, or may not be returned at all. This represents an aspect of sample attrition for surveys using questionnaires as their measuring tool. Bias may also be introduced if the group of non-responders differs significantly in one or more ways from the group of responders; for instance, non-responders may be sicker and therefore less able to complete and return the questionnaires. Single items within the questionnaire may be left unanswered, and the way in which these are dealt with can vary: individual items can be excluded, or the respondent can be excluded entirely, depending on the pre-agreed rules for accepting a questionnaire into the study. The
question of the validity of answers remains when non-response items are assigned an average value, though this option, if less desirable, is available. There are methods in practice that allow the study group to deal with non-response. One such method is weighting, in which a statistical weighting is applied to the group of responders to compensate for those who did not respond. A response rate of 85% is generally accepted as "good" [23], and response rates tend to be higher for interviews than for postal questionnaires [3]. Particular techniques can be utilised to attempt to raise the response rate to postal and interview questionnaires. The first involves including with postal questionnaires a covering letter detailing the aims of the survey and how the particular respondent was selected. The letter should be personalised, and it has been observed that people are more likely to respond if the letter is from a recognised body [6] rather than a commercial organisation [19]. Incentives are a costly but effective way [11] of increasing response rates, though this will clearly be limited by the study budget. They can, however, select out people on higher incomes, who may not see the financial incentive as sufficient benefit to partake in the study; this in itself can create a selection bias. Mailed questionnaires should all be accompanied by a stamped addressed envelope as a form of incentive to promote return. Postal reminders and wider distribution methods such as the Internet may all serve to increase response rates, as will keeping the questionnaire short, easy to comprehend and interesting.
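Returning to the weighting method described above, the idea can be made concrete with a minimal sketch: each responder is weighted by the ratio of their stratum's known population share to its share among responders. The strata, the shares and the Python/pandas framing below are illustrative assumptions, not part of the original text.

```python
import pandas as pd

# Hypothetical age strata: known population shares vs. shares observed
# among those who actually responded to the survey.
population_share = pd.Series({"18-39": 0.40, "40-64": 0.40, "65+": 0.20})
responder_share = pd.Series({"18-39": 0.25, "40-64": 0.45, "65+": 0.30})

# Weights > 1 inflate under-represented strata; weights < 1 deflate
# over-represented ones.
weights = population_share / responder_share
print(weights.round(2))

# Population-level estimates are then computed with each responder's
# answers multiplied by the weight for his or her stratum.
```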
36.7.4 Advantages and Disadvantages of Questionnaires

Advantages of questionnaires

Questionnaires are primarily compared with interviews as a data collection tool in surveys; the relative merits of questionnaires therefore exist as the converse of those of interviews. These issues relate primarily to cost, time and bias. Questionnaires are generally cheaper to administer than interviews. Clearly, dispensing with an interviewer saves the study group money, as does
the wide dissemination of postally (or Internet-) distributed questionnaires across large geographical regions, which reduces the need for study participants to commute and potentially to have funds reimbursed. Interview costs can, however, be offset through the use of telephone or the somewhat untested Internet interview. Questionnaires are also generally quicker to administer: mass distribution by post or over the Internet allows large numbers to be recruited into studies faster, whereas interviews are limited by the study group's ability to supply interviewers, each of whom can only interview one subject at a time (unless a focus group system is used). Bias in interview data usually arises from the effect of having an interviewer present. Besides creating an observer effect, intrinsically altering the subject's behaviour and possibly attitudes through a conditioning process, the way in which the interviewer administers questions and extracts data also influences the data extracted. Though impractical in large studies, using only one interviewer reduces the varying effects of different people asking the questions; however, a different rapport will exist between the interviewer and each subject, naturally affecting the responses given, an effect that is difficult to quantify. Using multiple interviewers clearly raises the possibility of inter-interviewer variability, which can be tested to a degree. Self-completed questionnaires do not suffer from these problems. Finally, convenience is another bonus of self-administered questionnaires, which are delivered to the respondent, avoiding the need to commute.

Disadvantages of questionnaires

"The great popularity with questionnaires is they provide a "quick fix" for research methodology. No single method has been so abused" [14]. Despite the advantages detailed earlier, interviews do confer several advantages over self-completed questionnaires. Questionnaires cannot prompt or aid respondents during completion if difficulty is encountered with a question; thus, great attention must be paid to the development of each question, as mentioned earlier. When open-question techniques are used, probing in interviews can further elicit respondent attitudes and beliefs. This is clearly not possible with questionnaires, which by and large do not utilise
many open-ended questions for data collection purposes. The impersonal nature of questionnaires means respondents are less likely to complete those they find irrelevant or uninteresting. Complex questionnaires again prove a "turn-off" when it comes to completion, so intricate filtering and long-winded questions are best avoided. Questionnaires cannot prevent later questions from being read in advance, which may affect responses to earlier questions, whereas interviews allow control of the pace and order of the questions asked, which, as discussed previously, can affect the way questions are answered. Taking things one step further, it is not possible to ensure that the individual selected to take part in the study is indeed the person filling in the questionnaire; this is virtually impossible to control and should be accepted as a potential source of error. Questionnaires may not be fully completed, again something that cannot really be accounted for before the study, other than by providing short, relevant questionnaires with perhaps incentives for completion. Respondent characteristics such as illiteracy or language barriers may prevent respondents from correctly filling in questionnaires, whereas these can be accommodated in interview scenarios. Thus, questionnaires on the whole tend to have lower response rates than interviews, as mentioned earlier.
36.8 Scales

Scales form a tool by which an individual's attributes or attitudes regarding particular topics can be measured. They usually provide a range on which individuals can plot their own feelings on a subject, usually quantitatively, allowing summation and the ability to yield a score. Responses can be averaged, helping to eliminate individual item error, and the summation of scores and calculation of averages permit a more rigorous statistical analysis of the answers obtained. Scales can be thought of as comparative or non-comparative: in comparative scaling, items are directly compared against one another, whereas in non-comparative scaling, the converse, items are scaled independently. Measurement scales that try to gain insight into what people believe, rather than testing them, are known
as attitude measurement scales. These are of particular importance in the surgical context when trying to evaluate the thoughts of patients, for instance, with respect to patient satisfaction.
36.8.1 Attitude Measurement Scales

A valuable source of existing British and American tests is the Buros Mental Measurement Yearbooks [35]. These yearbooks are republished every 5 years and provide all the information needed, including the reliability and validity of each test; the test and its manual can be requested if deemed suitable for your study. Commonly employed attitude measurement scales include the Thurstone, Likert, Guttman and semantic-differential scales, while many other scales exist within the realms of research.

Thurstone scale

Originally developed to measure attitudes towards religion, this non-comparative scaling technique treats attitude as a continuum, asking users to place responses to a question on a scale from "complete endorsement" of a concept to "complete opposition". The scale is developed by taking a wide range of statements collated from the literature, from expert opinion or from questioning the relevant population to obtain a spread of views; roughly 20–40 such statements are finally used to produce the scale. Numerical values are attributed to each statement by a panel of "expert judges" (usually at least 300 people), each of whom attaches their own numerical value. The level of inter-judge consensus is calculated, and statements with poor concordance are discarded. An average score is calculated for each individual statement, which is then used as the score for that statement on the scale; high values usually indicate positive responses (i.e. endorsement of the concept). This scale can clearly be time-consuming to create, as well as imposing a significant cost on the study group, which would need to employ the panel of judges.

Likert scale

The Likert scale is the most popular scaling technique among sociologists. This non-comparative method resembles the Thurstone model in that an initial pool of
statements is generated and edited. However, it differs in that it does not assume that the intervals between each response are constant, and no judges are used to decide the scores. A pool of items is gathered from relevant individuals relating to the issue under investigation; items should reflect both positive and negative stances towards the issue, in equal numbers. A response categorisation system is then decided upon, the most common being five fixed alternative expressions, e.g., "strongly agree", "agree", "undecided", "disagree" and "strongly disagree", to which weights of 1–5 are applied. A large number of representative respondents are then asked to mark their attitudes to the list of statements, which should be in random order with positive and negative statements intermingled. A total score for each respondent is then obtained by summing the values of the responses, and the respondents can be ranked according to the scores obtained. Items are selected for the final scale using "item analysis": each statement is subjected to a measurement of its discriminative power (DP), i.e., its ability to discriminate between the responses of the top 25% (upper quartile) and the bottom 25% (lower quartile) of respondents. Those with the highest DP indices are chosen for the final scale, which typically consists of 20–30 items. The reliability and validity of scales can be assessed using the methods described in Rust and Golombok [30].
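The item-analysis step can be made concrete with a short sketch. Here DP is taken as the difference between an item's mean score in the top and bottom quartiles of the total-score distribution; the function name, the randomly generated responses and the cut-off of 20 items are illustrative assumptions only.

```python
import numpy as np

def discriminative_power(scores: np.ndarray) -> np.ndarray:
    """DP per item: mean item score in the top 25% of respondents
    (ranked by total score) minus the mean in the bottom 25%."""
    totals = scores.sum(axis=1)
    lower_cut, upper_cut = np.percentile(totals, [25, 75])
    top = scores[totals >= upper_cut]
    bottom = scores[totals <= lower_cut]
    return top.mean(axis=0) - bottom.mean(axis=0)

rng = np.random.default_rng(0)
pool = rng.integers(1, 6, size=(200, 40))   # 200 respondents, 40 candidate items, scored 1-5
dp = discriminative_power(pool)
final_items = np.argsort(dp)[::-1][:20]     # keep the 20 most discriminating items
print(final_items)
```

With real (non-random) data, the items retained by this procedure are those that best separate respondents holding favourable and unfavourable attitudes.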
Guttman (cumulated) scale

This scale is based on the premise of hierarchy: it is assumed that when statements are placed into hierarchical rank order, agreement with one statement implicitly implies agreement with all statements below that rank in the hierarchy. This is a form of comparative scaling technique. An individual's attitude score towards a particular concept is based upon the highest rank the individual is willing to endorse. An example of the Guttman scale in use can be found in the Bogardus social distance scale [1] (Fig. 36.3). Each item has a cumulative property, and the items are ordered such that accepting item number 3 implies that items 1 and 2 are also accepted. This scale is popular owing to its simplicity and is used when unidimensional items can be assessed without difficulty.

Fig. 36.3 Example of a Guttman scale:
(Least extreme)
1. Are you willing to undergo an operative procedure in the UK?
2. Are you willing to undergo an operation in the NHS?
3. Are you willing to attend the local hospital for an operation?
4. Are you willing to have an operation as part of an academic trial at your local hospital?
(Most extreme)

More details of this scale can be found in Dawes and Smith [9], Scott [31] and Lemon [20]. The scale is developed by first selecting a large number of relevant statements. These statements are then assessed by a standardised group, which answers them in an "agree/disagree" fashion. A scalogram of the responses is made and analysed, and the scale is then applied to the respondents. The total number of agreements or disagreements with the statements measures the attitude.
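A minimal sketch of Guttman scoring under the cumulative assumption, using the items of Fig. 36.3: the score is the highest-ranked item endorsed, and any non-endorsement below that rank counts as a scaling error. The function name and response patterns are hypothetical.

```python
def guttman_score(responses: list[bool]) -> tuple[int, int]:
    """Return (score, errors) for responses ordered from least to most extreme.

    score:  the highest rank endorsed (1-based; 0 if none are endorsed).
    errors: endorsement gaps that violate the cumulative pattern.
    """
    score = max((i + 1 for i, r in enumerate(responses) if r), default=0)
    # Under perfect cumulativity, every item below the score is also endorsed.
    errors = sum(1 for r in responses[:score] if not r)
    return score, errors

print(guttman_score([True, True, True, False]))   # (3, 0): perfectly cumulative
print(guttman_score([True, False, True, False]))  # (3, 1): one scaling error
```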
36.8.2 Semantic Differential Scales

Developed in 1957 [28], this popular scale utilises a rather different technique. Instead of assessing a respondent's beliefs about a subject, as other scales do, it focuses on the subjective meaning of the concept to the respondent. Ratings are scored along a series of bipolar rating scales such as "good/bad". These ratings fall into three underlying dimensions: activity, evaluation and perceived potency.

1. Activity assesses the extent to which the concept is associated with action, e.g., "fast/slow".
2. Evaluation assesses the overall positive meaning associated with the concept, e.g., "good/bad", "happy/sad".
3. Perceived potency gauges the overall strength and importance of the concept, e.g., "strong/weak", "easy/hard".

As a result, a list of adjectives to describe the concept can be created. Respondents rate each adjective pair on a scale of 1–7; the ratings are then totalled, an average rating is calculated, and comparisons are made and evaluated. Alternatively, generic lists can be used, one source being the book by Valois and Godin [36] (Fig. 36.4).
Fig. 36.4 An example of a semantic differential scale: views of O&G trainees. Semantic profiles for trainees with and without basic surgical skills training, rated 1–7 across adjective pairs such as "unnecessary for career/vital for career", "challenging/easy", "expensive/cheap", "boring/enjoyable" and "time consuming/time efficient"
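The averaging step described above reduces to a column mean over respondents' 1–7 ratings for each adjective pair. The sketch below compares two hypothetical groups of trainees; the ratings are invented for illustration and do not reproduce Fig. 36.4.

```python
import numpy as np

pairs = ["unnecessary/vital for career", "challenging/easy",
         "expensive/cheap", "boring/enjoyable", "time consuming/time efficient"]

# Hypothetical 1-7 ratings: rows are respondents, columns are adjective pairs.
with_training = np.array([[6, 5, 4, 6, 5],
                          [7, 4, 5, 6, 6],
                          [6, 5, 3, 7, 5]])
without_training = np.array([[4, 3, 3, 4, 3],
                             [3, 4, 2, 5, 4],
                             [4, 3, 3, 4, 3]])

for label, ratings in [("with training", with_training),
                       ("without training", without_training)]:
    profile = ratings.mean(axis=0)  # the group's semantic profile
    print(label, dict(zip(pairs, np.round(profile, 1))))
```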
36.8.3 Other Scaling Techniques

Many other scaling techniques exist within the realm of attitude measurement. "Q-sorts" is a technique that measures the relative position of an individual or concept; it is most commonly used with small numbers, as the analysis can become very difficult with large numbers. "Sociometric scales" are simple techniques used to describe relationships between individuals in a group. They vary in complexity but essentially require members of the group to make choices regarding other members, such as whom they can communicate with effectively. Many tests over the last 20 years have become computer-based, with the advantage that both the administration and the analysis of the test can be carried out automatically; one of the first such tests was described by French [12]. An excellent guide to creating your own test has been produced by Rust and Golombok [30]. Alternatively, an existing test can be further developed and altered to suit the specific aims of the study once permission from the copyright holder has been obtained. This, however, carries the risk of losing the validity established for the existing scale, which may no longer apply to a modified one.
36.9 Validation

A valid questionnaire measures what it claims to measure [4]. Importantly, it is not the actual measure that is valid or invalid, but the use to which the measure is
put. Surveys of patient experiences have become a common approach to monitoring and improving the quality of healthcare, but have been criticised for the lack of valid and reliable instruments [33]. Validity can be applied to the results of a study overall, in terms of how relevant they are to the general population; this is referred to as external validity. Internal validity concerns the specific questions asked and whether these accurately extract from the individual the information sought by the aims of the study. Valid patient-experience questionnaires are now increasingly being developed; important questions routinely asked using such validated questionnaires include those on patient satisfaction with services [17, 22] and on health-related quality of life [21, 37]. Cronbach's alpha (α) [8] has an important use as an unbiased measure of the reliability of a psychometric instrument, provided certain criteria are met [22]. Cronbach's alpha will generally increase as the correlations between the items increase; hence, the coefficient is also called the internal consistency, or internal consistency reliability, of the test.
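Cronbach's alpha is computed from the item-score matrix via the standard formula α = k/(k − 1) · (1 − Σσᵢ²/σₜ²), where k is the number of items, σᵢ² the variance of item i and σₜ² the variance of the summed score. A minimal Python sketch, with hypothetical Likert responses:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point Likert responses from six respondents to four items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Strongly inter-correlated items, as here, drive the coefficient towards 1, consistent with its reading as internal consistency.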
Validity can be assessed by the following three methods:

1. Criterion
2. Content
3. Construct

Criterion validity

Comparisons are made between how individuals respond to the new measure of a concept and how they respond to existing, well-accepted measures. Problems arise with this method of verifying validity when correlations with the existing method are low. Although this may well indicate a low level of validity in the scale being evaluated, it may actually mean that the existing method is the invalid one, while the new method is a far better measure. In some circumstances, there may be no well-established scale against which the new scale can be tested. Triangulation is the name given to the method that allows the study to be validated against external tests by using numerous different methodologies to investigate a single hypothesis; the methodologies used must have different weaknesses in order to prove useful.

Content validity

This emphasises the extent to which the individual indicators measure the different aspects of a particular concept, critically assessing whether the questions asked actually address the overall concept being investigated. For instance, should a study be looking into "post-operative patient satisfaction" but ask questions and collect data regarding post-operative pain, then while the study may establish post-operative pain rates very accurately, it does not account for other facets of post-operative care (e.g. time to discharge, regular doctor review, follow-up). Content validity thus depends upon the definition of the concept the study is designed to test.

Construct validity

This refers to how well a scale conforms to a theoretical construct specified before the scale was developed.

36.10 Factor Analysis

Factor analysis, originally developed in psychometrics by the psychologist Charles Spearman, is a statistical technique used to reduce a larger number of observed variables to a smaller number of unobserved random variables called factors. These identified "factors" explain a variety of results from different tests. Factor analysis assumes that the reduction is possible because the attributes are related. The method is popular in behavioural and social science research, as it allows large quantities of data to be extrapolated and then processed. However, it is very different from the statistical methods traditionally used to study the relation between independent and dependent variables: factor analysis is used to study the patterns of relationship among many dependent variables, with the goal of discovering something about the nature of the independent variables (factors) that affect them, even when those independent variables are not measured directly. It helps to identify groups of inter-related variables and allows assessment of their relation to each other. The most famous use of this was in the assessment of human cognitive abilities by the American psychologist Carroll, who, by assessing the relationships between auditory, visual and general intelligence, developed the Three Stratum Theory of intelligence [7].

Factor analysis in use
Once research surveys are carried out, the results are coded and can be input into a statistical programme
such as SPSS or SYSTAT and analysed using factor analysis. This results in the programme yielding a number of underlying factors that explain the data. The statistical algorithm deconstructs the raw data into its various components and reconstructs the partial scores into underlying factor scores; the degree of correlation between the initial raw score and the final factor score is called the factor loading. "Error" is included in the analysis to account for individual variation. Thus, when constructing a survey or utilising a questionnaire or scale, a number of questions are developed, all with the aim of extracting from a sample population an overall attitude or belief using the tools outlined earlier. This overall attitude arises as the sum of a number of individual opinions on aspects of a topic, otherwise termed factors, which influence the final opinion. Factor analysis identifies the overall direction that a set of questions takes and can reduce the number of questions while maintaining that overall direction: questions essentially asking the same thing can be identified and subsequently whittled down to one. It is important to bear in mind that the accuracy of factor analysis is very much dependent on the validity of the data it is modelling, that more than one interpretation can be made of the same data factored in the same way, and that factor analysis cannot identify causality.

Types of factor analysis

Factor analysis can be either exploratory or confirmatory:

1. Exploratory factor analysis is the most common form of factor analysis. With no prior theory, it tries to uncover the underlying relationships in a relatively large set of variables. It uses principal component analysis, in which the total variance is considered, and is generally used when the research purpose is data reduction (to reduce the information in many measured variables into a smaller set of components).

2. Confirmatory factor analysis (CFA) seeks to determine whether the factors and the variables conform to what is expected on the basis of pre-established theory. CFA is also known as principal axis factoring and principal factor analysis, in which the common variance between the factors is considered. CFA is
preferred for use in modelling, and it is commonly used in structural equation modelling (SEM) packages such as AMOS, where alternative factor models are analysed. The various aspects of CFA are further discussed in the following section.
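As a concrete illustration of the data-reduction use described above, the sketch below runs an exploratory factor analysis with scikit-learn's FactorAnalysis (one of several possible tools; SPSS and SYSTAT, mentioned above, offer equivalents). The simulated data, the choice of two factors and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Hypothetical coded survey data: 300 respondents, 8 question items driven
# by 2 latent factors plus individual "error" variation.
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 8))
items = latent @ loadings + 0.5 * rng.normal(size=(300, 8))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(items)   # factor scores per respondent
print(fa.components_.round(2))     # estimated loadings: items grouped by factor
```

Items loading heavily on the same factor are "essentially asking the same thing" and are candidates for being whittled down to one.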
36.11 Structural Equation Modelling

SEM is most commonly used to model causal relationships among latent variables and can also be used to explore CFA measurement models. It serves a purpose similar to multiple regression, while taking into account the modelling of interactions, non-linear relationships, correlated independents, measurement error and one or more latent dependents (each of which can have multiple indicators). The structural equation modelling process has two steps:

1. Validation of the measurement model, using CFA.
2. Fitting the structural model, accomplished primarily through path analysis with latent variables (variables that are not measured directly, but are estimated in the model from measured variables that are assumed to "tap into" them).

A model is specified on the basis of a theory. Each variable in the model is conceptualised as a latent one, measured by multiple indicators. Several indicators are developed for each latent variable, with a view to winding up with at least three per latent variable after CFA. Based on a large (n > 100) representative sample, factor analysis is used to establish whether the indicators seem to measure the corresponding latent variables, represented by the factors. The researcher proceeds only when the measurement model has been validated. Two or more alternative models are then compared in terms of "model fit", which measures the extent to which the covariances predicted by the model correspond to the observed covariances in the data. LISREL, EQS and AMOS (an SPSS package) are three popular statistical packages used for SEM. Analysis of moment structures (AMOS), a recently developed package, has become popular as an easier way of specifying structural models because of its user-friendly graphical interface.
An example of an SEM can be represented diagrammatically to include the latent variables and their relationships with each other, the multiple indicators and the associated errors. Figure 36.5 outlines an SEM for two independent variables, each measured by three indicators (p1–p3 and q1–q3), with their interactions (three indicators multiplied by three indicators, giving nine interaction terms, p1q1–p3q3), as the cause of one dependent variable measured by three indicators (d1–d3).

Fig. 36.5 An example of a structural equation model (adapted from the North Carolina State University syllabus on Qualitative Research [13]). The diagram shows the observed indicators, the latent independent variables, the indicator and residual errors, and the latent dependent variable (here one); straight arrows denote causation, curved arrows correlation
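A model of this shape can be written in lavaan-style syntax and fitted with a Python SEM package such as semopy (a third-party package, assumed installed; the data file is hypothetical, the variable names follow Fig. 36.5, and the interaction terms are omitted for brevity). This is a sketch, not the method used in the original text, which discusses AMOS, LISREL and EQS.

```python
import pandas as pd
import semopy  # third-party SEM package (pip install semopy)

description = """
# Step 1: measurement model (CFA)
IV1 =~ p1 + p2 + p3
IV2 =~ q1 + q2 + q3
DV  =~ d1 + d2 + d3
# Step 2: structural model (paths between latent variables)
DV ~ IV1 + IV2
"""

data = pd.read_csv("survey_indicators.csv")  # hypothetical indicator data
model = semopy.Model(description)
model.fit(data)
print(model.inspect())  # estimated loadings, path coefficients and errors
```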
36.12 Research Approval

A research study on NHS patients or staff in the UK must fulfil certain criteria:

1. It must be formally approved by the responsible person in an organisation registered with the Department of Health, such as a university.
2. The research must obey the Data Protection Act, and the data collected must be entered into the data protection files of the organisation.
3. It must be in accordance with the research governance framework [10].
4. It must be approved by the research ethics committee.

If a questionnaire study forms part of an academic course, for example a dissertation for a PhD, it is important that any further departmental regulations, such as gaining the supervisor's approval, are also observed.

36.13 Future Research

With the growing availability of and interest in information technology, priority should be given to comparative studies of traditional vs. computer-assisted approaches, for example, comparing traditional keyboard entry with touch-screen entry for computer-assisted questionnaires. More pertinent is the study of web-based delivery of questionnaires. Issues that need to be resolved include how to define and determine the underlying population and how to control for the same individual submitting multiple questionnaires.

References
1. Babbie E (2003) The practice of social research, 10th edn. Thomson/Wadsworth, Belmont
2. Booth C (1902) Life and labour of the people in London. Macmillan, London
3. Bowling A (2002) Research methods in health: investigating health and health services, 2nd edn. Open University Press, Buckingham
4. Boynton PM, Greenhalgh T (2004) Selecting, designing, and developing your questionnaire. BMJ 328(7451):1312–1315
5. Bryman A (2001) Social research methods. Oxford University Press, Oxford
6. Campanelli P (1995) Minimising non-response before it happens: what can be done. Survey Meth Bull 37:35–37
7. Carroll JB (1993) Human cognitive abilities: a survey of factor-analytic studies. Cambridge University Press, Cambridge
8. Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16(3):297–334
9. Dawes RM, Smith TL (1985) Attitude and opinion measurement. In: Lindzey G, Aronson E (eds) The handbook of social psychology: the individual in a social context. Random House, New York
10. Department of Health (2002) Research governance framework for health and social care. Stationery Office, London
11. Frankfort-Nachmias C, Nachmias D (1996) Research methods in the social sciences, 5th edn. Arnold, London
12. French C (1990) Computer assisted assessment. In: Beech JR, Harding L (eds) Testing people: a practical guide to psychometrics. NFER-Nelson, Windsor, p 164
13. Garson GD (2006) Syllabus for PA 765: quantitative research in public administration. Available from http://www2.chass.ncsu.edu/garson/pa765/pa765syl.htm. Accessed 25 October 2007
14. Gillham B (2000) Developing a questionnaire. Continuum, London
15. Hakim C (1987) Research design: strategies and choices in the design of social research. Allen & Unwin, London
16. Hanson DJ (1980) Relationship between methods and judges in attitude behaviour research. Psychology 17:11–13
17. Howie JG et al (1998) A comparison of a patient enablement instrument (PEI) against two established satisfaction scales as an outcome measure of primary care consultations. Fam Pract 15(2):165–171
18. Howitt D, Cramer D (2000) First steps in research and statistics: a practical workbook for psychology students. Routledge, London
19. Klepacz A (1991) Activity and health survey: possible effects on response of different versions of the same advance letter. Survey Methods Bull 29:29–32
20. Lemon N (1973) Attitudes and their measurement. Batsford, London
21. Lohr KN (2002) Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 11(3):193–205
22. Lord FM, Novick MR (1968) Statistical theories of mental test scores. Addison-Wesley, Reading
23. Mangione TW (1995) Mail surveys: improving the quality. Sage, Thousand Oaks
24. McNeil BJ et al (1982) On the elicitation of preferences for alternative therapies. N Engl J Med 306(21):1259–1262
25. Morgan D (1993) Successful focus groups: advancing the state of the art. Sage, Newbury Park
26. Moser CA, Kalton G (1985) Survey methods in social investigation, 2nd edn. Gower, Aldershot
27. NHS Health Survey Advice Centre (1995) Sample size. Office of Population Censuses and Surveys, London, pp 3–5
28. Osgood CE, Suci GJ, Tannenbaum PH (1957) The measurement of meaning. University of Illinois Press, Urbana
29. Robson C (1994) Real world research: a resource for social scientists and practitioner-researchers. Blackwell, Oxford
30. Rust J, Golombok S (1989) Modern psychometrics: the science of psychological assessment. Routledge, London, pp 164–168
31. Scott WA (1968) Attitude measurement. In: Lindzey G, Aronson E (eds) The handbook of social psychology, vol 1: historical introduction – systematic positions. Addison-Wesley, Reading, p 653
32. Shortell SM (1999) The emergence of qualitative methods in health services research. Health Serv Res 34(5 Pt 2):1083–1090
33. Sitzia J (1999) How valid and reliable are patient satisfaction data? An analysis of 195 studies. Int J Qual Health Care 11(4):319–328
34. Sofaer S (2002) Qualitative research methods. Int J Qual Health Care 14(4):329–336
35. Spies RA et al (2005) The sixteenth mental measurements yearbook. Buros Institute of Mental Measurements, University of Nebraska-Lincoln, Lincoln
36. Valois P, Godin G (1991) The importance of selecting appropriate adjective pairs for measuring attitude based on the semantic differential method. Qual Quant 25:57–68
37. Van Hook MP, Berkman B, Dunkle R (1996) Assessment tools for general health care settings: PRIME-MD, OARS, and SF-36. Primary care evaluation of mental health disorders; Older Americans Resources and Services Questionnaire; Short Form-36. Health Soc Work 21(3):230–234
Further Reading

1. Attitude measurement scales: Buros mental measurement yearbooks (Buros, 1978) [35]
2. Department of Health (2002) Research governance framework for health and social care. Stationery Office, London
3. Download AMOS™ (Student Version) for free: http://amosdevelopment.com/download/
4. Office for National Statistics: http://www.statistics.gov.uk/about/data/harmonisation/default.asp
5. Structural equation modelling in AMOS: http://www2.chass.ncsu.edu/garson/pa765/structur.htm#AMOSmeasure
6. The Centre for Applied Social Surveys (CASS): http://qb.soc.surrey.ac.uk/
7. The validity and reliability of scales can be assessed using the methods section of Rust J, Golombok S (1989) Modern psychometrics: the science of psychometric assessment. Routledge, London, pp 165–168
37 How to Perform Analysis of Survival Data in Surgery

Fotios Siannis
Contents

37.1 Introduction ............................................................ 495
37.2 Features of Survival Data ...................................... 496
37.2.1 Timescale ................................................................ 496
37.2.2 Types of Censoring ................................................. 496
37.3 Standard Survival Analysis ................................... 498
37.3.1 Basic Quantities ...................................................... 498
37.4 Methods for Analysing Survival Data ................... 498
37.4.1 Non-Parametric ....................................................... 498
37.4.2 Semi-Parametric ..................................................... 500
37.4.3 Parametric ............................................................... 501
37.5 Further Issues in Survival Analysis ....................... 502
37.5.1 Competing Risks ..................................................... 502
37.5.2 Time-Dependent Covariates ................................... 503
37.5.3 Missing Data ........................................................... 504
37.5.4 Long-Term Survivor Models .................................. 505
37.5.5 Meta-Analysis ......................................................... 505
References ........................................................................... 506
Further Reading ................................................................... 506

Abstract The aim of this chapter is to provide an introduction to survival analysis. Terminology, together with basic ideas and standard methodology for analysing survival data, will be presented. Finally, further and more advanced issues in survival analysis will be briefly discussed, serving as an introduction to these areas for the reader.
F. Siannis
Department of Mathematics, University of Athens, Panepistemiopolis, Athens 15784, Greece
e-mail: [email protected]
37.1 Introduction

The analysis of survival, or time-to-event, data is an important as well as heavily explored area of statistics. Statistical tools have been developed to address efficiently most of the standard problems that have arisen in this area. Nevertheless, the particular nature of survival data, together with the need to investigate more complicated problems, makes this an interesting and active area of research. Survival data arise when subjects, e.g. patients, are followed up for a certain period of time in order to observe when a specific event of interest occurs. This event is commonly called a failure, and the time to the event is called the failure time. For example, if the actual survival experience of patients with metastatic breast cancer is of interest, then the time of death since diagnosis of metastasis is the event of interest; in this case, we literally measure the survival time of the patients from a well-defined starting point, the time of diagnosis of metastasis, until death. However, the failure of interest might well be of an entirely different nature than death: the time to disease progression, the onset or disappearance of side effects, the end of hospitalisation and the time of transplantation can all serve as "failures" in a survival analysis context. The aim of this chapter is to provide an introduction to survival analysis. Terminology together with basic
ideas and standard methodology for analysing survival data will be presented. Finally, further and more advanced issues in survival analysis will be briefly discussed, serving as an introduction to these areas for the reader.
37.2 Features of Survival Data

There are issues that render time-to-event data different from other types of data. The varying lengths of follow-up in each study, the time points on which the final results focus (e.g. 3-year or 5-year survival), the presence of censoring and the fact that censoring might actually carry some information about the outcome of interest are some of the issues that add to the complexity. Therefore, before we proceed any further, it is essential to introduce the features that characterise survival data.
37.2.1 Timescale

It is important that the time origin is precisely defined for every individual. It is also desirable that, subject to any individual characteristics, patients should be as comparable as possible at their time origin. In randomised clinical trials, the date of randomisation satisfies both conditions and serves as a well-defined time origin. Although it might be more reasonable to consider as a starting point the time at which a patient's clinical condition met certain characteristics, the difficulty of determining that point and the possibility of bias exclude its use as a time origin. The time origin need not be the same calendar date for every individual. In a study, not all patients are recruited at exactly the same time; they are usually accrued over a period of several months or even years, and they are followed up until the event of interest occurs or until they reach a predetermined point in time that marks the end of the study. The calendar time period that the patients spend in the study is known as the study time. On the other hand, for a patient, the trial begins at some point, say t0; the period of time that the patient spends in the study, measured from t0, is known as the patient time. The collected data usually consist of the date the patient enters the study and the date the event of interest occurs or the patient was last seen. Then, the
survival times in days, weeks or years can be easily calculated. Figure 37.1 presents an example of a 10-year study with an initial 4-year recruitment period, in which the study and patient times of a number of individuals are clearly shown. There are cases, however, in which the time at which a patient enters the study is not the most appropriate time origin. For example, in epidemiological studies of the effects on mortality of occupational exposure to agents such as asbestos, the natural measure of time is age, as this is a strong determinant of mortality [2]. However, we are able to observe each individual only after they have started working in a job that involves exposure to asbestos. These observations are called left-truncated, and special methods are needed for the analysis of such data. In any case, one reason for the choice of timescale is direct meaningfulness. A second consideration is that two identically treated individuals should be at a similar state after the lapse of equal "times", other things being equal. If two or more different ways of measuring time are available, it is possible, having selected the most appropriate timescale, to use the remaining ones as explanatory variables.
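Computing patient time from entry and last-seen dates, as described above, is straightforward; a minimal pandas sketch with hypothetical dates, producing the survival time in days together with a right-censoring indicator:

```python
import pandas as pd

records = pd.DataFrame({
    "entry": pd.to_datetime(["2001-03-01", "2001-09-15", "2003-01-10"]),
    "last_seen": pd.to_datetime(["2004-06-30", "2010-12-31", "2008-05-02"]),
    "event_observed": [True, False, True],  # False = right-censored
})

# Patient time: duration from each subject's own time origin t0 (entry).
records["time_days"] = (records["last_seen"] - records["entry"]).dt.days
print(records[["time_days", "event_observed"]])
```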
37.2.2 Types of Censoring

It is desirable to be able to observe the event of interest in all participating subjects; however, this is not always feasible. When survival times are not fully observed, they are said to be censored. Specific types of censoring depend on the nature of the observational process, i.e., whether subjects are followed up continuously or at specific (regular or not) points in time. Furthermore, other types of observation also exist, some of which can be complicated [11]. There are three possible reasons why the exact failure time might not be observed, leading to three different types of censored observation. The first type, termed left censoring, occurs when the actual failure time falls before the initiation of the subject's observational period. To see this, consider the case in which arthritis patients register at a specialised clinic: as most of the patients register after the onset of the disease, the time to arthritis is left-censored for these patients. Clearly, in this case, the event of interest is not terminal; hence, the follow-up process can continue after the occurrence of the event. The second type of censoring is right censoring. This occurs when a subject has not
Fig. 37.1 Time spent in the study. Consider a 10-year study with a 4-year recruitment period. The study time measures the calendar time (top graph), while the patient time measures the actual time the patient spends in the study (bottom graph). Symbols (×) and ( ) indicate a failure and a censored observation, respectively, while ( ) indicates the time origin of each patient
experienced the event of interest by the time of the last known survival time; this means that the failure time exceeds the last observed time. This is by far the most common type of censoring, and in the remainder of this chapter, references to censoring will mean right censoring. Finally, the last type of censoring is interval censoring, which appears when subjects experience the event of interest within an interval of time. This is common when we follow up patients at various points in time and the event of interest may happen between visits. For example, smokers are followed up at regular time intervals after quitting smoking, in order to fill in a questionnaire and record whether or not they are smoke-free. The event of interest, identified as the failure in this particular study, is observing a participant start smoking again. Hence, if someone is observed to smoke at a particular visit while they were smoke-free at the previous one, the actual failure time has occurred in the time interval between the two subsequent visits, leading to an interval-censored observation. In Fig. 37.2a, we have a set of observations when subjects are under continuous follow-up in the interval
EX D
X
C B
X
A
X
t0
t1 (a)
X
F
t0
tL
tR
t1
(b)
Fig. 37.2 Types of censored observations (a) and interval (b) follow-up
[t0, t1]. The solid line represents the period of risk for each subject, while (×) indicates the occurrence of the event of interest, and (o) indicates the occurrence of an
event other than the event of interest (censoring). The risk period of subjects A, B and C initiates within the follow-up period, whilst this is not true for subjects D and E. Subject A fails within the observation period, while for subject B, the event of interest occurs after the follow-up is terminated at time t1. As a result, the observation for subject B is right-censored at t1, a type of censoring that is called end-of-study or administrative censoring. Subject C is also right-censored, but this is because an event other than the event of interest has occurred. This is a typical right-censored observation, which may also be called lost-to-follow-up censoring when we need to distinguish it from end-of-study censoring. Of the remaining observations, subject D is left-truncated, as the event that marks the starting point of D's risk period occurred before t0, while subject E represents a left-censored observation. Finally, in Fig. 37.2b, we observe subject F, whose failure time is interval-censored in the [tL, tR] time interval, where tL and tR are the times of the regular visits just before and after the event of interest.
37.3 Standard Survival Analysis
The analysis of survival data has been extensively investigated, and tools to address the most common statistical problems are available [1, 7, 9]. These tools can be divided into three broad categories: (i) non-parametric, (ii) semi-parametric and (iii) parametric approaches, according to the extent to which they make use of parametric assumptions. Parametric approaches are easily conceived, and they benefit from the well-known structure of parametric distributions. On the other hand, two of the most popular methods for analysing survival data are the semi-parametric relative risk regression approach, widely known as Cox's regression model, and the Kaplan–Meier non-parametric estimate of the survival curve.

37.3.1 Basic Quantities

Before we proceed with the description of methods for data analysis, it is necessary to introduce two functions that are essential for the description of time-to-event phenomena. First, the survival or survivor function, $S(t)$, describes the probability of a subject surviving beyond time $t$. If $T$ is the time to some specified event, then $S(t)$ is defined as $S(t) = \Pr(T > t)$. Note that the survival is a non-increasing function with the value of 1 at the origin and 0 at infinity. If $f(t)$ and $F(t)$ are the probability density and cumulative distribution functions, respectively, then $S(t) = 1 - F(t)$. If $T$ is a continuous random variable, we can write

$$f(t) = -\frac{dS(t)}{dt}. \qquad (37.1)$$

Second, the hazard function, $\lambda(t)$, is a fundamental quantity in survival analysis, defined by

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t}. \qquad (37.2)$$

Its interpretation is that $\lambda(t)\Delta t$ can be seen as the "approximate" probability of a subject failing in the next instant, given that it has survived till time $t$. A related quantity is the integrated or cumulative hazard function $\Lambda(t)$, defined as

$$\Lambda(t) = \int_0^t \lambda(u)\,du \qquad (37.3)$$

for continuous $T$. Hence, as $S(t) = e^{-\Lambda(t)}$ and making use of (37.1), we can write

$$\lambda(t) = \frac{f(t)}{S(t)} = -\frac{d \ln S(t)}{dt} = \frac{d\Lambda(t)}{dt}, \qquad (37.4)$$

an expression that describes the relationship between the most important functions in survival analysis.
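To make relations (37.1)-(37.4) concrete, the following is a small numerical sanity check, a sketch in Python; the exponential model with rate θ = 0.2 and the finite-difference step are illustrative choices, not part of the text. For the exponential distribution, the hazard is constant and equal to θ, and the cumulative hazard is θt.

```python
# Numerical check of Eqs. (37.1)-(37.4) for an exponential survival model.
import math

theta, t, dt = 0.2, 3.0, 1e-6
S = lambda u: math.exp(-theta * u)   # survivor function S(t) = e^{-theta*t}

f = (S(t) - S(t + dt)) / dt          # f(t) = -dS/dt, Eq. (37.1), by finite difference
lam = f / S(t)                       # hazard lambda(t) = f(t)/S(t), Eq. (37.4)
Lam = -math.log(S(t))                # cumulative hazard, from S(t) = e^{-Lambda(t)}

print(round(lam, 4))   # ~0.2, i.e. theta: the hazard is constant
print(round(Lam, 4))   # ~0.6, i.e. theta * t, consistent with Eq. (37.3)
```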
37.4 Methods for Analysing Survival Data

37.4.1 Non-Parametric

In this section, we discuss how non-parametric or distribution-free methods can be used to find estimates of the survival or cumulative hazard functions. Although the interest of the researcher may lie simply in finding the estimates of these functions, these procedures serve as a useful first step in any intended
analysis, as they simply present the data with no further modelling assumptions. The survival function S(t) can be estimated by the empirical survival function
$$\tilde{S}(t) = \frac{\text{Number of subjects surviving beyond time } t}{\text{Total number of subjects in the data set}}, \qquad (37.5)$$

which is equal to unity for values of $t$ before the first failure and 0 after the last failure time. The value of $\tilde{S}(t)$ is constant between two adjacent failure times, and hence a plot will produce a step function. Censored observations are not allowed for in this calculation, since the information from observations censored before $t$ cannot be used in estimating the survival function at time $t$.

The life-table or actuarial estimate of the survival curve is one of the methods that adjust for the presence of censoring. The idea is that the whole period of observation can be divided into a series of time intervals $t'_1 < t'_2 < \dots < t'_k$, which, although they need not be of equal length, are usually taken to be equal. Suppose that we have $k$ such intervals, and in the $j$th of those intervals, $t'_j$ to $t'_{j+1}$, $j = 1, 2, \dots, k$, we observe $d_j$ failures and $c_j$ censored observations, respectively, whilst $n_j$ is the number of subjects alive, and hence at risk of failing, at the beginning of the interval, also known as the risk set. The average number of subjects at risk in the $j$th interval is then equal to $n'_j = n_j - c_j/2$. This is because it is assumed that censoring occurs uniformly within the time intervals, also known as the actuarial assumption. The probability of failure in the $j$th interval is $d_j/n'_j$, so that the corresponding survival probability is $(n'_j - d_j)/n'_j$. Hence, if we would like to compute the probability that a subject survives beyond the start of interval $r$, then this subject will have to survive all $r - 1$ preceding intervals, and hence the life-table estimate of the survival is given by

$$S^{*}(t) = \prod_{j=1}^{r} \frac{n'_j - d_j}{n'_j} = \prod_{j=1}^{r} \left( 1 - \frac{d_j}{n'_j} \right), \qquad (37.6)$$

for $t'_r \le t < t'_{r+1}$, $r = 1, 2, \dots, k$. Clearly, the probability of survival prior to $t'_1$ is equal to unity, while the probability of survival after $t'_{k+1}$ is 0.

Similarly, we can obtain the product-limit or Kaplan–Meier estimate of the survival function [8]. The difference is that the time intervals are now constructed from the observed survival times $t_{(1)} < t_{(2)} < \dots < t_{(k)}$, arranged in ascending order. Under this structure, each interval starts at an observed failure time, except the first one, which starts at the time origin and includes no failures. It is possible that more than one failure may be observed in an interval if subjects are observed to fail at the same time. This is common if time is considered discrete (e.g. measured in days), while it is assumed infeasible in continuous time. See Fig. 37.3 for an example in discrete time.

Fig. 37.3 Definition of intervals used in the Kaplan–Meier estimate: failures (F) and censored observations (C) placed between the ordered times t(0), t(1), t(2), t(3)

Censored observations do not contribute to this process, apart from the definition of the risk set. Hence, if $n_j$ is the number of subjects at risk just before time $t_{(j)}$ and $d_j$ is the number of failures at that particular time, then the probability of failure is $d_j/n_j$ and the corresponding probability of survival is $(n_j - d_j)/n_j$. Therefore, for a subject to survive the time interval from $t_{(r)}$ to $t_{(r+1)}$, he or she also has to survive all the preceding intervals. This can be expressed as

$$\hat{S}(t) = \prod_{j=1}^{r} \left( 1 - \frac{d_j}{n_j} \right), \qquad (37.7)$$

for $t_{(r)} \le t < t_{(r+1)}$, $r = 1, 2, \dots, k$. Survival in the interval prior to $t_{(1)}$ is unity, while we take $t_{(k+1)} = \infty$. Alternatively, the cumulative hazard function can be obtained directly by

$$\hat{\Lambda}(t) = \sum_{j=1}^{r} \frac{d_j}{n_j}, \qquad (37.8)$$
which is known as the Nelson–Aalen estimator [10]. For example, consider the survival data from 48 multiple myeloma patients, presented in Table 37.1 [1]. The symbol (*) indicates death, while the remaining observations were censored. Figure 37.4 presents the estimates of the survival curve based on both the Kaplan–Meier and the life-table methods.
Table 37.1 Survival times of multiple myeloma patients

52, 6*, 40*, 10*, 7, 66*, 10, 10*, 14*, 16*, 4*, 65*, 5*, 11, 10*, 15, 5*, 76, 56, 88*, 24*, 51*, 4*, 40, 8*, 18*, 5*, 16*, 50*, 40*, 1*, 36*, 5*, 10*, 91*, 18, 1*, 18, 6*, 1*, 23*, 15*, 18*, 12, 12*, 17*, 3, 13*

Fig. 37.4 Kaplan–Meier and life-table estimates of the survival function for the multiple myeloma data (estimated survival against time, 0-70)
The two curves look similar, although the estimated survival probability for the first interval, based on the life-table method, falls noticeably below the value of 1. This is due to the number of deaths observed in that interval. The life-table method is well suited when the actual death times are unknown and the only available information is the number of deaths and censored observations occurring in a series of consecutive intervals. However, when the actual times are known, the grouping of the survival times results in some loss of information, and the estimate of the survival function is sensitive to the choice of intervals used in its construction. In that case, therefore, the Kaplan–Meier method is more appropriate.
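The product in (37.7) and the sum in (37.8) are simple enough to compute in a few lines of code. The sketch below is a minimal from-scratch Python implementation, applied to the Table 37.1 data under the convention stated above (starred values are deaths, i.e. events); censored subjects tied with failures at the same time remain in the risk set, following the usual convention. Dedicated software (e.g. R's survival package or Python's lifelines) returns the same estimates.

```python
# From-scratch Kaplan-Meier (Eq. 37.7) and Nelson-Aalen (Eq. 37.8) estimates
# for the multiple myeloma data of Table 37.1; event=1 indicates death (*).
from collections import Counter

data = [(52,0),(6,1),(40,1),(10,1),(7,0),(66,1),(10,0),(10,1),(14,1),(16,1),
        (4,1),(65,1),(5,1),(11,0),(10,1),(15,0),(5,1),(76,0),(56,0),(88,1),
        (24,1),(51,1),(4,1),(40,0),(8,1),(18,1),(5,1),(16,1),(50,1),(40,1),
        (1,1),(36,1),(5,1),(10,1),(91,1),(18,0),(1,1),(18,0),(6,1),(1,1),
        (23,1),(15,1),(18,1),(12,0),(12,1),(17,1),(3,0),(13,1)]

def km_and_na(data):
    """Return (t, S_hat, Lambda_hat) at each distinct failure time t."""
    deaths = Counter(t for t, e in data if e == 1)
    s, cum_haz, out = 1.0, 0.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for time, _ in data if time >= t)  # risk set n_j
        d = deaths[t]                                      # failures d_j
        s *= 1 - d / at_risk        # Kaplan-Meier product, Eq. (37.7)
        cum_haz += d / at_risk      # Nelson-Aalen sum,     Eq. (37.8)
        out.append((t, s, cum_haz))
    return out

for t, s, ch in km_and_na(data)[:5]:   # first five steps of the curve
    print(f"t={t:3d}  S(t)={s:.3f}  Lambda(t)={ch:.3f}")
```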
37.4.2 Semi-Parametric

Consider the case where we have patients randomised to receive either a new or a standard treatment, with the hazards of death being $\lambda_N(t)$ and $\lambda_S(t)$, respectively. Then, if we assume that the hazards are proportional to each other at time $t$, the proportional hazards model, in its simplest form, can be written as

$$\lambda_N(t) = \psi \lambda_S(t), \qquad (37.9)$$

where $\psi$ is a constant [1]. The value of $\psi$ is the ratio of the hazard of death of a patient receiving the new treatment relative to a patient receiving the standard treatment and is known as the hazard ratio or relative hazard. As a result, a value of $\psi < 1$ implies that the hazard of death of a patient on the new treatment is smaller than the hazard of a patient on the standard treatment, at time $t$, with the conclusion being that the new treatment is superior to the standard one. Exactly the opposite holds when $\psi > 1$.

In the general case, assume that the hazard of death depends on a set of $n$ explanatory variables $X_1, X_2, \dots, X_n$, also known as covariates. Their values are assumed constant over time, and they are represented by the vector $\mathbf{x} = (x_1, x_2, \dots, x_n)'$. These types of covariates are often called baseline covariates. Thus, $\psi = \psi(\mathbf{x}; \boldsymbol{\beta})$ is a function of the explanatory variables, characterised by a set of parameters $\boldsymbol{\beta} = (\beta_1, \beta_2, \dots, \beta_n)'$. Amongst all possible parameterisations that could be considered, the log-linear form

$$\psi = \psi(\mathbf{x}; \boldsymbol{\beta}) = \exp\{\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n\} = \exp\left\{\sum_{j=1}^{n} \beta_j x_j\right\}$$
has become the most popular one [3]. Therefore, the proportional hazards model takes the form

$$\lambda(t) = \exp\left\{\sum_{j=1}^{n} \beta_j x_j\right\} \lambda_0(t), \qquad (37.10)$$

where $\lambda_0(t)$ is an unspecified function called the baseline hazard function. For a particular vector $\mathbf{x}_i$, $\psi = \psi(\mathbf{x}_i; \boldsymbol{\beta})$ has a useful interpretation, being the hazard of a patient with vector of explanatory variables $\mathbf{x}_i$ relative to the hazard of a patient with $\mathbf{x} = (0, 0, \dots, 0)'$, at time $t$. A similar interpretation holds for each one of the $\beta_j$'s, $j = 1, 2, \dots, n$, as $e^{\beta_j}$ is the relative hazard for a unit change in $X_j$, with the rest of the explanatory variables held at exactly the same values. Furthermore, for patient $i$, the quantity $h_i = \exp\{\beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni}\}$ from the relative hazard is the linear component of the model and is also known as the score or prognostic index of the $i$th patient.
The estimates of the regression coefficients $\beta_1, \beta_2, \dots, \beta_n$ can be obtained from the partial likelihood [2]. Its flexibility and broad applicability, together with its simplicity of application, have made Cox's regression model the most popular way of modelling survival data. Consequently, researchers quite often use it without second thoughts, ignoring the fact that proportionality of the hazards is a rather restrictive assumption. It is therefore recommended to perform some preliminary analysis to investigate whether the proportional hazards assumption is reasonable for a particular set of data. In the case in which two treatments are compared, this can be done by simply plotting the survival curves of the two treatment groups and observing whether or not the curves tend to diverge as time increases (Fig. 37.5). A plot with crossing survival curves, or curves that appear parallel to each other, is a useful sign that the hazards cannot be proportional. In this case, other ways of modelling should be considered, with parametric models being a natural choice. A more formal way of examining the proportional hazards assumption will be discussed in the section where time-dependent covariates are introduced.

Fig. 37.5 A visual check of the proportionality assumption: survival curves for two treatment groups plotted against time (weeks)
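As a hedged illustration of fitting model (37.10) in practice, the sketch below assumes the open-source Python package lifelines is installed; the toy data frame and all column names are invented for illustration. The exp(coef) column is the estimated hazard ratio $e^{\beta_j}$ discussed above; a value below 1 for the treatment variable would favour the new treatment, adjusted for age.

```python
# A sketch of Cox regression, assuming the lifelines package is available.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":      [5, 8, 12, 20, 24, 30, 33, 41, 48, 60],  # survival times
    "event":     [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],          # 1 = death, 0 = censored
    "treatment": [0, 0, 0, 1, 0, 1, 1, 1, 0, 1],          # new (1) vs standard (0)
    "age":       [71, 64, 58, 66, 70, 52, 61, 49, 68, 55],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")   # maximises the partial likelihood
print(cph.summary[["coef", "exp(coef)", "p"]])        # exp(coef) is the hazard ratio
```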
37.4.3 Parametric

So far, we have discussed the use of non-parametric and semi-parametric techniques for the analysis of survival data. The rationale for using these techniques was to avoid having to specify the hazard function completely. However, parametric assumptions that fully characterise either the density or the hazard function are permissible, especially in situations where previous research or simple diagnostics suggest a specific form for the data. There are certain advantages if the assumption of a particular probability distribution for the data is valid. The parameters of the distribution can be clinically meaningful, and inferences based on such an assumption will be more precise, leading to standard errors that tend to be smaller than they would be in the absence of a distributional assumption. The probability distribution that has played a key role in the analysis of survival data is the Weibull distribution, defined by

$$f(t) = \theta \alpha t^{\alpha - 1} e^{-\theta t^{\alpha}},$$

for $0 \le t < \infty$. As soon as the model for survival times is specified in terms of the probability density function, all the corresponding functions can be obtained from the equations

$$S(t) = 1 - \int_0^t f(u)\,du = e^{-\theta t^{\alpha}} \quad\text{and}\quad \lambda(t) = \frac{f(t)}{S(t)} = \theta \alpha t^{\alpha - 1}. \qquad (37.11)$$

Parameter $\alpha$ is called the shape parameter, as the shape of the hazard function depends critically on its value, while parameter $\theta$ is called the scale parameter. If $\alpha = 1$, then $f(t) = \theta e^{-\theta t}$ is the probability density function of the exponential distribution, a popular distribution with constant hazard that is obtained as a special case of the Weibull distribution. The general form of the hazard function for different values of $\alpha$ is presented in Fig. 37.6. Since the Weibull distribution is skewed, like most sets of survival data, the most appropriate and tractable summary of the location of the distribution is the median survival time. This is the value of $t$, say $t_{50}$, at which the survival is equal to 0.5. Hence,

$$S(t_{50}) = e^{-\theta t_{50}^{\alpha}} = 0.5 \;\Rightarrow\; t_{50} = \left[\frac{1}{\theta}\ln(2)\right]^{1/\alpha},$$

while more generally, for the $r$th percentile of the Weibull distribution, $t_r$, we have

$$t_r = \left[\frac{1}{\theta}\ln\left(\frac{100}{100 - r}\right)\right]^{1/\alpha}. \qquad (37.12)$$

Fig. 37.6 The Weibull hazard function for α = 0.5, 1.0, 1.5 and 5.0

From the expression for the hazard function in (37.11), we can see that the proportional hazards structure holds for the Weibull distribution, as there is a clear separation of the part of the hazard that includes $t$. Therefore, explanatory variables can easily be incorporated into this parametric set-up. If we have $p$ explanatory variables $X_1, X_2, \dots, X_p$ with values $x_{i1}, x_{i2}, \dots, x_{ip}$ for the $i$th subject, $i = 1, 2, \dots, n$, then the hazard function takes the form

$$\lambda_i(t) = \exp\{\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\} \lambda_0(t), \qquad (37.13)$$

a form similar to Cox's regression model. If we consider a subject for whom all the $p$ explanatory variables are equal to 0, then the hazard of this subject is assumed Weibull with scale parameter $\theta$ and shape parameter $\alpha$, such that $\lambda_0(t) = \theta \alpha t^{\alpha - 1}$. Hence, from (37.13), the survival time of subject $i$ will follow a Weibull distribution with scale parameter $\exp\{\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\}\theta$ and shape parameter $\alpha$. This shows that the presence of covariates affects the scale parameter, while the shape parameter is left unchanged.

In a general parametric setting, one could alternatively specify a functional form for the hazard function and then derive the survival and density functions through the relations

$$S(t) = e^{-\Lambda(t)} \quad\text{and}\quad f(t) = \lambda(t) S(t), \qquad (37.14)$$

where $\Lambda(t)$ is given by (37.3). Additionally, other distributions could be chosen for fitting the data, with the log-logistic and log-normal being two of the most popular ones. These distributions do not benefit from the proportional hazards structure, as the Weibull does; however, they offer alternative shapes for the hazard function, and covariates can easily be incorporated into the analysis.
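A Weibull model with right-censored data can be fitted by maximum likelihood directly in the (θ, α) parameterisation used above: events contribute log f(t) to the log-likelihood and censored observations contribute log S(t). The sketch below (Python with NumPy/SciPy; the data are invented for illustration) also reports the median survival time via (37.12).

```python
# Maximum likelihood fit of the Weibull model with right-censoring,
# in the chapter's parameterisation f(t) = theta*alpha*t^(alpha-1)*exp(-theta*t^alpha).
import numpy as np
from scipy.optimize import minimize

t = np.array([5., 8., 12., 20., 24., 30., 33., 41., 48., 60.])
e = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 0])        # 1 = observed failure

def neg_log_lik(params):
    log_theta, log_alpha = params                    # optimise on the log scale
    theta, alpha = np.exp(log_theta), np.exp(log_alpha)
    log_f = np.log(theta) + np.log(alpha) + (alpha - 1) * np.log(t) - theta * t**alpha
    log_S = -theta * t**alpha
    return -np.sum(e * log_f + (1 - e) * log_S)      # events use f, censored use S

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
theta_hat, alpha_hat = np.exp(res.x)
t50 = ((1 / theta_hat) * np.log(2)) ** (1 / alpha_hat)   # median, Eq. (37.12) with r = 50
print(theta_hat, alpha_hat, t50)
```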
37.5 Further Issues in Survival Analysis

The methods for the analysis of survival data presented so far refer to common data structures. There are situations, however, in which data are collected from more complicated designs, and hence more sophisticated and specialised statistical methods are required in order to analyse them. In this section, some of these more complicated data designs are presented, together with descriptions of how they can be handled efficiently.
37.5.1 Competing Risks

Consider the extension of the standard survival analysis framework wherein, contrary to what we have assumed so far, individuals are subject to more than one type of failure. As a result, apart from the time to failure T, the observed outcome also comprises C, the cause or type of failure. The failure time T is usually taken to be a continuous variate, and the cause C can take one of a small number of values, labelled 1, 2, ..., q. It will be assumed that to every failure, one and only one cause will be assigned from the given set of q possible causes. They are called risks before failure and causes afterwards, so that the risks compete to become the cause [4]. One way of understanding and modelling the presence of competing risks is to assume that we have
q notional failure times, T1, T2, …, Tq, also known as latent failure times. Therefore, once a failure time occurs, we observe
$$T = \min\{T_1, T_2, \dots, T_q\} \quad\text{and}\quad C = c \;\text{ if }\; T = T_c, \qquad (37.15)$$
the minimum of all possible failure times and the cause of failure. The remaining failure times are lost to observation, and the only thing we can state is that, if they exist, they are simply greater than T. The vector $\mathbf{T} = (T_1, T_2, \dots, T_q)$ will have a joint survival function $\bar{G}(\mathbf{t}) = P(\mathbf{T} > \mathbf{t})$, where $\mathbf{t} = (t_1, t_2, \dots, t_q)$, and marginal survival functions $\bar{G}_j(t) = P(T_j > t)$. It is true that the structure describing the associations between the competing risks cannot be determined by simply collecting data of the form (37.15). By observing data of this kind, what can actually be determined is the distribution of each one of the competing risks in the presence of all the others. This is characterised by the sub-survival function $F(j, t)$ for cause $j$, also known as the crude survival function. Knowledge of the sub-survival functions is not sufficient to determine the joint behaviour of the risks, unless independence between the risks is assumed. Independence means that the observable sub-functions are equal to the marginal ones, and hence the joint density function is simply the product of all the marginal ones. In simple terms, the risks run independently, and therefore we may study each one as if the remaining ones did not exist. In any other case, detailed and sometimes complicated assumptions are needed to overcome this lack of information. Possible objectives in competing risks are to study (i) the distribution of failure time of, say, type A failures, with the remaining types of failure having been eliminated, (ii) the comparison of, say, type A failures in two or more groups of individuals having different properties for the other types of failure and (iii) the effect on the marginal distribution of failure time of eliminating or reducing type A failures [3]. Although this notion of "eliminating" a specific type of failure might have a clear interpretation when it refers to separate subsystems in a reliability type of problem, cautious interpretation is required when it refers to human survival.
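The latent failure time construction (37.15) is easy to simulate, which is a useful way to build intuition. The sketch below (Python; the two hazard values are arbitrary illustrative choices) draws two independent exponential risks and records only the minimum and its cause.

```python
# Simulation of the latent failure time construction of Eq. (37.15):
# two independent exponential risks; only the minimum and its cause are observed.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
t1 = rng.exponential(scale=1 / 0.03, size=n)   # latent time for risk 1, hazard 0.03
t2 = rng.exponential(scale=1 / 0.01, size=n)   # latent time for risk 2, hazard 0.01

T = np.minimum(t1, t2)                         # observed failure time
C = np.where(t1 < t2, 1, 2)                    # observed cause of failure

# With independent exponential risks, P(C = 1) = 0.03 / (0.03 + 0.01) = 0.75
print((C == 1).mean())                         # close to 0.75
```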
37.5.2 Time-Dependent Covariates

In many studies with survival endpoints, individuals are followed up for the duration of the study. As a result, together with variables that are recorded at the time origin of the study (baseline covariates), we can also include variables whose values change over time. These variables are known as time-dependent variables, and they may be classified as either internal or external [1]. Internal variables are those which relate to a particular individual and whose value is known as long as the patient is still alive and under follow-up. Examples are measures of blood pressure, white blood cell counts and so on. On the other hand, external variables are those which change in such a way that their value is known at any future time, provided an initial value was measured at the time origin. For example, if the age of a patient is known at the time of entry to the study, then that patient's age will be known at any future time. There are many reasons why time-dependent covariates are useful. The first and most obvious one is that measurements of variables made during a study can now be incorporated into the analysis. The idea is that the survival experience of an individual is more likely to depend on the most recent measurement of a variable than on the measurement taken at the start of the study. Therefore, Cox's regression model becomes

$$\lambda_i(t) = \exp\left\{\sum_{j=1}^{n} \beta_j x_{ij}(t)\right\} \lambda_0(t), \qquad (37.16)$$

which is similar to (37.10). The only difference is that $x_{ij}(t)$ is allowed to depend on time $t$, and hence the relative hazard $\lambda_i(t)/\lambda_0(t)$ now depends on $t$. This means that the hazard of death at time $t$ is no longer proportional to the baseline hazard, and hence the model is no longer a proportional hazards model. The interpretation of the parameters $\beta_j$ is similar to that given earlier in this chapter; $\beta_j$ can be seen as the log-hazard ratio for two individuals whose value of the $j$th variable differs by one unit at time $t$, while the values of the remaining variables are the same. Other applications of time-dependent variables include (i) testing the validity of the proportional hazards models, (ii) modelling the effect of patients switching treatments and (iii) serving as a function of the history of failures, censoring and of any other random
feature of the problem (evolutionary covariates) [3]. The first application in particular is very important, as it provides a formal way of examining the ever-popular proportional hazards assumption. This can be achieved by introducing time t, or a function of it, as a time-dependent covariate in the model. If the regression coefficient of this time-dependent variable is shown to be significantly different from 0, then this is an indication that the proportional hazards assumption is not appropriate; otherwise, the assumption is judged reasonable.
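In software, this formal check is routinely automated. As one hedged example, the sketch below assumes the Python package lifelines, whose check_assumptions() routine tests each covariate using scaled Schoenfeld residuals, a standard formalisation closely related to the time-dependent covariate test described above; the toy data are invented for illustration.

```python
# A sketch of a formal proportional hazards check, assuming lifelines.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":      [5, 8, 12, 20, 24, 30, 33, 41, 48, 60],
    "event":     [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    "treatment": [0, 0, 0, 1, 0, 1, 1, 1, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

# For each covariate, a small p-value indicates that its effect drifts
# with time, i.e. that the proportional hazards assumption fails.
cph.check_assumptions(df, p_value_threshold=0.05)
```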
37.5.3 Missing Data

In everyday practice, it is quite common to have data sets with parts missing. This problem can appear for a number of reasons and is frequent in studies where individuals are repeatedly observed. For example, missing values could be generated if respondents in a household survey refuse to report their income, or if, in a follow-up survey a few years later, some individuals cannot be located. Methods for the analysis of partially missing data can be grouped into the following four categories: (i) procedures based on complete or available data, (ii) imputation methods, (iii) weighting procedures and (iv) model-based procedures [12]. In order to analyse these kinds of data, an assumption regarding the process that generates the missing values is needed. A common assumption is that this process can be ignored, implying that missing data are somehow accidental. However, although this assumption might sometimes be correct, it is still very difficult to verify. Thus, conditions on the process that causes missing data have been introduced, under which it is appropriate to ignore this process when analysing the data [15]. The missing data are missing at random (MAR) if the probability of the observed pattern of missing data depends on the values of the observed data but not on the possible values of the missing data. If, additionally, the probability of the observed pattern of missing data does not depend on the values of the observed data, then the observed data are observed at random (OAR). In the latter case, where data are MAR and OAR, we can simply state that the missing data are missing completely at random (MCAR). As a result, different missing data methods require different
conditions on the process that generated the missing data in order to provide correct inferences. The MCAR assumption is required for complete-case and available-case analyses, together with imputation methods, while the MAR assumption is sufficient for likelihood-based methods [12]. Survival analysis is characterised by the appearance of missing data in the form of censoring. In fact, these observations are only partially missing, as they provide some information: the failure time exceeds the observed censoring time. In this case, the most common assumption regarding the censoring mechanism is that it is not associated with the failure process and hence is ignorable. Ignorability is the main assumption behind the development of all the standard methodology in survival analysis. It serves as the starting point in every analysis, even if we have evidence to believe otherwise. Unfortunately, it is practically impossible to verify assumptions about the censoring mechanism. Therefore, there is a great need for collecting all available information regarding the reasons why individuals are censored. In several studies, procedures have been introduced so that individuals can be traced after they are censored, using various means, in order to provide additional information about their status. For example, in a registry of systemic lupus erythematosus patients, an effort was made to trace patients who were defined as lost to follow-up according to specific criteria [6]. This provided additional information on the mortality experience of the patients lost to follow-up, which was used to investigate the presence of bias due to patients being lost to follow-up. Things can become complicated if we assume that the censoring mechanism is indeed associated with the failure process. Censoring is then termed non-ignorable and contains information for which the data analysis somehow needs to be adjusted. This assumption invalidates most, if not all, of the standard statistical tools, and new techniques, sometimes ad hoc, need to be developed for the analysis of such data. Methods that allow for some form of sensitivity analysis have been investigated [16] under the assumption that departures from ignorable censoring are small. Nevertheless, there is a lot of ongoing research in this area of survival analysis, as we need to understand how censoring can affect the inferences about the failure process.
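The MCAR/MAR distinction can be made concrete with a toy simulation; the sketch below (Python; the variables and missingness rates are invented for illustration) shows that a complete-case mean is unbiased under MCAR but biased when missingness depends on an observed covariate (MAR).

```python
# Toy illustration of MCAR vs MAR missingness and complete-case bias.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.normal(50, 10, n)
income = 20_000 + 500 * age + rng.normal(0, 5_000, n)

mcar = rng.random(n) < 0.3                              # missing, independent of everything
mar = rng.random(n) < np.clip((age - 30) / 40, 0, 1)    # missingness depends on observed age

print(income.mean())          # full-data mean, ~45,000
print(income[~mcar].mean())   # MCAR complete cases: still ~45,000
print(income[~mar].mean())    # MAR complete cases: biased (older, richer cases dropped)
```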
37.5.4 Long-Term Survivor Models

It is possible that there is some heterogeneity amongst the patients in a study that has not been accounted for through explanatory variables. Omitting such heterogeneity would lead to misleading results and inferences about the variables included in the model. This situation may arise, for example, when a number of patients who receive some form of medication can be considered cured of the disease, and hence will never experience the event of interest. Although this might be seen as an extreme type of heterogeneity among patients, it clearly demonstrates that different groups of patients might exist, one of which might not be at immediate risk of failure. Patients who survive much longer than the remaining ones have been named long-term survivors, and modelling their behaviour is of interest [5, 9, 13].

Consider $Y$ as a binary random variable, where $Y = 0$ indicates a long-term survivor. If $\mathbf{x}$ is a set of explanatory variables, the time-to-event for those with $Y = 1$ is then characterised by the conditional probability density function $f(t \mid Y = 1, \mathbf{x})$, with a well-studied option being the Weibull distribution [5]. In the general case, patients may come from two or even more populations with different hazards of experiencing the event of interest. Following the long-term survivors paradigm, we assume that patients come from two separate populations. Letting $P(Y = 1) = p$, the mixture distribution is written as

$$f(t) = p f(t \mid Y = 1, \mathbf{x}) + (1 - p) f(t \mid Y = 0, \mathbf{x}), \qquad (37.17)$$

where $f(t \mid Y = 0, \mathbf{x})$ and $f(t \mid Y = 1, \mathbf{x})$ are modelled separately. The hazard function then becomes

$$\lambda(t) = \frac{p f(t \mid Y = 1, \mathbf{x}) + (1 - p) f(t \mid Y = 0, \mathbf{x})}{p S(t \mid Y = 1, \mathbf{x}) + (1 - p) S(t \mid Y = 0, \mathbf{x})}, \qquad (37.18)$$

with $S(t \mid Y = 0, \mathbf{x})$ and $S(t \mid Y = 1, \mathbf{x})$ being the corresponding survival functions. In the case of long-term survivors, we have $S(t \mid Y = 0, \mathbf{x}) = 1$ at any time $t$, leading to a density function equal to 0 and a hazard function of the form

$$\lambda(t) = \frac{p f(t \mid Y = 1, \mathbf{x})}{p S(t \mid Y = 1, \mathbf{x}) + (1 - p)}. \qquad (37.19)$$

In Fig. 37.7, we can observe that in a hypothetical scenario in which one subgroup of individuals experiences the event of interest according to an exponential distribution, while the other subgroup consists of long-term survivors and $p = 0.5$, the survival curve levels off at the long-term survivor fraction $1 - p$ (here equal to 0.5) as time increases.

Fig. 37.7 Long-term survivors: the survival curve plateaus at 0.5 as time increases
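The plateau in Fig. 37.7 follows directly from the mixture: with $S(t \mid Y = 0, \mathbf{x}) = 1$, the overall survival is $p S(t \mid Y = 1, \mathbf{x}) + (1 - p)$, which tends to $1 - p$. The sketch below (Python; an exponential susceptible subgroup with an arbitrary rate) evaluates this mixture survival and the hazard of (37.19).

```python
# Mixture survival for a long-term survivor model: the susceptible subgroup
# (Y = 1, probability p) fails exponentially; long-term survivors never fail.
import numpy as np

p, rate = 0.5, 0.05
t = np.linspace(0, 100, 5)
S_mix = p * np.exp(-rate * t) + (1 - p)          # p*S(t|Y=1) + (1-p)*1

lam = p * rate * np.exp(-rate * t) / S_mix       # hazard of Eq. (37.19)
for ti, si in zip(t, S_mix):
    print(f"t={ti:5.1f}  S(t)={si:.3f}")         # tends to 1 - p = 0.5 as t grows
```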
37.5.5 Meta-Analysis

The development of statistical methodology for pooling and analysing studies with time-to-event endpoints has recently received considerable attention. Aggregate or summary data are usually available in papers reporting results of the analysis of specific data sets. These are usually hazard ratios and confidence intervals, as these are the main outcomes of interest defined in study protocols. Methods for synthesising evidence of this form for time-to-event data [19] are borrowed from the meta-analysis of randomised controlled trials, where summary statistics other than the widely used odds ratio (OR) may be used in the analysis. Attention has also been paid to how summary statistics of interest can be extracted from papers or reports in which they are not clearly presented [14]. The logarithm of the hazard ratio is the most prevalent summary measure used in the meta-analysis of time-to-event endpoints. This is a useful measure that quantifies the comparison between two groups in a single number, the value of which indicates which of the groups has the smaller hazard and hence performs better
than the other one. Although some argue that it is justified to consider log-hazard ratios in every case where time-to-event data are discussed, this strategy is most naturally adopted in the presence of the proportional hazards structure. The proportional hazards structure is the dominant assumption in individual patient data meta-analysis of time-to-event endpoints [18]. As individual patient data are not widely available, this type of analysis is rarer than aggregate data analysis. However, it is considered the gold standard of meta-analysis [17], because all the available data are utilised and the approximations needed to make use of summary measures in an aggregate data meta-analysis are avoided. The proportional hazards structure also allows flexibility in modelling: stratified analyses by trial, with fixed or random treatment effects, can be extended with a random trial term to generate models that can be fitted when individual patient data are available.
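A minimal sketch of aggregate-data pooling, assuming each study reports a log hazard ratio and its standard error (the numbers below are invented for illustration), is the fixed-effect, inverse-variance weighted average, the standard approach referred to above.

```python
# Fixed-effect (inverse-variance) pooling of published log hazard ratios.
import math

log_hr = [-0.22, -0.35, -0.10, -0.41]   # per-study log hazard ratios
se     = [0.10, 0.15, 0.12, 0.20]       # their standard errors

w = [1 / s**2 for s in se]              # inverse-variance weights
pooled = sum(wi * y for wi, y in zip(w, log_hr)) / sum(w)
se_pooled = math.sqrt(1 / sum(w))

hr = math.exp(pooled)                   # back-transform to the hazard ratio scale
ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
print(f"pooled HR = {hr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```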
References

1. Collett D (2003) Modelling survival data in medical research, 2nd edn. Chapman & Hall/CRC, Boca Raton
2. Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Ser B 34:187–220
3. Cox DR, Oakes D (1984) Analysis of survival data. Chapman & Hall/CRC, Boca Raton
4. Crowder M (2001) Classical competing risks. Chapman & Hall/CRC, Boca Raton
5. Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38:1041–1046
6. Farewell VT, Lawless JF, Gladman DD et al (2003) Tracing studies and analysis of the effect of loss to follow-up on mortality estimation from patient registry data. Appl Statist 52:445–456
7. Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken
8. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481
9. Klein JP, Moeschberger ML (1997) Survival analysis. Springer, New York
10. Lawless JF (2003) Statistical models and methods for lifetime data. Wiley, Hoboken
11. Leung KM, Elashoff RM, Afifi AA (1997) Censoring issues in survival analysis. Annu Rev Public Health 18:83–104
12. Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, Hoboken
13. Maller RA, Zhou X (1996) Survival analysis with long term survivors. Wiley, Hoboken
14. Parmar MKB, Torri V, Stewart L (1998) Extracting summary statistics to perform meta-analysis of the published literature for survival end-points. Stat Med 17:2815–2834
15. Rubin DB (1976) Inference and missing data. Biometrika 63:581–592
16. Siannis F, Copas J, Lu G (2005) Sensitivity analysis for informative censoring in parametric survival models. Biostatistics 6:77–91
17. Stewart LA, Tierney JF (2002) To IPD or not to IPD? Eval Health Prof 25:76–97
18. Smith CT, Williamson PR, Marson AG (2005) Investigating heterogeneity in an individual patient data meta-analysis of time to event outcomes. Stat Med 24:1307–1319
19. Williamson PR, Smith CT, Hutton J et al (2002) Aggregate data meta-analysis with time-to-event outcomes. Stat Med 21:3337–3351
Further Reading

For statistical software, visit the R Project at http://www.r-project.org
38 Risk Stratification and Prediction Modelling in Surgery

Vassilis G. Hadjianastassiou, Thanos Athanasiou, and Linda J. Hands
Contents

Abbreviations ... 507
38.1 Introduction ... 508
38.2 Historical Perspective ... 509
38.3 Overview of Examples of Risk Stratification Models ... 509
38.3.1 American Society of Anaesthesiology (ASA) Grade ... 510
38.3.2 Acute Physiology and Chronic Health Evaluation (APACHE) Methodology ... 510
38.3.3 The Simplified Acute Physiology Score (SAPS) Methodology ... 512
38.3.4 The Mortality Prediction Model (MPM) Methodology ... 512
38.3.5 The Surgical Mortality Score (SMS): A Model Based on Administrative Data ... 512
38.3.6 The Physiology and Operative Severity Score for the Enumeration of Mortality and Morbidity (POSSUM) Methodology ... 512
38.3.7 Other Risk Stratification Models ... 515
38.4 Logistic Regression in Risk Stratification Modelling ... 516
38.4.1 Multiple Logistic Regression Analysis ... 516
38.4.2 Hierarchical (Multilevel) Logistic Regression Analysis Models ... 517
38.4.3 The Need for Progress in Statistical Methodology of Non-Randomised Designs ... 518
38.5 Artificial Neural Networks ... 521
38.5.1 Clinical Applications of ANN ... 523
38.6 Development and Validation of Risk Stratification Models ... 523
38.6.1 Discrimination ... 524
38.6.2 Calibration ... 524
38.6.3 Sub-group Analysis ... 524
38.6.4 Recalibration ... 524
References ... 525

V. G. Hadjianastassiou, Consultant Transplant Surgeon, Clinician Lead in Pancreas Transplantation, Department of Transplantation, Directorate of Nephrology, Transplantation and Urology, Guy's & St. Thomas' NHS Foundation Trust, Guy's Hospital, St. Thomas' Street, London SE1 9RT, UK. e-mail: [email protected]

Abbreviations

AAA Abdominal aortic aneurysm
ANN Artificial neural network
APACHE Acute physiology and chronic health evaluation
APS Acute physiology score
ASA American Society of Anaesthesiology
ECG Electrocardiogram
ICU Intensive care unit
MPM Mortality prediction model
O:E Observed:expected
POSSUM Physiology and operative severity score for the enumeration of mortality and morbidity
ROC Receiver operating characteristic curve
RSM Risk stratification modelling
SAPS Simplified acute physiology score
SMR Standardised mortality ratio
SMS Surgical mortality score
VBHOM Vascular biochemistry and haematology outcome models
Abstract For more than three decades, multivariable risk factor analysis has been the main statistical technique for identifying and quantifying treatment outcome differences adjusted for patient characteristics, these differences being treated as associations with
outcomes, and not causes. There is no guarantee that risk factor analysis is an effective strategy for the discovery of a cause-and-effect mechanism. Multivariable logistic regression appears to be very suitable for epidemiological surgical research, especially for dichotomous outcomes such as mortality, as disease occurrence has multiple risk factors, which can be mutually correlated. The mathematical model known as multiple logistic regression assumes that the dependent variable (the outcome of interest, such as death in this case) is linearly and additively related to the independent variables (patient risk factors) on the logistic scale. The technique is useful primarily because it produces a direct estimate of the odds ratio of any risk factor, which reflects the dose–response relationship between these factors and the outcome. However, if the actual relationship is non-linear, an adequate fit can usually be achieved by adding polynomial and product interaction terms to the model, although the meaning of the odds ratios in such an interaction model would be severely circumscribed. An essential step in risk stratification modelling (RSM) is to evaluate whether there is evidence of an interaction between the variables used in a model. Such an interaction would imply that the effect of one of the variables is not constant over levels of the other. An important drawback of logistic regression is that it can increase bias because of misclassification and measurement errors in confounding variables and differences between conditional and unconditional odds ratio estimates of treatment effects. In this chapter, we outline all the basic components that the surgeon needs to know about risk stratification and predictive modelling.
38.1 Introduction

Health care outcome depends on the case-mix of the patients (demography, diagnosis and co-morbid state), non-patient-related factors such as the structure of the health system (staff, equipment) and the process (type, efficiency, skill effectiveness) and quality of care pertinent to the institution delivering it [1]. It is widely accepted that for an objective outcome such as mortality to reflect health care delivery performance, it has to be adjusted for the underlying case-mix of the patients (all the patient-related risk factors) by using a risk stratification model.
RSM has been used in various roles [2]:

1. In comparative audit: by comparing actual with expected (as predicted by the risk stratification model) outcomes for groups of patients, one may compare different health care delivery providers or monitor the learning curve of a surgeon or an individual unit. Regular audit can then identify suboptimal practice and provide the opportunity to disseminate best practice.

2. In observational studies, when the populations under study have to be adjusted for differences in their case-mix, to help identify factors associated with improved outcomes. In randomised studies, RSM can be used to define similar strata of risk within which patients are randomised. Results can subsequently be analysed within more homogeneous strata of risk (case-mix), in this way isolating the effect of an intervention from the bias of case-mix differences (confounders).

3. In clinical management, risk stratification scores can be used as a quantitative surrogate measure of a patient's clinical status to aid the exchange of information between clinicians. At present, risk stratification models do not predict outcome accurately for individual patients [3], and hence they cannot be used alone to determine decisions about specific patient care, even though these models have been shown to be as good as clinicians in predicting mortality outcome [4]. Furthermore, the predictive aspect of RSM can be used as an adjunct in the process of informed consent [5] or to guide clinicians as to the prognosis of a patient [6, 7].

The process of care of surgical patients comprises the pre-hospital treatment and transfer to the hospital (most relevant to emergencies), in-hospital pre-operative optimisation, the operation and anaesthesia (peri-operative events) and the post-operative care leading to hospital discharge. The timing of the construction of a risk-stratification model with regard to the process of care dictates the potential uses of such a model. That is, construction of such a model at the earliest time-point in the process of care, such as in the pre-hospital setting, would very likely be inaccurate owing to the large number of processes and therapeutic interventions that ensue. A model created just before surgery may be useful as an adjunct to the clinician for the process of informed consent. Such a model would not be useful for pure "surgical" audit, as the events that follow include the operation
itself, the anaesthetic input and, of course, all of the post-operative care before the patient's discharge. Therefore, any comparison between expected and observed outcomes in this group of patients can only generate inferences about the totality of care received, from the operation itself to the patient's discharge. The concept of pure "surgical" audit is a misnomer, as the surgery itself is only part of the chain of events in the process of care. A risk stratification model constructed from data collected immediately after the operation may be useful in elucidating aspects of the post-operative care alone, distinct from the peri-operative events. Specifically, such a model may be used in comparative audit of the post-operative care facilities.
38.2 Historical Perspective

There is evidence of the collection of mortality data for several centuries in England, primarily to track epidemic illness [8]. British hospitals, which were usually charitable institutions serving the less affluent members of society, accumulated statistics on their patients from the 1600s as evidence to attract new benefactors. They looked to their death rates as the best indicator of their relative achievements in health care. The urban migration that accompanied the industrial revolution of the eighteenth century led to public interest in the formation of various statistical societies in England by the 1830s. However, support was not universal; the prominent Victorian Charles Dickens satirised statistics, which he believed harmed individuality. In the same era, Florence Nightingale (1820–1910) was very interested in the study of mathematics [9]. In 1837, the registration of births, deaths and marriages was introduced, making social statistics a "sociable" subject for conversation. Florence Nightingale worked as the Superintendent of Nurses at the British army hospital in Scutari, near Constantinople (present-day Üsküdar, Istanbul), for wounded soldiers of the Crimean War (1854–1856). She observed that most of the deaths were caused by "zymotic" (infectious) or "scorbutic" (poor nutrition and frostbite) diseases. Nightingale made fundamental changes to cleanliness, nutrition, patient care and hospital administration, achieving a reduction in the mortality rate from 40 to 2% within 18 months, proving to herself that much of the suffering of the army was unnecessary. As soon as Nightingale returned to England after the war, she used statistical tables of mortality, risk-adjusted for age, to demonstrate the high mortality in England of
army personnel when compared with civilians in order to convince the authorities to institute changes to the unhygienic city hospitals.
38.3 Overview of Examples of Risk Stratification Models

The essential feature of risk stratification models is to take into account the features (independent predictor variables) of patients that affect their risk of an outcome (the dependent variable), irrespective of the structure and process of care they receive [10]. A multiple logistic regression model [11] is designed to reduce all the independent risk factors to a single predictor value linked to the outcome by a mathematical equation. Such a model describes the strength of association of each risk factor with the outcome, while simultaneously allowing (adjusting) for the effect of all the other risk factors. Risk stratification models can be generic for all types of patients or disease-specific [2], and they may be based on anatomical information (such as the extent of injury) or physiological information (the effect of an injury or insult on bodily functions).

In-hospital mortality is the most frequently used outcome in RSM, being more easily verifiable than time-specific mortality following a surgical intervention (such as 30-day mortality). The former risks missing patients who die following discharge from hospital, while the latter risks missing patients who die in hospital beyond the 30-day threshold, which is not infrequent in the Intensive Care Unit (ICU) setting. Other measures of outcome such as morbidity, functional disability and quality of life are equally important, although they pose a greater logistical challenge in terms of their assessment.

The accuracy of models in generating estimates of the probability of an outcome is evaluated by "group" statistics. When these probabilities of outcome are used interchangeably to provide a risk estimate for an individual, a binary prediction method is employed, which is concerned with the concordance of predicted and observed outcomes in individual patients, something that group statistics cannot test [3]. In binary predictions, a group estimate of the probability of an outcome of 0.50 is usually taken as the threshold for individual patient classification. A patient with a group estimate of probability of 0.51 means that approximately 51 out of 100 patients with the same set of values of the predictor variables
would be classified as having a positive outcome. As the threshold of 0.50 is exceeded, the particular patient would be individually classified as positive for the binary outcome, while another patient with an estimate of probability of 0.49 would be classified as having a negative outcome. In addition, patients with a group estimate of probability of 0.03 and 0.49 would be individually classified in the same category in binary prediction. Therefore, equally accurate risk stratification models in terms of group estimates of probability can produce very different results for individual patient prediction.
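The thresholding logic can be stated in a few lines of code; the sketch below (Python; the three risk estimates are invented for illustration) makes the point that binary prediction collapses very different group probabilities into the same class.

```python
# Binary prediction from group probability estimates with a 0.50 threshold.
probs = {"patient A": 0.51, "patient B": 0.49, "patient C": 0.03}

for name, p in probs.items():
    label = "positive" if p >= 0.50 else "negative"
    print(f"{name}: estimated risk {p:.2f} -> classified {label}")
# A (0.51) and B (0.49) land in different classes despite near-identical
# risks, while B (0.49) and C (0.03) land in the same class.
```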
38.3.1 American Society of Anaesthesiology (ASA) Grade

This has been used extensively since 1963 for the stratification of pre-operative patients according to their physical state, based on a subjective assessment derived from taking a medical history and performing a physical examination, without the use of any tests [12]. This grading system (Table 38.1) was not, however, designed to be used as a risk stratification model [13].
Table 38.1 ASA classification

ASA grade I: Healthy patient with normal examination
ASA grade II: Mild to moderate systemic disease that does not limit activity
ASA grade III: Severe systemic disease, limiting activity but not incapacitating
ASA grade IV: Incapacitating systemic disease that is already life-threatening
ASA grade V: Moribund patient with little chance of survival beyond 24 h, with or without surgery

38.3.2 Acute Physiology and Chronic Health Evaluation (APACHE) Methodology

The basis of this methodology was the hypothesis that the severity of acute disease can be defined by objective physiologic measurements, independent of therapy. The Acute Physiology Score (APS) was developed by a
"team of experts" who identified 34 physiologic variables associated with hospital mortality [14]. Any deviation of these variables from normal physiologic values was scored according to an arbitrary point assignment. To take into account the pre-existing physiological reserve of the patients, the "Chronic Health Evaluation" was also developed, categorising patients according to their pre-existing health. The combined system was termed APACHE [15] but had several disadvantages. This led to a revision of the score to APACHE II [16], in which the APS was simplified to only 12 variables (Table 38.2), age was included as a prognostic factor and the Chronic Health evaluation was modified so as to be quantifiable. The result was an APACHE II score (Table 38.2) comprising the sum of the APS (maximum score of 60), Age (maximum score of 6) and Chronic Health (CH) points (maximum score of 5). The worst physiologic values in the first 24 h of ICU admission were recorded, although the authors admitted that taking the values on ICU admission would make the score more appropriate, as the prediction would be more independent of treatment. The area under the Receiver Operating Characteristic (ROC) curve [17] (see Sect. 38.6.1) was 0.85 in the development sample of this model. The APACHE II still required some subjectivity in determining Chronic Health status and deciding what constitutes an emergency operation [18]. A further modification of the methodology resulted in the creation of APACHE III [19], but its use was limited by the fact that the equation required to calculate the probability of in-hospital death was available only by purchase and was never published [12]. Information to evaluate the goodness of fit of the APACHE III model was not reported [20], although the ROC area quoted by the authors in the development sample was 0.90. The APACHE II model has been validated in surgical intensive care patients [21] in Switzerland. A comparison of APACHE II and III in a UK ICU did not show an overall significant difference between the two models [22]. Another study from the USA showed that APACHE II did not overestimate mortality as much as APACHE III in a surgical ICU [23]. External validation studies (applying the models to new populations) of these models have almost invariably shown good discrimination but poor calibration [24], suggesting potential problems if the models were to be applied to sub-groups of the population. This was confirmed in the Intensive Care Society's APACHE II study in Britain and Ireland [25].
Table 38.2 The APACHE II model scoring board. Adapted from [16]

Acute Physiology Score (APS): for each of the 12 physiological variables below, score the worst value recorded in the first 24 h; points for each range are shown in parentheses.

Temperature, rectal (°C): ≥41.0 (+4); 39.0–40.9 (+3); 38.5–38.9 (+1); 36.0–38.4 (0); 34.0–35.9 (+1); 32.0–33.9 (+2); 30.0–31.9 (+3); ≤29.9 (+4)
Mean arterial pressure (mmHg): ≥160 (+4); 130–159 (+3); 110–129 (+2); 70–109 (0); 50–69 (+2); ≤49 (+4)
Heart rate (ventricular response): ≥180 (+4); 140–179 (+3); 110–139 (+2); 70–109 (0); 55–69 (+2); 40–54 (+3); ≤39 (+4)
Respiratory rate (non-ventilated or ventilated): ≥50 (+4); 35–49 (+3); 25–34 (+1); 12–24 (0); 10–11 (+1); 6–9 (+2); ≤5 (+4)
Oxygenation, if FiO2 ≥ 0.5 record A–aDO2: ≥500 (+4); 350–499 (+3); 200–349 (+2); <200 (0); if FiO2 < 0.5 record only PaO2 (mmHg): >70 (0); 61–70 (+1); 55–60 (+3); <55 (+4)
Arterial pH: ≥7.70 (+4); 7.60–7.69 (+3); 7.50–7.59 (+1); 7.33–7.49 (0); 7.25–7.32 (+2); 7.15–7.24 (+3); <7.15 (+4)
Serum sodium (mMol/L): ≥180 (+4); 160–179 (+3); 155–159 (+2); 150–154 (+1); 130–149 (0); 120–129 (+2); 111–119 (+3); ≤110 (+4)
Serum potassium (mMol/L): ≥7.0 (+4); 6.0–6.9 (+3); 5.5–5.9 (+1); 3.5–5.4 (0); 3.0–3.4 (+1); 2.5–2.9 (+2); <2.5 (+4)
Serum creatinine (mg/100 mL; double the point score for acute renal failure): ≥3.5 (+4); 2.0–3.4 (+3); 1.5–1.9 (+2); 0.6–1.4 (0); <0.6 (+2)
Haematocrit (%): ≥60.0 (+4); 50.0–59.9 (+2); 46.0–49.9 (+1); 30.0–45.9 (0); 20.0–29.9 (+2); <20.0 (+4)
White blood count (total/mm³, in 1,000s): ≥40.0 (+4); 20.0–39.9 (+2); 15.0–19.9 (+1); 3.0–14.9 (0); 1.0–2.9 (+2); <1.0 (+4)
Glasgow Coma Scale (GCS): score = 15 minus actual GCS
Serum HCO3 (venous, mMol/L; use only if no arterial gases): ≥52.0 (+4); 41.0–51.9 (+3); 32.0–40.9 (+1); 22.0–31.9 (0); 18.0–21.9 (+2); 15.0–17.9 (+3); <15.0 (+4)

Total APS = sum of the above 12 variable scores.

Age points (B): <44 (0); 45–54 (2); 55–64 (3); 65–74 (5); >75 (6)

Chronic Health points (C): if the patient has a history of severe organ system insufficiency or is immunocompromised, assign 5 points for non-operative or emergency post-operative patients, or 2 points for elective post-operative patients. Organ insufficiency or an immunocompromised state must have been evident prior to the hospital admission and conform to the following criteria:
Liver: biopsy-proven cirrhosis and documented portal hypertension; episodes of past upper GI bleeding attributed to portal hypertension; or prior episodes of hepatic failure/encephalopathy/coma.
Cardiovascular: New York Heart Association Class IV.
Renal: receiving chronic dialysis.
Respiratory: chronic restrictive, obstructive or vascular disease resulting in severe exercise restriction (i.e. unable to climb stairs or perform household duties); or documented chronic hypoxia, hypercapnia, secondary polycythaemia, severe pulmonary hypertension (>40 mmHg), or respiratory dependency.
Immunocompromised: the patient has received therapy that suppresses resistance to infection (e.g. immunosuppression, chemotherapy, radiation, long-term or recent high-dose steroids) or has a disease sufficiently advanced to suppress resistance to infection (e.g. leukaemia, lymphoma, AIDS).

Glasgow Coma Scale (GCS) components. Best Eye Response (4): (1) no eye opening; (2) eye opening to pain; (3) eye opening to verbal command; (4) eyes open spontaneously. Best Verbal Response (5): (1) no verbal response; (2) incomprehensible sounds; (3) inappropriate words; (4) confused; (5) orientated. Best Motor Response (6): (1) no motor response; (2) extension to pain; (3) flexion to pain; (4) withdrawal from pain; (5) localising pain; (6) obeys commands.

APACHE II SCORE = sum of APS + B + C
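The arithmetic of the score is simple once Table 38.2 has been applied to each variable; the sketch below (Python) encodes only the age bands and the final summation, with the APS and chronic health points supplied as inputs. A full implementation would encode every threshold in the table in the same way.

```python
# Assembling the APACHE II score from Table 38.2: APS + age points + chronic health.
def age_points(age: int) -> int:
    """Age points (B) per Table 38.2."""
    if age <= 44: return 0
    if age <= 54: return 2
    if age <= 64: return 3
    if age <= 74: return 5
    return 6

def apache_ii(aps: int, age: int, chronic_health: int) -> int:
    """aps: sum of the 12 physiology scores (0-60); chronic_health: 0, 2 or 5."""
    return aps + age_points(age) + chronic_health

# e.g. APS of 14 in a 67-year-old emergency admission with severe organ insufficiency
print(apache_ii(aps=14, age=67, chronic_health=5))   # 14 + 5 + 5 = 24
```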
The standardised mortality ratio (SMR, the ratio of observed to APACHE II-predicted deaths) in the UK APACHE II study was not significantly different from unity [26], while APACHE III performed significantly less well (SMR significantly different from unity, i.e. poor goodness of fit) when applied in a study in the UK involving 17 general adult ICUs [27].
38.3.3 The Simplified Acute Physiology Score (SAPS) Methodology

In an attempt to simplify data collection relative to the original APACHE methodology, 14 of the 34 APACHE variables were used to create the SAPS model to predict death [28]. SAPS II was a subsequent revision of the model [29], using 13 physiological variables (Table 38.3), the type of admission and chronic health points. In a study comparing three risk stratification methodologies in a British ICU population, APACHE II generated the most uniform predictions across the spectrum of risk, with SAPS II a close second, while APACHE III produced the worst calibration [30].
38.3.4 The Mortality Prediction Model (MPM) Methodology

This model was first created in 1985 [31] and has been revised since (Table 38.3). It was unique in that it used a logistic regression equation containing all of the predictor variables to provide a direct estimate of the probability of in-hospital mortality, rather than first calculating a score. This model used data from patients on ICU admission in order to create a risk stratification model independent of ICU treatment, for use when comparing different ICUs or monitoring the quality of care in a single ICU [32]. In a direct comparison of MPM with APACHE II in more than 8,000 patients in Britain and Ireland, APACHE II achieved a higher degree of overall goodness of fit, for both discrimination and calibration [33]. In an external validation study involving a total of 21 general ICUs in Scotland, APACHE II achieved the best calibration when compared with APACHE III, SAPS II, MPM II and the UK-APACHE II (coefficients derived from the UK APACHE study), although SAPS II achieved better discrimination [34].
38.3.5 The Surgical Mortality Score (SMS): A Model Based on Administrative Data

Methods of risk adjustment based on physiological scoring systems have been criticised for their use in comparative audit [35], because emergency care can influence the score prior to the intervention that is formally being audited. Methods of risk adjustment based on administrative databases have also been criticised [36], as they are limited by coding bias and recording inaccuracies. Retrospective administrative databases based on hospital discharge data are considered unsuitable for risk adjustment in surgery [37] when the outcome is morbidity, because discharge data may not distinguish between pre-existing co-morbid conditions and post-operatively acquired morbidity. However, these limitations are minimised when the recording accuracy is optimal and the outcome is unequivocal, such as mortality. The SMS model [1] was developed as a risk-stratification tool for in-hospital mortality (capped at 30 days) in patients undergoing surgical procedures across a range of surgical specialities. The model was intended to be easily applicable and inexpensive by using data from an existing administrative database: the hospital Patient Administration System, used for the various audit commitments of UK hospitals. The model was well calibrated and had a high discriminant function (ROC area: development set 0.84, validation set 0.82). The internal validity of the model was also confirmed by subgroup analysis, both for emergency and elective cases and across all the specialities used in the model, including general (and vascular) surgery. Model performance was expected to deteriorate when applied to other hospitals, as it was drawn from data at a single hospital with the particular case-mix, structure and process of care present at the time the data were collected.
38.3.6 The Physiology and Operative Severity Score for the Enumeration of Mortality and Morbidity (POSSUM) Methodology

The POSSUM model was developed to predict 30-day post-operative mortality in general surgical operations [38] for use in comparative audit. Twelve pre-operative
Table 38.3 Predictor variables in the various risk stratification models (APACHE II, APACHE III, SAPS II, MPM II, POSSUM Physiology and SMS). The variables compared across the models are: temperature, blood pressure, pulse rate, respiratory rate, PaO2, pH, bicarbonate, haemoglobin, haematocrit, white cell count, sodium, potassium, creatinine, platelets, albumin, bilirubin, glucose, urea, urine output, Glasgow Coma Scale, ECG, cardiac signs, respiratory signs, chronic health, age, emergency operation, ASA grade, acute diagnoses, surgical speciality, gender, timing of surgery and duration of surgery.
physiology factors, found to be independent predictors of outcome by multivariate analysis, were graded and scored exponentially to produce a “Physiology” score between 12 and 88 points (Table 38.4). Data were collected as close to the start of the operation as possible. In addition,
an “Operative Severity” Score (6–44 points) comprising six predictors graded exponentially was also generated. The two scores were then applied to a logistic regression equation to produce an estimate of the probability of the outcome.
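To make this concrete, the sketch below applies the logistic equations to a pair of scores. The coefficients shown are those commonly quoted for the original POSSUM mortality equation [38] and for the P-POSSUM equation discussed below [41]; they should be checked against the original papers before any practical use.

```python
import math

def possum_mortality(ps: int, os_: int) -> float:
    """Commonly quoted POSSUM 30-day mortality equation [38]:
    ln(R/(1-R)) = -7.04 + 0.13*PS + 0.16*OS."""
    logit = -7.04 + 0.13 * ps + 0.16 * os_
    return 1.0 / (1.0 + math.exp(-logit))

def p_possum_mortality(ps: int, os_: int) -> float:
    """P-POSSUM (Portsmouth) in-hospital mortality equation [41]:
    ln(R/(1-R)) = -9.065 + 0.1692*PS + 0.1550*OS."""
    logit = -9.065 + 0.1692 * ps + 0.1550 * os_
    return 1.0 / (1.0 + math.exp(-logit))

# Example: Physiology score 24, Operative Severity score 17.
# POSSUM gives a noticeably higher predicted risk than P-POSSUM,
# mirroring the over-prediction that motivated the Portsmouth revision.
print(round(possum_mortality(24, 17), 3))
print(round(p_possum_mortality(24, 17), 3))
```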
Table 38.4 The POSSUM scoring sheet. Adapted from [38]

Physiology (score 1 | 2 | 4 | 8):
- Age (years): ≤60 | 61–70 | ≥71 | –
- Cardiac signs (or chest radiography): no failure | diuretic, digoxin, anti-anginal or anti-hypertensive therapy | peripheral oedema, warfarin, borderline cardiomegaly | raised JVP, cardiomegaly
- Respiratory history (or chest radiography): no dyspnoea | dyspnoea on exertion | limiting dyspnoea (one flight), moderate COPD | dyspnoea at rest (>30/min), fibrosis or consolidation
- Blood pressure (systolic, mmHg): 110–130 | 131–170 or 100–109 | ≥171 or 90–99 | <90
- Heart rate (beats/min): 50–80 | 81–100 or 40–49 | 101–120 | ≥121 or <40
- Glasgow Coma Scale: 15 | 12–14 | 9–11 | ≤8
- Haemoglobin (g/dL): 13–16 | 11.5–12.9 or 16.1–17.0 | 10.0–11.4 or 17.1–18.0 | ≤9.9 or ≥18.1
- White cell count (×10^12/L): 4–10 | 10.1–20.0 or 3.1–4.0 | ≥20.1 or ≤3.0 | –
- Urea (mmol/L): ≤7.5 | 7.6–10.0 | 10.1–15.0 | ≥15.1
- Sodium (mmol/L): ≥136 | 131–135 | 126–130 | ≤125
- Potassium (mmol/L): 3.5–5.0 | 3.2–3.4 or 5.1–5.3 | 2.9–3.1 or 5.4–5.9 | ≤2.8 or ≥6.0
- ECG: normal | – | atrial fibrillation (rate 60–90/min) | any other abnormal rhythm, ≥5 ectopics/min, Q waves or ST/T wave changes

Operative severity (score 1 | 2 | 4 | 8):
- Operative category: minor | moderate (appendicectomy, cholecystectomy, mastectomy, TURP) | major (laparotomy, major amputation, peripheral vascular surgery, choledochotomy) | major+ (aortic surgery, abdomino-perineal excision, pancreatic or liver surgery, oesophagogastrectomy)
- Multiple procedures: 1 | – | 2 | >2
- Total blood loss (mL): ≤100 | 101–500 | 501–999 | >999
- Peritoneal contamination: none | minor (serous fluid) | local pus | free bowel content, pus or blood
- Presence of malignancy: none | primary only | nodal metastases | distant metastases
- Mode of surgery: elective | – | emergency (resuscitation of >2 h possible, operation <24 h after admission) | emergency (immediate surgery <2 h needed)
However, the exponential analysis employed [39] made it difficult to assign a risk score to an individual patient [40]. As a result, the predictor equation was modified [41] using the same POSSUM predictors but with new data from Portsmouth surgical patients and linear analysis techniques. This resulted in the generation of the P-POSSUM (the Portsmouth modification of the model) equation, which
in fact predicted in-hospital mortality (rather than 30-day mortality) and the pre-operative physiology data were collected on admission of the patient to hospital (rather than as close to the time of operation as possible, which was the case in the original POSSUM). The P-POSSUM equation was subsequently validated in a larger group of patients from Portsmouth [42]. The P-POSSUM methodology successfully predicted both morbidity and
mortality after arterial surgery [43], and this study also produced disease-specific models predicting mortality for arterial surgery, the Vascular-POSSUM (V-POSSUM and its physiology-only variant). The P-POSSUM, V-POSSUM and V-POSSUM physiology-only models [44] accurately predicted mortality for elective abdominal aortic aneurysm (AAA) surgery but did not predict outcome for emergency AAA alone or for both elective and emergency AAA surgery as a combined group. A new model for emergency AAA was therefore developed (the disease-specific Ruptured AAA-POSSUM model (RAAA-POSSUM)) from a group of 213 patients and tested on 107 patients. The corresponding RAAA-POSSUM physiology-only model did not, however, predict outcome adequately, as it failed to achieve good calibration in the test set. A number of other disease-specific models have been published, such as for colorectal [45] and oesophago-gastric [46] surgery. The POSSUM data set has been criticised for use in comparative audit because of the inclusion of surgeon-dependent variables in the Operative Severity part of the equation, such as blood loss, multiple procedures and peritoneal soiling [47, 48]. This would produce a higher predicted mortality for “poorly performing” surgeons (who may cause peritoneal soiling, blood loss and have to re-operate for complications), which would mask their poor performance in comparative audit where Observed:Expected (O:E) ratios are compared. This was the main impetus to create models based on the physiology-only component of POSSUM.
38.3.7 Other Risk Stratification Models

The group [49] that did most of the work in developing the P-POSSUM and the vascular-disease-specific POSSUM models acknowledged that models based on the POSSUM data set had disadvantages: they were vulnerable to incomplete data, owing to the sheer volume of information required for every patient, and to the subjectivity of some components of the data set. They therefore attempted to model mortality using the Vascular Biochemistry and Haematology Outcome Models (VBHOM) items in AAA patients. They anticipated that these items (Age, Urea, Sodium, Potassium, Haemoglobin, White Cell Count and Mode of Admission) would almost invariably be known for all patients. In fact, only about 65% of the patients had complete information for inclusion in their study,
excluding 35% of the cases. The VBHOM approach again failed to model outcome for both elective and emergency AAA as a combined group, necessitating the formation of two separate models. Both these models were developed from samples with an inadequate number of deaths (outcome events) to use logistic regression. Furthermore, there was no specific information as to the timing of collection of the data [49], although in their original study [50] when they first introduced the idea of a “national clinical minimum data set for general surgery”, they stated that the data were collected “at the first opportunity after admission” to hospital. Predicting mortality using physiological data so early on in a patient’s hospital admission is a limitation, as resuscitation attempts can alter the severity of illness and hence the eventual risk prediction for the individual. More recently, another VBHOM model was developed [51] from 2,700 patient records submitted to the UK National Vascular Database (a voluntary national registry) and validated on 327 patients from Cambridge. This time, this new model (despite the use of the old “VBHOM” term) had the additional prognostic variables of patient sex and mode of admission in the logistic regression equation, and was able to predict 30-day mortality in both emergency and elective patient groups. No information was given as to how those predictive variables were chosen and the model under-predicted outcome (O:E ratio of 0.905) in the validation group. In addition, there were other methodological ambiguities in the manuscript, which would need to be addressed in future prior to this model’s validation [52]. The APACHE-AAA model was a disease-specific risk stratification model developed using hierarchical logistic regression, in a combined group of both elective and emergency AAA patients in the immediate postoperative setting, based on the principles of the APACHE II methodology [53]. This model successfully predicted outcome in this patient population, as evidenced by all measures of internal validity such as calibration and discrimination properties and sub-group analyses. The model was advocated for use in comparative audit of the post-operative critical care facilities; for investigators wishing to compare the impact of different levels of post-operative critical care on patient outcome, and to health-care strategists planning demand for the use of critical care facilities by this patient population. The APACHE-AAA was found to be more accurate in quantifying prognosis than the corresponding artificial neural network (ANN) model and predictions from Intensive Care resident doctors [7]. The model was advocated as a
means of supplementing clinicians’ judgement, empowering them to provide “informed prognosis”, by decreasing uncertainty and promoting communication among clinicians and patients’ families. The APACHE-AAA model was successfully externally validated in a patient population independent from the one used to develop it [54] and was more accurate than existing risk stratification models (POSSUM- or VBHOM-based) advocated for use in AAA patients [55]. This model exemplified the technique to allow comparisons between different units and the methodology to set up a national reference system for use as a benchmark for quality assessment and the identification of outlier unit performance. The Veterans Affairs Surgical Risk Study is the largest risk stratification project applied in the USA [56–58]. The data were sourced from 44 Veterans Affairs Medical Centres, which performed more than 85,000 major non-cardiac operations from eight surgical specialities in the early 1990s. Logistic regression analysis was used to identify predictors of 30-day operative mortality and operative morbidity. Risk adjustment was employed before ranking hospitals [58] in comparative performance tables. The major limitation of this study was the patient population selection, which comprised low socio-economic status, middle-aged to elderly men who had previously served in the military. This is a major limitation in terms of the generalisability of the results to other populations. The Glasgow Aneurysm Score [59, 60] was created as a “predictor” of in-hospital mortality of AAA patients. The Hardman Index [61] focussed on the subject of selecting ruptured AAA patients for surgery. A recent study evaluating the Glasgow Aneurysm Score and the Hardman Index in Scotland found both to be poor predictors of outcome [62].
38.4 Logistic Regression in Risk Stratification Modelling

38.4.1 Multiple Logistic Regression Analysis

Multiple logistic regression is well suited to epidemiological research, especially for dichotomous outcomes such as mortality [63], as disease occurrence has multiple risk factors, which can be mutually correlated. Therefore, to determine the independent effect of a risk factor on a disease, the confounding effects attributed to other factors must be held constant. For two or three confounding factors, it is possible to adjust for them statistically by performing the analysis on a stratified sample [64], but when the confounders become multiple, multiple logistic regression is the most efficient technique for achieving adjustment [11]. The mathematical model known as multiple logistic regression assumes that the dependent variable (the outcome of interest, such as death in this case) is linearly and additively related to the independent variables (patient risk factors) on the logistic scale. The logistic transformation converts an S-shaped relationship (Fig. 38.1) between two variables into a linear one. This S-shaped pattern is prevalent in real clinical practice, in which there are a large number of cases with risk close to zero at low exposure to a risk factor; similarly, at the other extreme of the spectrum, there are significant numbers of cases with increasingly higher exposure to a risk factor but not much further change in their risk of death [14]. The technique is useful primarily because it produces a direct estimate of the odds ratio of any risk factor, which reflects the dose–response relationship between the risk factors and the outcome. However, if this actual relationship is non-linear, an adequate fit can usually be achieved by adding polynomial and product interaction terms to the model, although the meaning of the odds ratios in such an interaction model would be severely circumscribed [64]. An essential step in RSM is to evaluate whether there is evidence of an interaction between the variables used in a model [11].

Fig. 38.1 S-shaped relationship between clinical risk (predicted risk) and mortality rate
Such an interaction would imply that the effect of one of the variables (such as Emergency mode of surgery) is not constant over levels of the other (such as Age). Statistical principles suggest that when two units (such as ICUs) are compared, statistically significant differences will be seen one time in 20 (assuming the usual significance level of P < 0.05 is used), even when there is no true difference between them. When multiple comparisons of routine data are made, false outliers may therefore be identified (the statistical “type I error”). This problem can largely be avoided by pre-specifying the key predictor variables of interest and testing statistical significance only for these. To obtain reliable estimates of regression coefficients when fitting a logistic regression model, a minimum of ten outcome events per predictor is required [65]. To avoid “over-fitting” the model to the development population, the generalisation error (that is, the error in applying the model to a different population from the one used to develop it) has to be minimised. There are several ways of achieving this, such as using a split-sample technique [66], wherein the study population is randomly divided into a development set and a validation test set (usually in a 75:25 split). The model derived from the development set is applied to the test set to evaluate whether the development population sample is representative of the whole population. A more complex resampling technique is cross-validation [67], which is equivalent to repeated data splitting, whereby a large number of models are developed and the results are “averaged” at the end. Another method is the non-parametric bootstrap technique [68], a process of repeated sampling with replacement from the original population, producing nearly unbiased estimates of the regression coefficients while using the entire data set, thereby employing all the precious data for model development and validation at the same time [17]. Multiple logistic regression analysis cannot be applied to data measuring the time to an outcome (such as death); this type of “survival” data involving “censored” values can be analysed using a Cox proportional hazards regression model [69].
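A minimal sketch of the split-sample approach, assuming a pandas DataFrame df with hypothetical predictor columns and a binary died outcome:

```python
# Split-sample development/validation of a logistic regression risk
# model. Column names ("age", "aps", "emergency", "died") are
# hypothetical stand-ins for a real data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = df[["age", "aps", "emergency"]], df["died"]

# 75:25 development/validation split, as described in the text
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = LogisticRegression().fit(X_dev, y_dev)

# Discrimination in the held-out validation set
print(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
```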
38.4.2 Hierarchical (Multilevel) Logistic Regression Analysis Models

Conventional single-level models assume that all patients are randomly drawn from the same population. When a sample comprises statistically independent
patients, then logistic regression of the patients’ variables can be used for “predicting” a binary outcome. However, when the patients show a degree of autocorrelation, in other words when patients are clustered within groups, the assumption of statistical independence may not hold. Multilevel or “hierarchical” models are specifically used for observations that have a clustered structure. In a well-known study of primary school children carried out in the 1970s [70], it was claimed that children exposed to a particular style of teaching benefited more than those who were not. The analysis was done using single-level logistic regression techniques, which used the individual children as the units of analysis, ignoring their “clustering” within classes (sharing the same teachers). However, it was subsequently demonstrated [71] that when the analysis accounted properly for the clustering into a higher level of grouping (into classes), there were no significant differences between the children. Children within any one cluster (class) tended to have a similar performance because they were taught by the same teachers. Consequently, the “information” provided by children from the same class should be “weighted” less than would have been the case if the same number of students had been taught separately by different teachers; that is, the basic unit for comparison should have been the classes instead of the students. By increasing the number of students (Level 1 units) analysed per class, one would achieve a more precise estimate of the measure of the teachers’ effectiveness (the Level 1 measure). By increasing the number of classes (Level 2 units) to be compared, with the same or even a smaller number of students per class, one would improve the precision of the teachers’ comparisons (the Level 2 measure). A multilevel analysis provides more “conservative” standard errors, confidence intervals and significance tests than a traditional single-level analysis, which is obtained by simply ignoring the presence of clustering. In addition, it enables the researcher to explore the extent to which differences in the students’ performance (the Level 1 measure or outcome) between schools can be accounted for by factors such as organisational practice (Level 2 covariates). Finally, the methodology allows the relative ranking of individual schools, using the performances of their students after adjusting for both Level 1 and Level 2 covariates. Likewise, in the health services, when each hospital is treated separately, fitting a different regression model for each one would imply that the clustered structure is ignored. This may be acceptable where there are very
few hospitals and large numbers of patients in each, or if the interest is in making inferences about just those hospitals. If, however, inferences are needed about the variation between hospitals, then the hospitals have to be regarded as a random sample from a population of hospitals and a multilevel approach is needed. Likewise, if some of the hospitals have very few patients, more precision can be achieved by using the information available from the whole sample when making estimates for any one hospital. Multilevel models have already been used extensively in comparison studies: of paediatric cardiac deaths following the Bristol enquiry [72]; among in vitro fertilisation clinics in the UK [73]; in the assessment of the influence of hospital and clinician workload on survival from colorectal cancer [74]; in gastro-oesophageal surgery from 24 hospitals in the UK [75]; of operative mortality in colorectal cancer involving 73 hospitals in the UK [5]; and more recently in RSM studies in vascular surgery [7, 53–55]. The graph from [54] (Fig. 38.2) clearly shows Oxford to have an outlier performance, which is eliminated after adjustment for the Level 2 variance (Fig. 38.3). The difference in the structure and process of care between different ICUs, which has been quoted as one of the main reasons for the deterioration of a model’s performance in external validation studies [3, 24, 76, 77], would be circumvented by a model developed by a multilevel technique.
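The effect of ignoring clustering can be illustrated with simulated data. The sketch below uses cluster-robust standard errors as a simple stand-in for a full multilevel model (which would additionally estimate the Level 2 variance); the naive single-level fit produces optimistically small standard errors, echoing the point made above.

```python
# Simulate patients nested in hospitals with hospital-level random
# intercepts, then compare naive and cluster-robust standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_hosp, n_per = 30, 100
hosp = np.repeat(np.arange(n_hosp), n_per)
u = rng.normal(0, 0.8, n_hosp)[hosp]          # hospital random effect
x = rng.normal(size=n_hosp * n_per)           # patient-level covariate
p = 1 / (1 + np.exp(-(-1.0 + 0.5 * x + u)))
y = rng.binomial(1, p)

X = sm.add_constant(x)
naive = sm.Logit(y, X).fit(disp=0)
robust = sm.Logit(y, X).fit(disp=0, cov_type="cluster",
                            cov_kwds={"groups": hosp})
print(naive.bse)   # ignores clustering: optimistically small SEs
print(robust.bse)  # cluster-robust: more conservative, as the text notes
```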
38.4.3 The Need for Progress in Statistical Methodology of Non-Randomised Designs

There is evidence that observational studies can be designed with rigorous methods that mimic those of clinical trials and that well-designed observational studies do not consistently overestimate the effectiveness of therapeutic agents. Comparison of randomised and observational studies shows that treatment effects may differ according to research design, but that “one method does not give a consistently greater effect than the other”. The treatment effects are most similar when the exclusion criteria are similar and when the prognostic factors are accounted for in observational studies. A specific method used to strengthen observational studies (the “restricted cohort” design) adapts principles of the design of randomised, controlled trials to the design of an observational study as follows: it identifies a “zero time” for determining a patient’s eligibility and base-line features, uses inclusion and exclusion criteria similar to those of clinical trials, adjusts for differences in base-line susceptibility to the outcome and uses statistical methods (e.g. intention-to-treat analysis) similar to those of randomised, controlled trials. Data in the literature of other scientific disciplines support our contention that research design should not be considered a rigid hierarchy.
Fig. 38.2 In-hospital mortality (95% C.I.) before adjustment for the structure and process of care of the individual units [54]
Fig. 38.3 In-hospital mortality after adjustment for the structure and process of care of the individual units [54]
One possible explanation for the finding that observational studies may be less prone to heterogeneity in results than randomised controlled trials is that each observational study is more likely to include a broad representation of the population at risk. In addition, there is less opportunity for differences in the management of subjects among observational studies. For example, although there is general agreement that physicians do not use therapeutic agents in a uniform way, an observational study would usually include patients with coexisting illnesses and a wide spectrum of disease severity, and treatment would be tailored to the individual patient. In contrast, each randomised, controlled trial may have a distinct group of patients as a result of specific inclusion and exclusion criteria regarding coexisting illnesses and severity of disease, and the experimental protocol for therapy may not be representative of clinical practice. In the past, several statistical techniques were used to increase the credibility of non-randomised study designs, including the following:

1. Matching
2. Multivariate logistic regression analysis and adjustment
3. Use of multilevel and hierarchical modelling
4. Use of Bayesian methods and statistical simulation techniques
5. Use of balancing scores
38.4.3.1 Matching

Matching is a method used to ensure that two study groups are similar with regard to “nuisance” factors that might distort or confound a relationship under study. Matching can be implemented using two main approaches:

(a) Pair (individual) matching
(b) Frequency matching

Frequency matching can be implemented in various ways, including category matching, caliper matching, stratified random sampling, or a variant of pair matching. Matching is most commonly used in case–control studies, although it can also be used in cohort studies [78, 79]. Theoretical analysis has shown that matching in cohort studies completely controls for any potential confounding by the matching factors without requiring any special statistical methods, although it is associated with a loss of statistical power. In case–control studies, by contrast, matching does not completely control for confounding, thus requiring the use of statistical methods such as the Mantel–Haenszel approach, standardisation or logistic regression. There can be a substantial loss of power if a case–control study matches on a factor that is not actually a confounder. Moreover, the selection effects (bias) that matching is intended to reduce may actually increase if unmatched cases are simply eliminated.
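A minimal sketch of caliper matching on a single nuisance factor (here age, with a hypothetical caliper of 2 years):

```python
# Caliper matching: each case is matched to the nearest unused
# control within the caliper. Note the caveat above: unmatched
# cases should not simply be eliminated without thought.
def caliper_match(case_ages, control_ages, caliper=2.0):
    used = set()
    pairs = []
    for i, a in enumerate(case_ages):
        candidates = [(abs(a - c), j) for j, c in enumerate(control_ages)
                      if j not in used and abs(a - c) <= caliper]
        if candidates:
            _, j = min(candidates)   # nearest eligible control
            used.add(j)
            pairs.append((i, j))
    return pairs  # indices of matched (case, control) pairs

# Example: the third case has no control within the caliper
print(caliper_match([60, 72, 55], [58, 61, 70, 74]))  # [(0, 1), (1, 2)]
```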
38.4.3.2 Multivariable Logistic Regression Analysis and Adjustment

This was described in Sect. 38.4.1.

38.4.3.3 Use of Multilevel and Hierarchical Modelling

This was described in Sect. 38.4.2.
38.4.3.4 Use of Bayesian Methods and Statistical Simulation Techniques

Bayesian methods can be considered an alternative to the classical approach to statistical analysis. Such methods have been used with increasing frequency because of the increased feasibility of their implementation, made possible by recent advances in computation and software technology. An important difference between the two approaches is that Bayesian methodology allows information external to the study to be incorporated into the analysis. Such information is specified in a “prior distribution” and is combined with the study data, in the form of the likelihood, to produce a “posterior distribution” on which inferences are based. In several studies, the incorporation of prior information is not of primary interest, and all prior distributions placed on model parameters are intended to be “vague”; sensitivity analysis can be used to check the stability of the results to different prior distributions. The computation of the posterior distributions for parameters in a Bayesian model is often complex, requiring the evaluation of numerous high-dimensional integrals. Within the broad range of Markov chain Monte Carlo (MCMC) simulation methods, one method, Gibbs sampling, has been increasingly used in applied Bayesian analyses. The appeal of Gibbs sampling is that it can be used to estimate posterior distributions by drawing sample values randomly from the full conditional distribution of each parameter, conditional on all the other parameters and the data; under ergodic theory, in the limit the samples converge to the marginal distributions. A final word of warning regarding the use of MCMC methods is in order: although they offer great flexibility, issues such as convergence of the simulations need to be established before they are used for estimation and inference; otherwise, biased estimates will be produced. Bayesian statistics have now permeated all the major areas of medical statistics, including clinical trials, epidemiology, meta-analyses and evidence synthesis, performance comparison, spatial modelling, case–control studies, measurement error, longitudinal modelling, survival modelling and decision-making in respect of new technologies.
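A toy Gibbs sampler illustrates the idea of drawing each parameter from its full conditional distribution. The example assumes a normal likelihood with deliberately vague priors; it is illustrative only, not a template for applied Bayesian modelling.

```python
# Gibbs sampling for the mean of normal data with unknown variance:
# alternate draws from each parameter's full conditional given the
# other parameter and the data.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=50)           # observed data
n, ybar = len(y), y.mean()

mu, tau = 0.0, 1.0                          # initial values; tau = 1/sigma^2
draws = []
for _ in range(5000):
    # full conditional of mu | tau, y (flat prior on mu)
    mu = rng.normal(ybar, np.sqrt(1 / (n * tau)))
    # full conditional of tau | mu, y (Gamma under a vague prior)
    rate = 0.5 * np.sum((y - mu) ** 2)
    tau = rng.gamma(shape=n / 2, scale=1 / rate)
    draws.append(mu)

burn = 1000                                 # discard pre-convergence draws
print(np.mean(draws[burn:]))                # posterior mean of mu, near 5
```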
38.4.3.5 Use of Balancing Scores

Balancing scores are a class of multivariable statistical methods that identify patients with similar chances of receiving one or the other treatment, permitting non-randomised comparisons of treatment outcomes. The developers of balancing score methods claim that the difference in outcome between patients who have a similar balancing score, but receive different treatments, provides an unbiased estimate of the effect attributable to the comparison variable of interest. In 1983, propensity score (PS) analysis was introduced as an alternative tool to control for confounding [80–82]. The PS is the probability of receiving treatment, or more generally any exposure of interest, for a patient, conditional on the patient’s observed pre-treatment covariates. PS analysis is a two-step approach in which a model is first built to predict the exposure (treatment model), and second, a model incorporating the information on PS is constructed to evaluate the exposure–outcome association (outcome model). To estimate the PS, usually a logistic regression model is fitted that predicts the exposure and may include a large number of measured pre-treatment covariates. From this model, the summary of each study subject’s pre-treatment covariates yields the expected probability (the person’s PS) of receiving the treatment or exposure of interest for that individual. In theory, it is expected that with increasing sample size, the pre-treatment covariates are balanced between study subjects from the two exposure groups who have nearly identical propensity scores. The three most common techniques that use the propensity score are matching, stratification and regression adjustment. Each of these techniques is a way to make an adjustment for covariates before calculation of the treatment effect (matching and stratification) or during calculation of the treatment effect (stratification
and regression adjustment). With all three techniques, the propensity score is calculated in the same way, but once estimated it is applied differently. Propensity scores are useful for these techniques because, by definition, the propensity score is the conditional probability of treatment given the observed covariates; thus, subjects in treatment and control groups with equal (or nearly equal) propensity scores will tend to have the same (or nearly the same) distributions of their background covariates. Exact adjustments made with the propensity score will, on average, remove all the bias in the background covariates. Therefore, bias-removing adjustments can be made with the propensity scores rather than with all the background covariates individually. Despite the broad utility of propensity score methods, when addressing causal questions from non-randomised studies, it is important to keep in mind that even propensity score methods can only adjust for observed confounding covariates and not for unobserved ones. This is always a limitation of non-randomised studies when compared with randomised studies, wherein the randomisation tends to balance the distribution of all covariates, observed and unobserved. In observational studies, confidence in causal conclusions must be built by seeing how consistent the obtained results are with other evidence (such as that generated from related experiments) and how sensitive the conclusions are to reasonable deviations from assumptions. Such sensitivity analyses suppose that a relevant but unobserved covariate has been left out of the propensity score model. By explicating how this hypothetical unmeasured covariate is related to treatment assignment and outcome, we can obtain an estimate of the treatment effect that adjusts for it as well as for the measured covariates, and thereby investigate how the answers might change if such a covariate were available for adjustment. Of course, medical knowledge is needed when assessing whether the proposed relations involving the hypothetical unmeasured covariate are realistic or extreme. Another limitation of propensity score methods is that they work better in larger samples, for the following reason. The distributional balance of observed covariates created by sub-classifying on the propensity score is an expected balance, just as the balance of all covariates in a randomised experiment is an expected balance. In a small randomised experiment, random imbalances of some covariates can be substantial despite randomisation; analogously, in a small observational
study, substantial imbalances of some covariates may be unavoidable despite sub-classification using a sensibly estimated propensity score. The larger the study, the smaller such imbalances become. A final possible limitation of propensity score methods is that a covariate related to treatment assignment but not to outcome is handled in the same way as a covariate with the same relation to treatment assignment but strongly related to outcome. This feature can be a limitation of propensity scores because the inclusion of irrelevant covariates reduces the efficiency of the control of the relevant covariates. However, recent work suggests that, at least in medium or large studies, the biasing effects of leaving out even a weakly predictive covariate dominate the efficiency gains from not using such a covariate. Thus, in practice, this limitation may not be substantial if investigators use prudent judgement.
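A minimal sketch of the two-step propensity score approach (treatment model, then outcome comparison within PS strata), with hypothetical column names:

```python
# Step 1: treatment model -> each patient's propensity score.
# Step 2: stratify on PS quintiles and average the within-stratum
# treated-vs-control risk differences. Column names ("treated",
# "died", covariates) are illustrative stand-ins.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ps_stratified_effect(df: pd.DataFrame, covariates: list) -> float:
    # Step 1: logistic treatment model
    ps_model = LogisticRegression().fit(df[covariates], df["treated"])
    df = df.assign(ps=ps_model.predict_proba(df[covariates])[:, 1])

    # Step 2: quintiles of the PS, then within-stratum comparisons
    df["stratum"] = pd.qcut(df["ps"], q=5, labels=False)
    effects = []
    for _, s in df.groupby("stratum"):
        treated, control = s[s.treated == 1], s[s.treated == 0]
        if len(treated) and len(control):
            effects.append(treated["died"].mean() - control["died"].mean())
    return float(np.mean(effects))
```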
38.5 Artificial Neural Networks

ANNs are mathematical functions that process information in a way comparable to networks of human brain neurons, by “learning” from experience [83]. Like their biological counterparts, ANNs are composed of an interconnected system of artificial “neurons”, or processing elements (nodes), which operate as non-linear summation devices. Each node represents a mathematical function in which all the arriving input information is processed, and the resulting nodal output is compared with a preset threshold, which when exceeded allows the node to propagate the signal, just like the generation of an action potential in a neuron [84]. ANNs “learn” during a “training” process of iterative exposure to data from cases that have a known outcome value. The input information (such as the values of Age, APS, Operative urgency and CH status shown in Fig. 38.4) is propagated through the network, from the input layer of nodes to the hidden layer and finally to the output layer (the in-hospital mortality shown in the example). Each connection between nodes has a numerical “weight” associated with it, which is used by the node to adjust the sum of its input values, using the mathematical function associated with it, to generate an output [85]. The mathematical functions most commonly employed in ANNs are the threshold function, the hyperbolic tangent function (produces output values from −1 to 1) and the logistic function
Fig. 38.4 A three-layer artificial neural network [7]: an input layer of four nodes (Age, Acute Physiology Score, Operative Urgency, Chronic Health status), a hidden layer of 15 nodes, and an output layer giving the probability of in-hospital mortality
(produces output values from 0 to 1). The middle layers are termed “hidden” because they have no direct contact with the data other than through the input and output nodes. The ANN-generated output is constantly compared with the known output. The “error” between these values is then fed back through the network (termed back-propagation [86]) to modify the weights of the ANN (termed the delta learning rule [87]) in order to minimise this “error”. The connection weights from nodes that have a greater tendency to predict the desired output accurately are then strengthened. This iterative procedure of “training” continues until the ANN produces a predicted output that matches the desired one within a specified accuracy level. The information processing capacity of an ANN is determined largely by the architecture of the arrangement of the “nodes” and the strength of the weights inter-linking them. Thus, the ANN progresses closer and closer to a computational architecture that can be used as a prediction engine for new data [88]. The network can then apply this “knowledge” by evaluating cases with unknown outcome: this
validation data set is used only once to determine the final performance of the network. Training a network for too long can result in “overfitting” of the training data with very accurate prediction of this historical data. As a result, the ANN becomes so complex that it produces inferior results for new data. To prevent this, the ANN periodically tests its performance on a test set during “training”. When the performance starts to deteriorate on this test set, the “training” process stops [89]. ANNs are fundamentally constructed to produce estimates of an outcome for an individual. Once ANNs become fully trained, they can then accept input data of several variables from a single case and make a prediction for that same case [88]. ANNs can perform the tasks achieved by conventional parametric statistics, but they can also perform non-parametric statistical tasks, without the need to satisfy assumptions regarding the phenomena under study. Their greatest weakness is that there are few rules regarding how to determine the ANN architecture and their configuration that would allow them to function most efficiently
[87]. It has been argued that a neural network analysis may potentially be more successful than traditional statistical techniques when the prognostic impact of a variable varies over time [90]. It has also been claimed that any function of the sort likely to be encountered in medicine (such as function estimation and pattern classifications) can be approximated closely by an ANN with only one hidden layer of nodes [84].
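The mechanics of training by back-propagation can be condensed to a few lines. The sketch below is a toy one-hidden-layer network loosely echoing the architecture of Fig. 38.4, trained on simulated data; a real application would monitor a separate test set for early stopping, as described above.

```python
# Toy one-hidden-layer network (4 inputs -> 15 hidden nodes -> one
# logistic output) trained by back-propagation on simulated data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # e.g. age, APS, urgency, CH
true_w = np.array([0.8, 1.2, 0.6, -0.4])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.uniform(size=200)).astype(float)

sigmoid = lambda z: 1 / (1 + np.exp(-z))
W1 = rng.normal(scale=0.1, size=(4, 15))       # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(15, 1))       # hidden -> output weights
lr = 0.1

for epoch in range(500):
    H = sigmoid(X @ W1)                        # forward pass
    out = sigmoid(H @ W2)
    err = out - y[:, None]                     # the "error" fed back
    # back-propagation: move each weight down the error gradient
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ ((err @ W2.T) * H * (1 - H)) / len(X)

print(float(np.mean((out > 0.5).ravel() == y)))  # training accuracy
```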
38.5.1 Clinical Applications of ANN

Most of the time, clinicians are forced to take decisions in the light of incomplete evidence about the condition they are treating. This “art of medicine” is practiced by clinicians based on their prior experience and the evidence for the particular case, allowing them to formulate a probability assessment. Pattern recognition is the ability of the mind to piece together fragments of evidence to find the best match among several candidate outcomes. The human mind is able to go through many such comparisons of the pattern in question with the database of knowledge it has stored from past experience [89]. ANNs have been extensively applied in clinical medicine in diagnosis, in pattern recognition of imaging, ECG and encephalogram investigations, as well as in outcome prediction [91]. In addition, owing to its ability to separate non-linear signal from substantial “noise”, the methodology has also been used in pathology and medical laboratories [92]. It has been suggested that the complex nature of critically ill patients in ICU, with their potential multiple interacting systemic disturbances, makes them an ideal scenario for outcome prediction by ANN [83]. Outcome in ICU patients was predicted more accurately by ANN than by both linear (conventional logistic regression) and the more complex non-linear (classification and regression trees) statistical models [93]. A comparison of ICU mortality prediction using the APACHE II scoring system and ANN did not find a significant difference between the two methodologies [94]. ANN also predicted outcome in colorectal cancer patients more accurately than existing clinico-pathological staging systems and clinicians’ estimates [95]. In contrast, other published studies [96] have suggested that traditional statistical analyses outperformed ANN. A recent comparison of ANN and logistic regression in predicting patient
outcome in post-operative AAA patients for the purpose of “informed prognosis” was in favour of the latter methodology by a small statistical margin [7].
38.6 Development and Validation of Risk Stratification Models

A risk stratification model specialising in outcome prediction allows the comparison of actual outcome with that of a reference population, as a marker of the effectiveness of care [97]. The objective of such a model is to reduce the uncertainty of clinical practice by defining how to use the risk factors to make predictions. When validating logistic regression (or any other) models, the major question posed is how well the predicted probabilities agree with the observed outcome in an independent sample [98]. When the development sample used to create a model is not randomly taken from a population, the model cannot be assumed to be statistically valid for other samples from the same population, because variations in the case-mix of patients or in the structure and process of care not accounted for in the model can have a significant impact on the accuracy of prediction in a different sample of patients. There are various aspects of validity relevant to RSM [10], the most easily verifiable being statistical validity. This can be evaluated by performing specific tests of the “internal” validity of a model, such as assessing its calibration and discrimination properties as well as sub-group analyses. A further step in evaluating the statistical validity of the model is to “externally” validate it by applying it to another sample of the population, independent of the development sample. This form of validation requires reproducible clinical methods and definitions of the predictor variables and study outcome [99], enabling validation to be performed by a different set of investigators from the ones that created the model, in order to duplicate the methods of the original developers [100]. Another type of validity is the clinical validity of a risk stratification model and its effect on patient care: the assessment of whether the model is clinically meaningful and whether it can be applied in practice. This can be assessed by comparing the performance of the model with current practice, such as with other existing models in use or even with clinicians.
38.6.1 Discrimination

Discrimination is the ability of a risk stratification model to distinguish patients who die from those who live (when the predicted outcome is death). It can be evaluated by ROC curve analysis [101] and the C-index [102], which is directly related to the area under the ROC curve. The latter can be interpreted as follows: if the patient sample were divided into every possible pair of one dead and one alive patient, the ROC area would represent the proportion of these pairs in which the model assigns a higher probability of death to the patient who died [20]. Perfect discrimination would give a ROC curve area of 1.00 and random discrimination an area of 0.50. A value exceeding 0.8 is considered very good discrimination [103]. In contrast, assessing a model’s discrimination at a single decision threshold using sensitivity and specificity alone would be inappropriate, as this would not reflect the ability of the model to predict outcome at all other decision criteria [20]. Clinical need may, however, dictate decisions at a single diagnostic threshold: a high sensitivity relates to a test needing correct prediction of hospital mortality (true positives), while a high specificity relates more to accurate prediction of survival (true negatives) [104].
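A brief illustration of the ROC area, using hypothetical arrays of observed outcomes and predicted risks:

```python
# ROC area / C-index for a set of predicted risks: the proportion of
# (died, survived) pairs in which the death received the higher
# predicted risk, as interpreted in the text.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])          # 1 = died
y_prob = np.array([0.1, 0.3, 0.35, 0.2, 0.8, 0.6, 0.4, 0.7])

print(roc_auc_score(y_true, y_prob))   # 1.00 = perfect, 0.50 = random
```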
38.6.2 Calibration

Calibration or “goodness-of-fit” refers to the ability of the model to assign the correct probabilities of outcome to individual patients. The Hosmer–Lemeshow C statistic [11] is used to assess the model’s calibration by ranking the sample of patients into ten groups of progressively increasing predicted risk, each containing approximately the same number of patients (known as natural deciles of risk). Within each decile, the sum of the predicted probabilities of death (the expected deaths) is compared with the corresponding observed deaths by using a chi-square test. A large discrepancy, yielding a large chi-square statistic and a small P-value, suggests poor calibration.
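A sketch of the Hosmer–Lemeshow computation as described above (deciles of predicted risk, observed vs. expected deaths):

```python
# Hosmer-Lemeshow goodness-of-fit: rank patients into deciles of
# predicted risk and compare observed with expected deaths.
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y_true, y_prob, groups=10):
    df = pd.DataFrame({"y": y_true, "p": y_prob})
    df["decile"] = pd.qcut(df["p"], q=groups, labels=False,
                           duplicates="drop")
    h = 0.0
    for _, g in df.groupby("decile"):
        n = len(g)
        obs, exp = g["y"].sum(), g["p"].sum()   # observed vs expected deaths
        h += (obs - exp) ** 2 / (exp * (1 - exp / n))
    dof = df["decile"].nunique() - 2
    # large H and small P suggest poor calibration
    return h, stats.chi2.sf(h, dof)
```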
38.6.3 Sub-group Analysis

This is intended to assess whether the model is equally valid in all sub-groups of interest (uniformity of fit),
such as across all age groups or for both emergency and elective patients. If the model does not perform well in given subsets of patients, then when it is applied to an independent sample of patients, any differences in observed vs. predicted mortality may be misinterpreted as being due to differences in the quality of care, whereas they may in fact be confounded by different proportions of the sub-groups in the independent sample.
38.6.4 Recalibration

Many researchers believe that prognostic indices inevitably perform less well when tested in an independent population [100]; specifically, the discriminatory property of the model is preserved at the expense of imperfect calibration [24, 25, 77, 105]. There are various explanations in the literature for the deterioration of a model’s performance in an external validation study. The factor with the biggest impact is the difference in the structure and process of care between the development and validation institutions [76, 98]. External validation studies usually adjust only for the patient case-mix in their models, failing to address the different structures and processes of care in the independent institution wherein the validation takes place. Other factors that may influence model performance in an external validation study include differences in the case-mix not accounted for by the model, differences in the definition of outcomes, and different strategies to detect the outcome. A discrepancy in local referral patterns and patient selection strategies can produce populations that are prognostically distinct by altering the patient case-mix. Selection bias in a patient population is also introduced when the original model was “over-fitted” to its development population by using too many variables, resulting in a Type I error. It has been suggested that situations in which a model loses the ability to discriminate in the validation population should be distinguished from those in which it retains such ability but uniformly over- or under-estimates outcomes [100]. A marked deterioration in the discriminative ability of a model cannot be corrected without effectively generating a new model [98], while a deficiency in calibration is said to be easily correctable [106]. If the goal of RSM is to determine temporal trends in mortality in the same population, then recalibration of a
reference model to a local population should not be performed [107]. Calibration rather than discrimination is said to be the more significant measure of model performance when assessing quality of care by using O:E ratios [34]. When the goal of RSM is to compare risk-adjusted outcomes, identify high-risk sub-groups or prognosticate, researchers have suggested [107] that recalibration of an existing model can be easier than, and equally accurate to, generating a new model in the new population. The generation of a new model would only be preferable to recalibration when the model is simply not accurate enough (both poor discriminatory and poor calibration properties in a new population). There are various methods of recalibration, the simplest being adjustment of the equation constant [108]. This method is an essential feature of multilevel models, which are specifically developed to have a different equation constant at each separate institution so as to adjust for the individual structure and process of care of the local institution (the individual unit effect). Another method of recalibration (or customisation) of a model to an independent population is to use the same variables as the original model but customise their coefficients to fit the new population [76, 109]. This equates to the generation of a different model, as the relation of the prognostic variables (regression coefficients) to the outcome changes, and it cannot therefore be used for the purpose of externally validating a model. Furthermore, in simulation studies [109], the smaller the sample size, the less likely the model was to appear to perform poorly, as the statistical power to detect lack of fit also decreases; models may therefore misleadingly appear to be well calibrated. The counter-argument would be that clinically valid models applied to large ICU databases can have a statistically large chi-square value suggesting poor calibration, owing to the high sensitivity of a large sample [76]. Thus, it is always important not to rely solely on statistics but to interpret models in their clinical context [110]. Indiscriminate recalibration of models when applied to new populations results in the loss of important information [111] and should therefore always be preceded by an evaluation of the potential reasons (such as differences in case mix and in the structure and process of care) that can account for the variations in the original model’s applied performance. In addition, recalibration of a model to a local population may preclude further external comparisons, restricting the use of the customised model to this local population [24].
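The simplest recalibration, adjustment of the equation constant, can be sketched as an intercept-only logistic fit with the original model’s linear predictor entered as an offset (orig_coef and orig_intercept are hypothetical stand-ins for a published model’s coefficients):

```python
# Recalibrate only the intercept of an existing logistic model in a
# local population, keeping the original coefficients fixed.
import numpy as np
import statsmodels.api as sm

def recalibrate_intercept(X_local, y_local, orig_coef, orig_intercept):
    """Return the intercept shifted to fit the local population."""
    # linear predictor of the original model, entered as an offset
    offset = X_local @ orig_coef + orig_intercept
    const = np.ones((len(y_local), 1))
    fit = sm.GLM(y_local, const, family=sm.families.Binomial(),
                 offset=offset).fit()
    # fitted constant is the additional shift on the old intercept
    return orig_intercept + fit.params[0]
```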
References

1. Hadjianastassiou VG, Tekkis PP, Poloniecki JD et al (2004) Surgical mortality score: risk management tool for auditing surgical performance. World J Surg 28:193–200
2. Gunning K, Rowan K (1999) ABC of intensive care: outcome data and scoring systems. BMJ 319:241–244
3. Lemeshow S, Klar J, Teres D (1995) Outcome prediction for individual intensive care patients: useful, misused, or abused? Intensive Care Med 21:770–776
4. Marcin JP, Pollack MM, Patel KM et al (2000) Combining physician’s subjective and physiology-based objective mortality risk predictions. Crit Care Med 28:2984–2990
5. Tekkis PP, Poloniecki JD, Thompson MR et al (2003) Operative mortality in colorectal cancer: prospective national study. BMJ 327:1196–1201
6. Cullen DJ, Chernow B (1994) Predicting outcome in critically ill patients. Crit Care Med 22:1345–1348
7. Hadjianastassiou VG, Franco L, Jerez JM et al (2006) Informed prognosis [corrected] after abdominal aortic aneurysm repair using predictive modeling techniques [corrected]. J Vasc Surg 43:467–473
8. Joint Commission Resources (1999) Florence Nightingale: measuring health care outcomes. Joint Commission on Accreditation of Healthcare Organizations, Oakbrook, IL
9. Nightingale F, JCAHO (1999) Measuring hospital care outcomes. Joint Commission on Accreditation of Healthcare Organizations, Oakbrook, IL
10. Mourouga P, Goldfrad C, Rowan KM (2000) Does it fit? Is it good? Assessment of scoring systems (severity scoring in the critically ill patient). Curr Opin Crit Care 6:176–180
11. Hosmer DW, Lemeshow S (2000) Applied logistic regression. Wiley, New York
12. Jones HJ, de Cossart L (1999) Risk scoring in surgical patients. Br J Surg 86:149–157
13. Vacanti CJ, VanHouten RJ, Hill RC (1970) A statistical analysis of the relationship of physical status to postoperative mortality in 68,388 cases. Anesth Analg 49:564–566
14. Schuster DP (1992) Predicting outcome after ICU admission. The art and science of assessing risk. Chest 102:1861–1870
15. Knaus WA, Zimmerman JE, Wagner DP et al (1981) APACHE-acute physiology and chronic health evaluation: a physiologically based classification system. Crit Care Med 9:591–597
16. Knaus WA, Draper EA, Wagner DP et al (1985) APACHE II: a severity of disease classification system. Crit Care Med 13:818–829
17. Harrell FE Jr, Lee KL, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387
18. Wisner DH (1992) History and current status of scoring systems for critical care. Arch Surg 127:352–356
19. Knaus WA, Wagner DP, Draper EA et al (1991) The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 100:1619–1636
20. Lemeshow S, Le Gall JR (1994) Modeling the severity of illness of ICU patients. A systems update. JAMA 272:1049–1055
21. Berger MM, Marazzi A, Freeman J et al (1992) Evaluation of the consistency of Acute Physiology and Chronic Health Evaluation (APACHE II) scoring in a surgical intensive care unit. Crit Care Med 20:1681–1687
22. Beck DH, Taylor BL, Millar B et al (1997) Prediction of outcome from intensive care: a prospective cohort study comparing Acute Physiology and Chronic Health Evaluation II and III prognostic systems in a United Kingdom intensive care unit. Crit Care Med 25:9–15
23. Barie PS, Hydo LJ, Fischer E (1995) Comparison of APACHE II and III scoring systems for mortality prediction in critical surgical illness. Arch Surg 130:77–82
24. Beck DH, Smith GB, Pappachan JV et al (2003) External validation of the SAPS II, APACHE II and APACHE III prognostic models in South England: a multicentre study. Intensive Care Med 29:249–256
25. Rowan KM, Kerr JH, Major E et al (1993) Intensive Care Society’s APACHE II study in Britain and Ireland-II: outcome comparisons of intensive care units after adjustment for case mix by the American APACHE II method. BMJ 307:977–981
26. Rowan KM, Kerr JH, Major E et al (1993) Intensive Care Society’s APACHE II study in Britain and Ireland-I: variations in case mix of adult admissions to general intensive care units and impact on outcome. BMJ 307:972–977
27. Pappachan JV, Millar B, Bennett ED et al (1999) Comparison of outcome from intensive care admission after adjustment for case mix by the APACHE III prognostic system. Chest 115:802–810
28. Le Gall JR, Loirat P, Alperovitch A et al (1984) A simplified acute physiology score for ICU patients. Crit Care Med 12:975–977
29. Le Gall JR, Lemeshow S, Saulnier F (1993) A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 270:2957–2963
30. Beck DH, Smith GB, Taylor BL (2002) The impact of low-risk intensive care unit admissions on mortality probabilities by SAPS II, APACHE II and APACHE III. Anaesthesia 57:21–26
31. Nuttall P (1983) The passionate statistician (Florence Nightingale). Nurs Times 79:25–27
32. Lemeshow S, Teres D, Klar J et al (1993) Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients. JAMA 270:2478–2486
33. Rowan KM, Kerr JH, Major E et al (1994) Intensive Care Society’s acute physiology and chronic health evaluation (APACHE II) study in Britain and Ireland: a prospective, multicenter, cohort study comparing two methods for predicting outcome for adult intensive care patients. Crit Care Med 22:1392–1401
34. Livingston BM, MacKirdy FN, Howie JC et al (2000) Assessment of the performance of five intensive care scoring models within a large Scottish database. Crit Care Med 28:1820–1827
35. Boyd O, Grounds RM (1993) Physiological scoring systems and audit. Lancet 341:1573–1574
36. Iezzoni L (1997) Risk adjustment for measuring health care outcomes. Health Administration Press, Chicago
37. Khuri SF, Daley J, Henderson W et al (1997) Risk adjustment of the postoperative mortality rate for the comparative
assessment of the quality of surgical care: results of the National Veterans Affairs Surgical Risk Study. J Am Coll Surg 185:315–327
38. Copeland GP, Jones D, Walters M (1991) POSSUM: a scoring system for surgical audit. Br J Surg 78:355–360
39. Wijesinghe LD, Mahmood T, Scott DJ et al (1998) Comparison of POSSUM and the Portsmouth predictor equation for predicting death following vascular surgery. Br J Surg 85:209–212
40. Neary WD, Heather BP, Earnshaw JJ (2003) The physiological and operative severity score for the enumeration of mortality and morbidity (POSSUM). Br J Surg 90:157–165
41. Whiteley MS, Prytherch DR, Higgins B et al (1996) An evaluation of the POSSUM surgical scoring system. Br J Surg 83:812–815
42. Prytherch DR, Whiteley MS, Higgins B et al (1998) POSSUM and Portsmouth POSSUM for predicting mortality. Physiological and operative severity score for the enumeration of mortality and morbidity. Br J Surg 85:1217–1220
43. Prytherch DR, Ridler BM, Beard JD et al (2001) A model for national outcome audit in vascular surgery. Eur J Vasc Endovasc Surg 21:477–483
44. Prytherch DR, Sutton GL, Boyle JR (2001) Portsmouth POSSUM models for abdominal aortic aneurysm surgery. Br J Surg 88:958–963
45. Tekkis PP, Prytherch DR, Kocher HM et al (2004) Development of a dedicated risk-adjustment scoring system for colorectal surgery (colorectal POSSUM). Br J Surg 91:1174–1182
46. Tekkis PP, McCulloch P, Poloniecki JD et al (2004) Risk-adjusted prediction of operative mortality in oesophagogastric surgery with O-POSSUM. Br J Surg 91:288–295
47. Bann SD, Sarin S (2001) Comparative audit: the trouble with POSSUM. J R Soc Med 94:632–634
48. Lloyd H, Ahmed I, Taylor S et al (2005) Index for predicting mortality in elderly surgical patients. Br J Surg 92:487–492
49. Prytherch DR, Ridler BM, Ashley S (2005) Risk-adjusted predictive models of mortality after index arterial operations using a minimal data set. Br J Surg 92:714–718
50. Prytherch DR, Sirl JS, Weaver PC et al (2003) Towards a national clinical minimum data set for general surgery. Br J Surg 90:1300–1305
51. Tang T, Walsh SR, Prytherch DR et al (2007) VBHOM, a data economic model for predicting the outcome after open abdominal aortic aneurysm surgery. Br J Surg 94:717–721
52. Hadjianastassiou V (2007) VBHOM, a data economic model for predicting the outcome after open abdominal aortic aneurysm surgery. Br J Surg 94:1308
53. Hadjianastassiou VG, Tekkis PP, Goldhill DR et al (2005) Quantification of mortality risk after abdominal aortic aneurysm repair. Br J Surg 92:1092–1098
54. Hadjianastassiou VG, Tekkis PP, Athanasiou T et al (2007) External validity of a mortality prediction model in patients after open abdominal aortic aneurysm repair using multilevel methodology. Eur J Vasc Endovasc Surg 34:514–521
55. Hadjianastassiou VG, Tekkis PP, Athanasiou T et al (2007) Comparison of mortality prediction models after open abdominal aortic aneurysm repair. Eur J Vasc Endovasc Surg 33:536–543
56. Daley J, Forbes MG, Young GJ et al (1997) Validating risk-adjusted surgical outcomes: site visit assessment of process and
38
Risk Stratification and Prediction Modelling in Surgery
structure. National VA Surgical Risk Study. J Am Coll Surg 185:341–351 57. Khuri SF, Daley J, Henderson W et al (1995) The National Veterans Administration Surgical Risk Study: risk adjustment for the comparative assessment of the quality of surgical care. J Am Coll Surg 180:519–531 58. Khuri SF, Daley J, Henderson W et al (1998) The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg 228:491–507 59. Samy AK, Murray G, MacBain G (1994) Glasgow aneurysm score. Cardiovasc Surg 2:41–44 60. Samy AK, Murray G, MacBain G (1996) Prospective evaluation of the Glasgow Aneurysm Score. J R Coll Surg Edinb 41:105–107 61. Prance SE, Wilson YG, Cosgrove CM et al (1999) Ruptured abdominal aortic aneurysms: selecting patients for surgery. Eur J Vasc Endovasc Surg 17:129–132 62. Tambyraja AL, Fraser SC, Murie JA et al (2005) Validity of the Glasgow Aneurysm Score and the Hardman Index in predicting outcome after ruptured abdominal aortic aneurysm repair. Br J Surg 92:570–573 63. De Ritis G, Giovannini C, Picardo S et al (1995) Multivariate prediction of in-hospital mortality associated with surgical procedures. Minerva Anestesiol 61:173–181 64. Lee J (1986) An insight on the use of multiple logistic regression analysis to estimate association between risk factor and disease occurrence. Int J Epidemiol 15:22–29 65. Harrell FE Jr, Lee KL, Matchar DB et al (1985) Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treat Rep 69:1071–1077 66. Picard RR, Berk KN (1990) Data splitting. Am Stat 44:140–147 67. Hadorn DC, Draper D, Rogers WH et al (1992) Crossvalidation performance of mortality prediction models. Stat Med 11:475–489 68. Efron B, Tibshirani R (1993) An introduction to the Bootstrap. Chapman & Hall, New York 69. Cox DR (1972) Regression models and life-tables. J R Stat Soc B 34:187–202 70. Bennett N (1976) Teaching styles and pupil progress. Open Books, London 71. Aitkin M, Anderson D, Hinde J (1981) Statistical modelling of data on teaching styles (with discussion). J Royal Statist Soc A 149:148–161 72. Spiegelhalter DJ, Aylin P, Best NG et al (2002) Commissioned analysis of surgical performance by using routine data: lessons from Bristol inquiry. J R Statist Soc A 165:1–31 73. Marshall EC, Spiegelhalter DJ (1998) Reliability of league tables of in vitro fertilisation clinics: retrospective analysis of live birth rates. BMJ 316:1701–1704; discussion 1705 74. Kee F, Wilson RH, Harper C et al (1999) Influence of hospital and clinician workload on survival from colorectal cancer: cohort study. BMJ 318:1381–1385 75. McCulloch P, Ward J, Tekkis PP (2003) Mortality and morbidity in gastro-oesophageal cancer surgery: initial results of ASCOT multicentre prospective cohort study. BMJ 327: 1192–1197 76. Beck DH, Smith GB, Pappachan JV (2002) The effects of two methods for customising the original SAPS II model for
527 intensive care patients from South England. Anaesthesia 57:785–793 77. Moreno R, Miranda DR, Fidler V et al (1998) Evaluation of two outcome prediction models on an independent database. Crit Care Med 26:50–61 78. Cologne JB, Shibata Y (1995) Optimal case-control matching in practice. Epidemiology 6:271–275 79. Rosenbaum PR (1989) Optimal matching in observational studies. J Am Stat Assoc 84:1024–1032 80. D’Agostino RB Jr (1998) Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 17:2265–2281 81. Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55 82. Rubin DB (1980) Bias reduction using mahalanobis-metric matching. Biometrics 36:293–298 83. Drew PJ, Monson JR (2000) Artificial neural networks. Surgery 127:3–11 84. Cross SS, Harrison RF, Kennedy RL (1995) Introduction to neural networks. Lancet 346:1075–1079 85. Ramesh AN, Kambhampati C, Monson JR et al (2004) Artificial intelligence in medicine. Ann R Coll Surg Engl 86:334–338 86. Werbos P (1974) Beyond regression: new tools for prediction and analysis in the behavioral sciences. Harvard University Press, Cambridge 87. Burke H (1994) Artificial neural networks for cancer research: outcome prediction. Sem Surg Oncol 10:73–79 88. Sawyer MD (2000) Invited commentary: artificial neural networks – an introduction. Surgery 127:1–2 89. Wei JT, Zhang Z, Barnhill SD et al (1998) Understanding artificial neural networks and exploring their potential applications for the practicing urologist. Urology 52:161–172 90. De Laurentis M, Ravdin PM (1994) A technique for using neural network analysis to perform survival analysis of censored data. Cancer Lett 77:127–138 91. Baxt WG (1995) Application of artificial neural networks to clinical medicine. Lancet 346:1135–1138 92. Dybowski R, Gant V (1995) Artificial neural networks in pathology and medical laboratories. Lancet 346:1203–1207 93. Dybowski R, Weller P, Chang R et al (1996) Prediction of outcome in critically ill patients using artificial neural network synthesised by genetic algorithm. Lancet 347: 1146–1150 94. Wong LS, Young JD (1999) A comparison of ICU mortality prediction using the APACHE II scoring system and artificial neural networks. Anaesthesia 54:1048–1054 95. Bottaci L, Drew PJ, Hartley JE et al (1997) Artificial neural networks applied to outcome prediction for colorectal cancer patients in separate institutions. Lancet 350: 469–472 96. Wyatt J (1995) Nervous about artificial neural networks? Lancet 346:1175–1177 97. Moreno R, Apolone G, Miranda DR (1998) Evaluation of the uniformity of fit of general outcome prediction models. Intensive Care Med 24:40–47 98. Miller ME, Hui SL, Tierney WM (1991) Validation techniques for logistic regression models. Stat Med 10:1213–1226 99. Wasson JH, Sox HC, Neff RK et al (1985) Clinical prediction rules. Applications and methodological standards. N Engl J Med 313:793–799
528 100. Charlson ME, Ales KL, Simon R et al (1987) Why predictive indexes perform less well in validation studies. Is it magic or methods? Arch Intern Med 147:2155–2161 101. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36 102. Harrell FE Jr, Califf RM, Pryor DB et al (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546 103. Weijnen CF, Numans ME, de Wit NJ et al (2001) Testing for Helicobacter pylori in dyspeptic patients suspected of peptic ulcer disease in primary care: cross sectional study. BMJ 323:71–75 104. Schafer JH, Maurer A, Jochimsen F et al (1990) Outcome prediction models on admission in a medical intensive care unit: do they predict individual outcome? Crit Care Med 18:1111–1118 105. Apolone G, Bertolini G, D’Amico R et al (1996) The performance of SAPS II in a cohort of patients admitted to 99 Italian ICUs: results from GiViTI. Gruppo Italiano per la Valutazione degli interventi in Terapia Intensiva. Intensive Care Med 22:1368–1378
V. G. Hadjianastassiou et al. 106. Poses RM, Cebul RD, Collins M et al (1986) The importance of disease prevalence in transporting clinical prediction rules. The case of streptococcal pharyngitis. Ann Intern Med 105:586–591 107. Ivanov J, Tu JV, Naylor CD (1999) Ready-made, recalibrated, or remodeled? Issues in the use of risk indexes for assessing mortality after coronary artery bypass graft surgery. Circulation 99:2098–2104 108. Lemeshow S, Klar J, Teres D et al (1994) Mortality probability models for patients in the intensive care unit for 48 or 72 hours: a prospective, multicenter study. Crit Care Med 22:1351–1358 109. Zhu BP, Lemeshow S, Hosmer DW et al (1996) Factors affecting the performance of the models in the Mortality Probability Model II system and strategies of customization: a simulation study. Crit Care Med 24:57–63 110. Altman DG, Royston P (2000) What do we mean by validating a prognostic model? Stat Med 19:453–473 111. Teres D, Lemeshow S (1999) When to customize a severity model. Intensive Care Med 25:140–142
39 The Principles and Role of Medical Imaging in Surgery
Daniel Elson and Guang-Zhong Yang
Contents
39.1 Introduction
39.1.1 Imaging for Pre-Operative Planning
39.1.2 Intra-Operative Imaging
39.1.3 Post-Operative Imaging
39.2 The Future of Surgical Imaging
39.3 Summary
References
D. Elson () Department of Biosurgery and Surgical Technology, Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, UK e-mail: [email protected]
Abstract This chapter reviews the current uses of medical imaging in surgery, from pre-operative planning to recent advances in image-guided intervention, and considers future trends. Imaging has assumed an increasingly important role in surgery since the dawn of the X-ray era, moving from being a primarily diagnostic modality towards a therapeutic and interventional aid, facilitated by advances in surgical technology and the emergence of novel biomarkers, prostheses and targeted therapies. The continuing evolution of imaging techniques, particularly in bringing laboratory-based tissue characterisation techniques to an in vivo, in situ setting, will see a paradigm shift in how imaging is used in future surgery. It is feasible that advanced keyhole techniques will be based entirely on image guidance, seamlessly integrating pre-operative data with intra-operative tissue morphology and function. To provide an overview of the current and future trends of medical imaging in surgery, this chapter first summarises the main modalities used for pre-operative imaging in surgery, including ultrasound, X-rays/CT, magnetic resonance imaging (MRI) and radionuclide techniques. In each section, the current state of the art is described, together with potential future developments. Next, the current use of intra-operative imaging and image-guided intervention is described, outlining the existing techniques and the major developments currently in progress. This includes endoscopy, optical-imaging methods, stereotactic frameworks, augmented reality (AR) and robotic-assisted interventions. For further information on medical imaging, we refer the reader to the texts by Suetens [1], Bankman [2], Sonka [3] and Webb [4], as well as reviews on current surgical-imaging methods [5, 6].
39.1 Introduction
The main purpose of medical imaging in conventional surgery is to make the diagnosis process easier and more accurate. This generally involves the use of pre-operative images such as ultrasound, MRI or computed tomography (CT) to help visualise the target anatomy and formulate an intervention strategy before the commencement of surgery. The type of tissue contrast available with different imaging modalities determines their suitability for a particular surgical task, and understanding the strengths and weaknesses of each method is an important component of surgical training. In general, the use of medical imaging in surgery can be broadly divided into the following categories:
• Pre-operative planning – to establish the operating target and its surrounding anatomical structures for planning out detailed access and operating trajectories; it also involves the establishment of frame-related coordinate systems based on anatomical or fiducial markers for linking the patient and imaging coordinate systems;
• Intra-operative guidance – to provide real-time surgical guidance by providing anatomical and functional information about the operating site; increasingly, imaging is used in surgery for providing in situ tissue structure and function characterisation such that the direct outcome of the surgical procedure can be assessed. AR is one important form of intra-operative guidance whereby the imaging data are combined with the exposed anatomical surface to provide “see-through” vision of the operating site;
• Post-operative assessment – to assess the efficacy of the surgical procedure and the long-term benefit of therapeutic processes; for patients implanted with vessel prostheses or those who have undergone cancer surgery, serial image assessment provides an important monitoring tool for outcome management.
These different aspects of imaging in surgery and their relationship are illustrated in Fig. 39.1. With recent advances in bio-mechanical modelling, imaging is increasingly used to provide subject-specific information for pre-operative and intra-operative simulation, such that each surgical step is simulated, pre-rehearsed and optimised before being applied to the patient.
39.1.1 Imaging for Pre-Operative Planning
Common imaging techniques for pre-operative planning include ultrasound, X-rays/CT, MRI and radionuclide techniques. Ultrasound has become the preferred non-invasive imaging method because of the high degree of safety associated with its normal use and the good contrast observed from soft tissues. It is able to image organ boundaries and internal organ structures, including changes caused by tumour growths. A major advantage of ultrasound is that it allows live imaging of the tissue at video rate, and it is portable and cost-effective. For ultrasound imaging, the resolution of the technique scales as the inverse of the acoustic frequency and for modern equipment is generally less than 1 mm (at ∼2.5 MHz). However, as the acoustic frequency is increased, the penetration depth of the ultrasound is reduced because of increased absorption, so there is an inherent trade-off between imaging resolution and penetration depth. The main applications of ultrasound, particularly with the recent introduction of 3D techniques, are in the initial investigation of cardiovascular, hepatobiliary, pancreatic, renal or gynaecological problems, or in obstetrics to visualise the foetus.
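As a rough numerical illustration of this resolution–penetration trade-off, the sketch below computes the wavelength (which sets the order of the achievable axial resolution) and an attenuation-limited penetration depth at several transducer frequencies. The speed of sound, attenuation coefficient and system dynamic range are typical textbook values assumed purely for illustration; they are not parameters given in this chapter.

```python
# Typical textbook values, assumed for illustration (not from this chapter):
C_TISSUE_M_S = 1540.0                 # speed of sound in soft tissue
ATTEN_DB_PER_CM_PER_MHZ = 0.5         # frequency-dependent attenuation
MAX_ROUND_TRIP_LOSS_DB = 60.0         # assumed usable system dynamic range

for f_mhz in (2.5, 5.0, 10.0, 20.0):
    # Axial resolution is on the order of the wavelength lambda = c / f.
    wavelength_mm = C_TISSUE_M_S / (f_mhz * 1e6) * 1e3
    # Penetration is limited by round-trip attenuation: 2 * depth * alpha * f.
    depth_cm = MAX_ROUND_TRIP_LOSS_DB / (2 * ATTEN_DB_PER_CM_PER_MHZ * f_mhz)
    print(f"{f_mhz:5.1f} MHz: wavelength ~{wavelength_mm:.2f} mm, "
          f"usable depth ~{depth_cm:.1f} cm")
```

At 2.5 MHz this gives a wavelength of roughly 0.6 mm with some 20 cm of usable depth, while at 10 MHz the wavelength shrinks to about 0.15 mm but the depth falls to a few centimetres, which is why high-frequency probes are reserved for superficial or endoluminal use.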
Fig. 39.1 A schematic diagram showing how imaging is used for pre-operative planning, intra-operative guidance and post-operative assessment in surgery, as well as the key processes linking these components
Ultrasound may also provide clear contrast of fluid-filled regions and can guide the drainage of abscesses. X-rays are high-energy electromagnetic waves that interact only weakly with tissue, although different tissues absorb the radiation to varying degrees. Images of the radiation that has passed through the body can be recorded after the X-rays have directly or indirectly interacted with a phosphor screen that emits light. This light can be recorded on photographic film or a digital camera, or can be amplified by an image intensifier. X-ray imaging is the most common form of pre-operative planning, and in conventional radiology the contrast in the images is generated by differences between soft tissue (mainly water), bone, fat and air or gas. This method is typically used for musculoskeletal imaging, as bone absorbs X-rays more strongly than soft tissue, and the images can be used to detect fractures, tuberculosis of the spine, benign and malignant tumours, osteoarthritis, rheumatoid arthritis and gout. To improve the contrast available in X-ray imaging, iodine (for intravenous and intra-arterial angiography) or barium sulphate (for investigation of the GI tract) may be administered, as these atoms absorb X-rays strongly. The interpretation of the 2D projections produced in X-ray imaging can be difficult, especially in anatomically complicated structures and overlapping tissues, and it is common to use X-ray CT in these situations. In CT, the absorption of the tissue is recorded as the X-ray source and a one-dimensional array of detectors are rotated around the patient’s body, resulting in a number of one-dimensional projections recorded at different orientations of the body (together called a sinogram). The X-rays are collimated so that only a thin slice of the patient is probed during each acquisition. Image reconstruction can then be performed to create a cross-sectional image, where each pixel represents the absorption of a given region of the cross-section. This whole procedure can be repeated at different slice positions to build up a 3D volumetric model of the patient. Common applications of CT include investigations in staging of various tumours, calcification, lung cancer, cerebral infarction, haemorrhage and abscesses, complex fractures, pulmonary diseases such as emboli and detection of acute and chronic pancreatitis. MRI allows the density of nuclei in the tissue, particularly the hydrogen found in water, to be imaged. The main advantages of MRI lie in the excellent soft tissue
contrast it provides, the absence of absorption and shadow artefacts caused by the presence of bones, and also the absence of the ionising radiation required for CT/radionuclide methods. The intrinsic contrast of the MR signal can be programmed to reflect proton density and different relaxation effects of the tissue. It is emerging as an important tool for pre-operative planning because of its safety, versatility and the high-quality images it produces, which allow accurate and reproducible quantification of structure and function such as blood flow and perfusion [1–4]. However, the method is unsuitable for patients with metallic implants. Also, care must be taken not to bring ferromagnetic objects near the scanner during surgical intervention, as they can be forcibly attracted by the strong magnetic fields. Applications of MRI include cardiovascular, brain, spine and joint imaging, where it may be used to discriminate between subtle variations in soft tissue with excellent spatial resolution. Other applications include the detection of intracranial tumours, knee cartilage tears, lumbar disc disease and the staging of malignant tumours. It is also possible to add contrast agents, including intravenous gadolinium, to reveal tissue perfusion and soft tissue tumours and to perform MR angiography. Nuclear techniques such as SPECT and PET involve the application of a radioactive isotope (tracer) that allows the measurement of tissue function, for instance perfusion, metabolism and innervation [1]. The tracer is usually injected intravenously, and the molecule will tend to localise along a particular metabolic process depending on its chemical properties. The tracer is radioactively unstable and decays, subsequently leading to the emission of a gamma ray. Depending on the radiopharmaceutical selected, many different organs or functions may be investigated, including hepatobiliary function, lung perfusion and bone. Applications include the localisation of tumour tissue via the enhanced metabolism associated with the tumour, and the investigation of gastrointestinal bleeding, myocardial function and perfusion, and lung embolism. In general, the resolution is a few millimetres. In SPECT, one or more gamma rays are emitted and detected by a gamma camera, which may be rotated around the patient to reconstruct the tracer concentration distribution in 3D using techniques analogous to those in CT scanning. In PET, a different class of radioisotopes is used that emit two oppositely directed gamma rays after decay. These gamma rays are detected simultaneously on two
detectors from a photomultiplier array that surrounds the patient, and the requirement for two gamma rays to trigger a detection event makes the technique more sensitive than SPECT. It should be noted that each imaging modality has its own advantages and disadvantages, and it is becoming more common to acquire images from multiple modalities to improve diagnostic capability and pre-operative planning. In the simplest case, images acquired by different techniques may be examined individually to allow the assessment of the complementary information they provide. Accurate surgical planning, however, requires the registration of images acquired from different modalities, taking into account global motion as well as local tissue deformation of the patient, so that complementary image information can be fused together. In imaging, there is now an increasing drive towards developing fully integrated multi-modality scanners, such as PET/CT and PET/MR, to accelerate functional imaging and simplify the co-registration process. One of the key applications of imaging for pre-operative planning is to perform subject-specific modelling and use virtual reality (VR) simulation to assess the potential consequences of different surgical approaches. These modelling schemes also incorporate biomechanical factors of the tissue and its response to given surgical procedures. An example of how subject-specific biomechanical modelling based on pre-operative imaging can be used for pre-operative interventional planning is in examining the levator ani to understand its role in pelvic floor dysfunction. Injury to this muscle is often caused by childbirth and can lead to pain, constipation and faecal or urinary incontinence. The interventional treatment of this condition is difficult because of the lack of quantitative assessment of the complex function of the muscle before surgery [7]. An example MR image set showing the structure of the muscle is shown in Fig. 39.2, which allows subsequent segmentation of the levator ani so that the dynamics can be studied and modelled with finite-element analysis [8]. This modelling may also allow the surgeon to practice pre-operatively in a VR environment with artificial instruments that provide haptic feedback (see later sections). VR is also used in pre-operative imaging and planning during virtual endoscopy and bronchoscopy, where a virtual fly-through of the organ can be simulated by computing the required rendered views from
the 3D tomographic data. One example of the use of this is in virtual colonoscopy, where the morphology of the colon may be scanned for polyps without requiring the invasive procedure of real colonoscopy. Another example of these techniques is to create patient-specific, high-fidelity simulation data sets for training bronchoscopists and gastrointestinal endoscopists, based on real bronchoscopy images and 3D CT data from the same patient [9]. 2D/3D registration is first used to match the bronchoscopy images to the CT scan. A view- and illumination-invariant surface texture is then extracted to allow virtual bronchoscope images to be generated from any viewing direction, as shown in Fig. 39.3. In the future, these methods are expected to expand into surgical training devices for many different laparoscopic and endoscopic procedures, offering significantly improved realism over current training methods and allowing a safer learning environment for trainee surgeons.
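As a minimal sketch of the first computational step behind such a fly-through, the snippet below extracts a triangulated isosurface from a volumetric data set using the marching cubes algorithm, here via scikit-image. The synthetic "tube" volume stands in for a segmented airway or colon from a real CT scan, and the library choice and threshold are assumptions for illustration rather than the method used by the systems cited above.

```python
import numpy as np
from skimage import measure

# Placeholder volume: a synthetic "tube" standing in for a segmented
# airway or colon in a real CT data set (which would be loaded from DICOM).
zz, yy, xx = np.mgrid[0:64, 0:64, 0:64]
volume = (((xx - 32) ** 2 + (yy - 32) ** 2) < 10 ** 2).astype(float)

# Marching cubes extracts a triangulated isosurface from the volume; a
# renderer can then compute fly-through views along the tube's centreline.
verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
print(f"Surface mesh: {len(verts)} vertices, {len(faces)} triangles")
```

A renderer placed at successive points along the lumen's centreline, looking down the tube, then produces the sequence of views that constitutes the virtual endoscopy.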
39.1.2 Intra-Operative Imaging
The images acquired pre-operatively are generally used to formulate the overall surgical strategy before the actual surgical procedure. During surgical intervention, significant changes in tissue morphology and topology can occur. For effective navigation, particularly for seeing beyond the exposed surgical field of interest, real-time intra-operative imaging is essential. In recent years, the popularity of minimally invasive surgery (MIS) has motivated a surge of interest in developing novel image-guided surgical techniques. This is because, in MIS, one has to navigate and operate in a tight, constrained space through the use of flexible or rigid endoscopes. In many cases, stereo vision is lost, and navigation is further complicated by constraints of MIS instruments such as the fulcrum effect and the loss of tactile and force feedback. For intra-operative imaging, the following issues need to be considered:
• Real-time adaptive imaging – to develop real-time imaging techniques that are compatible with different surgical settings. In MIS, this requires the development of miniaturised imaging probes that can be integrated with the MIS instruments or endoscopes.
• Co-registration with pre-operative data – the use of real-time imaging for intra-operative guidance
Fig. 39.2 Example MR images recorded with an open-access MR scanner showing the levator ani (indicated with the white arrow) from anterior to posterior (a–c). The open nature of the scanner also allows for imaging of pelvic floor manoeuvres – at rest (d), at contraction upwards (e) and at strain downwards (f). The SNR for this open scanner is relatively low when compared with standard magnetic resonance imaging (MRI), but it allows for imaging of pelvic floor manoeuvres while the subject is in a natural upright position. Figures (g–i) show the results of finite-element simulation of the levator ani at rest, at contraction and at strain, respectively, where the colour represents the Von Mises strain
imposes constraints on the information content of the image data. Most real-time imaging techniques can only provide a limited field-of-view and depth penetration of the surgical area of interest. It is therefore important to map intra-operative data to the pre-operative frame of reference so that effective navigation and real-time guidance can be achieved.
• Visualisation and modelling for real-time guidance – the use of imaging guidance requires the fusion of imaging data with the normal visual field of the surgical site. This is a challenging issue that involves dealing with graphics rendering, data abstraction, human interfacing and perception fidelity. To map pre-operative data to the surgical scene, it also
Fig. 39.3 (a) Real bronchoscopy image captured from the video stream output. Radial distortion is corrected and then the bidirectional reflectance distribution function is calculated using the known bronchoscope lighting conditions. From this, photo-realistic renderings may be calculated (b) from a lighting-independent shading map of the tissue registered with CT information
requires the handling of tissue deformation and changes in topology. This entails real-time biomechanical modelling and visualisation, which is a key topic of AR.
39.1.2.1 Real-Time Adaptive Imaging
X-ray imaging is perhaps the most established intra-operative imaging modality, in which case the technique is referred to as fluoroscopy. In this configuration, images are acquired continuously while a procedure is performed under image guidance, for instance, during orthopaedic procedures when positioning prostheses, or the placement of cardiovascular devices such as pacemakers, stents and catheters. Lower-energy X-rays are used in this technique because of the longer patient exposure times, and it becomes necessary to use an image intensifier to improve the detection efficiency and image quality. It is also possible to use a C-arm CT scanner that rotates around the patient so that complete tomographic images can be recorded during surgery without requiring the larger standard CT scanner, which would restrict the surgical procedure. These systems can be used to provide up-to-date images during the intervention.
In MRI, there is a significant effort to develop open-access and interventional MR systems. Using open-access MRI allows increased movement and mobility of the patient, and it can be used for image-guided interventional procedures, potentially with robotic guidance. These applications include, for example, the use of a biopsy needle, or a laser or cryogenic catheter for thermotherapy. A number of precautions must be taken if MRI is to be used for image-guided surgery, including the use of MRI-compatible surgical instruments, screening of any RF-producing sources and care over the placement of electrical leads within the RF field, which may cause localised “hot spots” [1]. The magnetic fields used in these interventional scanners are often lower, which is actually a benefit for performing longer procedures as it minimises the radiofrequency energy absorbed by the surgeon and patient. An example of the use of MRI for providing image guidance in an interventional therapy is shown in Fig. 39.4, which illustrates the use of focussed ultrasound therapy to treat a uterine fibroid. The ultrasound wave is focussed to a small focal volume, causing a localised heating effect that damages cells. MRI sequences are able to detect the temperature rise and thereby provide real-time feedback and image guidance of the local dose to allow the most effective treatment with less collateral damage. This treatment has also been demonstrated in the brain, liver and breast, and the example in the figure illustrates an image from a study into the effect of administering gonadotrophin-releasing hormone before focussed ultrasound application.

Fig. 39.4 Sagittal T2-weighted MR image demonstrating the sonication pathway from the acoustic transducer into the uterine fibroid during focussed ultrasound surgery. The planned area of thermocoagulation is demonstrated by the red polygon centred within the fibroid, and the pitch of the transducer is illustrated by the green box. A foam pad allows good sonic contact between the patient and the transducer

More recently, ultrasound probes have been miniaturised for access to more restricted regions, as in intravascular ultrasound (IVUS) and endoscopic ultrasound (sometimes called endosonography). IVUS has potential applications in the diagnosis of different types of atherosclerotic plaques, especially for determining the relative thickness of the cap and the size of the underlying lipid pool, which are indicators of the likelihood of plaque rupture. These ultrasound transducer arrays are mounted on the end of a probe with approximately 1-mm cross-section and are inserted following guidewires similar to those used in angiographic techniques. Endoscopic ultrasound may prove useful in MIS techniques as a complementary method to optical-image guidance, providing underlying tissue structure and function as well as the relative position of the surgical instruments. Providing intuitive feedback of this information to the operator is a major challenge to the practical application of the technology. Endoscopic ultrasound may be used to detect lesions in the oesophagus or to image the heart through the oesophagus wall. Similar endoluminal probes also exist for transvaginal and transrectal scanning. Intra-operative image-guided surgery using MRI and CT may also become more common through the co-location of MRI, CT, X-ray angiography and the operating theatre [10]. State-of-the-art MRI and CT can provide a 3D display of soft tissue, whereas X-ray angiography can help one visualise vascular structures, and images from these different modalities may be registered or superimposed for diagnosis and/or for treatment. With the recent advances in miniaturised optical and ultrasonic imaging probes, the conventional endoscopes used for MIS are also rapidly evolving. Endoscopy was the first imaging method used in surgery and was initially demonstrated using a tube inserted into the body with a candle mounted at the end to illuminate the internal tissues. Modern-day endoscopy arrived with the creation of rigid endoscope systems involving a series of lenses, and subsequently the rod lens system created by Hopkins in 1951. This allowed high-quality images to be transmitted out of the body using a rigid tubular device similar in design
to laparoscopes and arthroscopes currently in clinical use, with good light transmission, image resolution, field-of-view and magnification. The development of fibre image guides that transmit images along a flexible bundle of fibres allowed curved paths to be traced for GI investigation and surgery. More recently, fibre image guides have been replaced by miniaturised digital camera chips that can be mounted at the tip of the endoscope, permitting increased flexibility and potential application in Natural Orifice Translumenal Endoscopic Surgery (NOTES). These cameras have also been incorporated into small ingestible capsules that are able to pass through the entire GI tract by natural processes, allowing images to be recorded and wirelessly transferred out of the body. Although much endoscopic surgery is for investigative and screening purposes, surgical instruments can be inserted through specially designed biopsy ports in the endoscope, allowing biopsy specimens to be excised, e.g., in monitoring Barrett’s oesophagus in gastro-oesophageal reflux disease. Endoscopic GI investigation has also partially replaced barium meal and enema examinations, thereby reducing the exposure to radiation. Interventional endoscopy is commonly used, for example, for performing tissue biopsies for different cancer-screening programmes, removing suspicious polyps from the colon, and excision of tumourous tissue from the lung.
39.1.2.2 Co-Registration with Pre-Operative Data
The registration of pre-operative images to the patient to guide surgery was originally developed for neurosurgery, owing to the relative immobility of the brain within the skull and the possibility of using the skull as a fixed reference to define a coordinate system in which the surgery can be performed [6]. These types of neurosurgery usually use a fixed frame that is attached to the skull using screws and are called stereotactic techniques, allowing instruments to be guided through small incisions in the skull to the surgical site using a precise geometrical framework. The images that are used for planning are generally CT, PET or MRI, and registration must be performed to orient these pre-operative images with the stereotactic frame attached to the patient.
To avoid having to invasively attach the stereotactic frame directly to the skull, various other registration methods have been developed for neurosurgery, including registration using anatomical features on the patient’s head, or a number of artificial features known as fiducial markers that are glued to the skin. These techniques are often combined with robotic devices that are designed to precisely measure the position of the fiducial markers before the intervention to allow the registration to be made. This may be a cumbersome and time-consuming process, so optical methods may instead be used to detect the position of the fiducials in 3D by using calibrated stereo camera systems and reflective markers, which also allows the positions to be updated if there is any patient movement. The camera system is also able to track the position of optical markers attached to the instrument, so that precise control over the tissue is possible. There are now commercial systems available for spinal surgery that use camera-based optical tracking with infra-red markers fixed to the instruments or to the patient’s bone. There are also equivalent systems that use electromagnetic tracking of markers. An example tracking system is shown in Fig. 39.5, demonstrating the coordinate transforms required to co-register the different fiducial markers and align pre-operative data with the patient and surgical instruments. For soft tissues, it is desirable to have an intra-operative imaging method available that can allow the pre-operative plan to be updated based on the current situation. Current examples of intra-operative imaging include the use of white-light endoscopy, angiography, fluoroscopy and ultrasound. In some cases, these live images may be used to modify the pre-operative images so that an updated plan can be formulated. One example of this is the use of stereo endoscopy, which allows the shape of the tissue surfaces to be accurately determined and subsequently matched to the pre-operative image. This is an example of 2D/3D registration, where a 2D view is mapped onto a 3D surface, or a live 2D view of the tissue is used to register a pre-operative 3D data set by warping the pre-operative data to the current tissue state [11]. To this end, ultrasound has been proposed as an imaging modality that can be used for live 2D/3D registration by the use of landmarks in the pre-operative MRI and live ultrasound images. These allow the MRI images to be warped to match the ultrasound images, which can also be extended for 3D ultrasound registration.
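A common computational core of such fiducial-based registration is the least-squares rigid transform between two corresponding point sets, which has a classical closed-form solution via the singular value decomposition (the Kabsch/Horn approach). The sketch below is a minimal illustration of this idea, not the algorithm of any specific commercial system; the marker coordinates, pose and units are hypothetical.

```python
import numpy as np

def rigid_register(fixed, moving):
    """Least-squares rigid transform (R, t) such that fixed ~= R @ moving + t.

    `fixed` and `moving` are (N, 3) arrays of corresponding fiducial
    positions, e.g. tracker-space vs. image-space marker coordinates.
    """
    fc, mc = fixed.mean(axis=0), moving.mean(axis=0)
    H = (moving - mc).T @ (fixed - fc)             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = fc - R @ mc
    return R, t

# Hypothetical fiducial positions (mm) in pre-operative image coordinates.
image_pts = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0],
                      [0.0, 60.0, 0.0], [0.0, 0.0, 40.0]])

# Simulate the tracking camera seeing the same markers after an unknown pose.
theta = np.deg2rad(20.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
camera_pts = image_pts @ R_true.T + np.array([10.0, -5.0, 2.0])

R, t = rigid_register(camera_pts, image_pts)
residual = image_pts @ R.T + t - camera_pts
fre = np.sqrt((residual ** 2).sum(axis=1).mean())  # fiducial registration error
print(f"FRE: {fre:.2e} mm")
```

The root-mean-square residual printed at the end corresponds to the fiducial registration error routinely quoted by navigation systems; in this noise-free simulation it is essentially zero, whereas real marker localisation noise propagates into a non-zero target registration error at the surgical site.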
Fig. 39.5 Example tracking system using optical tracking of markers to register the instrument with a pre-operative MRI scan. In this illustration, the camera system is able to track the position of the optical markers attached to the surface of the patient and to the laparoscopic instruments. These optical markers are also visible in a pre-operative MRI scan. This allows the instruments, pre-operative data and patient coordinates to be mapped into the laboratory frame through the transformations illustrated so that the image registration can be performed
39.1.2.3 Visualisation, Modelling and Augmented Reality
Using intra-operative imaging alleviates the need for invasive fiducials and provides more accurate operational constraints for the instrument. AR is one of the emerging applications that may help in planning, guidance, intervention and follow-up, and is based on enhancing the surgeon’s normal view of the tissue by adding additional diagnostic or guidance information [6]. Some implementations involve the use of a head-mounted display that contains miniature screens that display different images to the wearer’s eyes. These screens may be made semi-transparent so that the wearer is able to see both the real world and the information on the screens simultaneously. One of the challenges is to accurately transform the pre-operative image data into the surgeon’s frame of reference, which requires accurate tracking of the position of the head-mounted display and a constant update of the images displayed on it. These head-mounted displays may be cumbersome for the surgeon, and there are also difficulties in registration and in allowing both the real world and the augmented view to be observed; the technology is not yet commonly used. In endoscopy, the alignment of the real and virtual objects depends on the
accurate tracking of the position and orientation of the viewing source with respect to the object of interest. The complexity of tissue deformation during surgery therefore imposes a major challenge to the AR display. One new approach in AR called inverse realism aims to improve the depth perception of embedded image information behind the exposed tissue surface in 3D stereoscopic displays. The technique gives the impression of a semi-transparent tissue surface with some important features retained, and a clearer impression of the underlying augmented pre-operative image information. This is demonstrated in Fig. 39.6, which illustrates two stereo image pairs based on pq-space, one showing the tissue surface, and one revealing the embedded object [12]. This technique contrasts with the traditional overlaid object information by using a non-photorealistic rendering of the scene. The application of AR in surgery is expected to accelerate with the recent introduction of roboticassisted MIS procedures where registration and overlay may be simpler to achieve due to the known geometrical relationships between the endoscope and the tissue. This may in the future allow surgical visualisation and control to be integrated into one intuitive surgical console.
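In its simplest form, the AR compositing step amounts to blending a registered rendering of a hidden structure into the live camera frame. The toy sketch below illustrates plain alpha-blending on synthetic arrays; it is a deliberately naive baseline, whereas the inverse-realism approach described above uses non-photorealistic rendering precisely because simple overlays of this kind give poor depth perception. All arrays and values here are placeholders.

```python
import numpy as np

# Synthetic stand-ins: a live endoscopic RGB frame and a rendering of a
# hidden structure (e.g. a vessel segmented from pre-operative CT) that
# has already been registered into the camera view. A real system would
# obtain these from the endoscope stream and a tracked 3D model.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
overlay = np.zeros_like(frame)
overlay[200:280, 260:380, 0] = 255           # fake vessel rendering (red)

mask = overlay.any(axis=2, keepdims=True)    # pixels where the model projects
alpha = 0.4                                  # overlay opacity
blended = np.where(mask,
                   ((1 - alpha) * frame + alpha * overlay).astype(np.uint8),
                   frame)
```

Because a constant alpha makes the embedded object appear to float in front of the tissue, approaches such as the pq-space rendering of Fig. 39.6 instead modulate the visible surface itself to retain salient features and convey that the object lies behind it.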
Fig. 39.6 The use of inverse realism in the da Vinci surgical robot. Images (a) and (b) are a stereo image pair showing the exposed tissue (lung) surface. Images (c) and (d) are a stereo image pair showing a “see-through” representation of an underlying object based on pq-space non-photorealistic rendering. The application of augmented reality (AR) in surgery is expected to accelerate with the recent introduction of robotic-assisted MIS procedures, where registration and overlay may be simpler to achieve because of the known geometrical relationships between the endoscope and the tissue. This may in the future allow surgical visualisation and control to be integrated into one intuitive surgical console

39.1.3 Post-Operative Imaging
Post-operative imaging is important for following up the surgery to assess the quality or efficacy of the procedure. As in pre-operative imaging, the availability of X-ray imaging meant that post-operative follow-up using imaging was initially used for skeletal repairs, such as monitoring the healing of broken bones. At present, the imaging methods used include all of those described in Sect. 39.1.1, increasingly including MRI and ultrasound for longer-term or sustained follow-up, with the advantage of not requiring the use of ionising radiation. This post-operative imaging is
typically performed for investigating the patency and success of the surgery, for instance, when inspecting the completeness of tumour removal or the quality of a vessel anastomosis. Often, serial image assessment provides an important monitoring tool for outcome management, which requires that the images acquired at different times can be effectively registered and compared. Similar registration methods to those used for pre- and intra-operative image registration may be applied. For post-operative imaging, it is expected that MRI will take a key role in clinical practice because of its versatility and the high-resolution images that it produces. Recent advances in MRI have resulted in the development of a
broad range of new techniques for providing detailed structural and functional information on most parts of the body. Research is increasingly relying on many different sequences coupled with novel targeted contrast agents to achieve accurate diagnosis and follow-up. For the assessment of atherosclerosis, for example, complementary information concerning arterial remodelling, vulnerability of atherosclerotic plaque and the role of inflammation is increasingly being used. With the current capabilities of MRI, it is possible to characterise the lipid core, fibrous cap, normal media and adventitia in atherosclerotic plaques in vivo. It is also possible to characterise intra-plaque haemorrhage and acute thrombosis, which are essential for post-surgical examination after vascular intervention. Post-operative imaging is also important for monitoring implantable devices and prostheses so that biocompatibility and autoimmune response issues may be continuously monitored and further intervention or drug therapy implemented as required. The onset of such responses may vary widely from subject to subject or depending on the type of implant used, and monitoring may continue indefinitely. One example of this is illustrated in Fig. 39.7, which reveals the
build-up of calcium present on aortic valves with time for homografts and Medtronic Freestyle valves [13], which is believed to be a cause of stenosis and regurgitation after implantation. These images were recorded using a variation on CT imaging called electron-beam tomography (EBT), which is suited to cardiac work because the electron beam is swept electronically rather than mechanically, allowing complete scans in less than 25 ms – fast enough to image the beating heart. Images of the valves are presented in Fig. 39.7, showing a slice of the 3D volume (a) before and (b) after the injection of the contrast agent. These images were subsequently registered using a hierarchic 3D free-form volume registration to account for respiratory motion between the data sets. This permits calcification scores to be calculated, as illustrated in Fig. 39.7 (c–f) for two different subjects at two follow-up periods. The subject with a homograft (c, d) exhibits calcifications both at the suture locations and across the flaps. In the Freestyle valves (e, f), the calcifications were mainly located at the suturing sites, implying that the function of these valves will not be significantly affected by calcification after 24 months [13].
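The chapter does not give the scoring formula used in the study above; as a hedged toy illustration of how a threshold-based calcification score can be computed from a registered CT volume, the sketch below simply sums the voxel volume above a Hounsfield-unit cut-off. The threshold, voxel size and synthetic volume are all assumptions for illustration, not the authors' method.

```python
import numpy as np

# Synthetic registered CT sub-volume in Hounsfield units (HU); a real
# study would use the serially registered EBT volumes described above.
rng = np.random.default_rng(0)
volume_hu = rng.normal(40.0, 20.0, size=(20, 64, 64))  # soft-tissue background
volume_hu[8:12, 30:34, 30:36] = 400.0                  # embedded calcification

HU_THRESHOLD = 130.0                 # assumed cut-off for calcified tissue
voxel_volume_mm3 = 0.5 * 0.5 * 1.5   # assumed voxel size (mm)

calcified = volume_hu >= HU_THRESHOLD
score_mm3 = calcified.sum() * voxel_volume_mm3
print(f"Calcified volume: {score_mm3:.1f} mm^3")
```

Because the serial volumes are registered into a common frame, the same mask can be compared across follow-up scans to quantify how the calcified volume grows with time at each anatomical site.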
Fig. 39.7 Electron beam tomography (EBT) images (a) before and (b) after the application of a contrast agent, and before 3D registration. Images showing calcification of (c, d) a homograft and (e, f) a Medtronic Freestyle valve, with the spatial distribution of calcification on the valve flaps
The image information may also be complemented with implantable sensing devices to allow continuous monitoring of relevant analytes or signals. Although imaging and extensive measurement of biomechanical and biochemical information is available in almost all hospitals, this diagnostic and monitoring utility is generally limited to brief time intervals and unrepresentative physiological states, such as being supine and sedated or undergoing artificially introduced exercise tests. Transient abnormalities therefore cannot always be captured. The last decade has witnessed a rapid surge of interest in new sensing and monitoring devices for healthcare, and one key development in this area is implantable in vivo monitoring and intervention devices. With the rapid development in sensing technology, it is possible to envisage a large percentage of the population having permanent implants, which would provide continuous monitoring of the most important physiological parameters for identifying the precursors of major adverse events. Such technological development should have a profound impact on surgical technology and the way imaging and sensing are integrated for post-surgical care.
39.2 The Future of Surgical Imaging
With the increasing miniaturisation of imaging and sensing devices, the future of surgical imaging is in the integration of real-time sensing and imaging techniques with surgical instruments, by bringing laboratory-based tissue characterisation techniques to an in vivo, in situ setting. For example, there have been further developments in endoscopy that are aimed at increasing the functional diagnostic information available to the user, over the existing colour and texture information provided by white-light reflectance imaging. Some examples of these techniques are fluorescence endoscopy, scattered light endoscopy and narrow band imaging (NBI). Many of these techniques are also aimed specifically at the detection of cancer, either for tumour boundary demarcation or in screening for the pre-invasive forms of the disease. This is expected to permit screening programmes and earlier therapy for populations known to have an increased risk of a particular cancer type, and more effective removal of tumours leading to lower recurrence rates. In fluorescence endoscopy, the tissue is excited by short-wavelength (blue or violet) light, and fluorescence is subsequently emitted at a range of longer wavelengths in the green and red parts of the spectrum. Different tissues and molecules within the tissue emit differently coloured light, and by analysing the emitted spectrum it is possible to contrast certain changes in the tissue associated with disease. These techniques have been shown to be sensitive to changes in the epithelial layer associated with pre-invasive cancer, and there is potential to use fluorescence endoscopy (or scattered light endoscopy, which is sensitive to cellular morphologic changes in the epithelium) as a screening method for patients with Barrett’s oesophagus. This will reduce the requirement to perform multiple tissue biopsies of the Barrett’s affected areas. Endoscopes using simple applications of fluorescence are now available from a number of commercial suppliers, including Olympus, Storz and Richard Wolf. Example images from a bronchoscope are shown in Fig. 39.8, illustrating white-light, NBI and fluorescence imaging.

Fig. 39.8 Bronchoscope images in (a) standard white light, (b) narrow band imaging (NBI) and (c) fluorescence imaging modalities
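A minimal sketch of the kind of spectral analysis described above is a two-channel ratio image: because illumination changes scale both emission bands together, a red/green ratio map suppresses them and highlights genuine spectral change in the tissue. The band choices, arrays and cut-off below are synthetic assumptions for illustration, not the parameters of any commercial system.

```python
import numpy as np

# Synthetic two-channel fluorescence image: a green (~520 nm) and a red
# (~630 nm) emission band, as might be captured through the endoscope's
# filters. All values here are made up for illustration.
rng = np.random.default_rng(1)
green = rng.uniform(0.5, 1.0, (256, 256))
red = rng.uniform(0.1, 0.4, (256, 256))
red[100:150, 100:150] += 0.5     # region with an altered emission spectrum

# Illumination changes scale both channels together, so a red/green
# ratio map suppresses them and highlights genuine spectral change.
ratio = red / np.clip(green, 1e-6, None)
suspicious = ratio > 0.8         # assumed empirical cut-off
print(f"Flagged pixels: {suspicious.sum()}")
```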
Optical coherence tomography (OCT) is another emerging optical technique that is able to non-invasively record the structure of the tissue up to a depth of a few millimetres, with a resolution of a few micrometres or less [14]. It can in some respects be considered an optical equivalent to ultrasound, as the light reflected off different structures is timed, and the reflected signal allows a depth-resolved slice to be reconstructed if the probe is scanned laterally across the surface. In real instruments, a number of different acquisition methods are used, including frequency-domain techniques [15]. Although the main application to date has been in ophthalmology, both in imaging the cornea and the different layers of the retina, the technique has been demonstrated in endoscopy, where the structures within the tissue epithelium and stroma can be distinguished in the oesophagus and colon, for instance. These probes are being commercialised, although currently the resolution is not high enough to resolve the cellular morphologic changes that precede invasive cancer. NBI is a recent adaptation to white-light endoscopy that uses spectral filters to select only a specific colour of light reflected from the tissue. It is a simple modification to standard endoscopy, but it produces images that contain some depth information, since longer wavelengths penetrate deeper into the tissue on average. This may, for instance, be used to better distinguish blood vessels located beneath the tissue surface. Alternatively, the superficial structure may be preferentially imaged by using shorter wavelength light to assist in screening of Barrett’s
oesophagus by observing the capillary and mucosal layers (see Fig. 39.8). One final development that will have an impact on surgical imaging is confocal endoscopy. Confocal fluorescence microscopy has revolutionised the field of cell biology because of its ability to acquire sectioned images of cells that have been fluorescently labelled with an antibody-bound fluorophore, or genetically modified to express green fluorescent protein (GFP), and thus elicit inter- and intra-cellular processes in live cells [16]. The miniaturisation of this technology by Mauna Kea and Olympus allows confocal probes to be used in GI endoscopy and bronchoscopy to image cell and nuclear membranes and give information on the morphology. This currently often involves the addition of exogenous fluorophores that stain or preferentially accumulate in a particular cellular, sub-cellular or structural component, but developments in the technology may in future allow the autofluorescence signal from the tissue to be detected. Applications may also emerge in tumour margin detection, because confocal endoscopes reject light outside of the focal plane and are therefore not as sensitive to scattered light from deeper in the tissue. There are also other emerging optical-imaging modalities that use other interactions between light and tissue to provide a tissue-specific signal, including Raman, elastic scattering, diffuse scattering and fluorescence lifetime; further information on these techniques can be found in [17]. One example of the type of enhanced contrast that may be achieved is shown in Fig. 39.9, which presents a white-light
Fig. 39.9 White light reflection and fluorescence lifetime images of freshly resected unstained tissue illustrating contrast between liver and a metastatic colorectal cancer
reflection image and a fluorescence lifetime image of fresh human liver tissue containing metastatic colorectal cancer, showing healthy liver, radiofrequency ablation injury and the invasive cancer. These techniques may be adapted into articulated instruments to provide improved navigation and manipulation through curved pathways, as well as to provide increased diagnostic information and visualisation through the use of AR. This is expected to overcome some of the current limitations that are encountered with NOTES, an emerging method for scarless minimally invasive surgery that uses small incisions made through the internal wall of the GI or vaginal tracts or the bladder wall. Through the manipulation of complex flexible endoscopes containing large ports for the insertion of endoscopic surgical instruments, NOTES will enable various minimally invasive procedures to be attempted. Current challenges in this field include the construction and safe removal of the translumenal ports while maintaining peritoneal sterility, the navigation along unguided curved paths and the manipulation of the endoscopic surgical instruments. Extensive experiments have recently been carried out on animal models, and the first human transvaginal cholecystectomy has been completed by a team at the University Hospital Strasbourg, France. It is very likely that industrial and academic innovation in device engineering will encourage further in vivo trials and eventually clinical uptake for many procedures. From an imaging perspective in particular, additional device guidance methods based on localisation relative to the surrounding anatomy may be achieved using intra-operative ultrasound or fluoroscopy, similar to interventional radiology.
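Returning to the fluorescence lifetime contrast shown in Fig. 39.9: the lifetime signal derives from fitting the decay of emission after pulsed excitation. A minimal mono-exponential fit is sketched below using SciPy; the decay data are simulated, and a real instrument would additionally deconvolve its instrument response function before fitting.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, amplitude, tau):
    """Mono-exponential fluorescence decay model I(t) = A * exp(-t / tau)."""
    return amplitude * np.exp(-t / tau)

# Simulated time-resolved measurement: true lifetime 2.0 ns plus noise.
t_ns = np.linspace(0.0, 10.0, 200)
rng = np.random.default_rng(2)
intensity = decay(t_ns, 1000.0, 2.0) + rng.normal(0.0, 10.0, t_ns.size)

# Fit the decay; the recovered lifetime tau (not the raw intensity) is
# the quantity that provides tissue contrast in lifetime imaging.
(amplitude, tau), _ = curve_fit(decay, t_ns, intensity, p0=(500.0, 1.0))
print(f"Fitted lifetime: {tau:.2f} ns")
```

Repeating such a fit pixel by pixel yields the lifetime map, which is attractive for tissue discrimination because it is largely independent of fluorophore concentration and illumination intensity.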
39.3 Summary
In this chapter, we have introduced the imaging modalities that are currently used for pre-operative surgical planning and described how they may also be used for image-guided surgery. These image-guided surgical approaches build on the increasing computational power available to register live and pre-operative images and provide better navigational aids to the surgeon. Using AR and VR may also provide a more
immersive environment that can assist the surgeon and overcome some of the current limitations of minimally invasive surgery. It is anticipated that, over the coming years, these techniques will be increasingly applied to soft tissue surgery, where currently the registration methods are confounded by the movement of the tissue. Acknowledgements The authors gratefully acknowledge assistance from Adrian Chung, Su-Lin Lee, Danail Stoyanov and Mirna Lerotic in the production of the figures in this chapter.
References
1. Suetens P (2002) Fundamentals of medical imaging. Cambridge University Press, Cambridge
2. Bankman I (2002) Handbook of medical imaging: processing and analysis. Academic Press, Burlington, MA
3. Beutel J, Sonka M (2000) Handbook of medical imaging, Vol 2. Medical image processing and analysis. SPIE – The International Society for Optical Engineering, Bellingham, WA
4. Webb AG (2002) Introduction to biomedical imaging. Wiley–IEEE, Chichester
5. Russell RCG, Williams NS, Bulstrode CJK (2004) Bailey and Love's short practice of surgery. Hodder Arnold, London
6. Peters TM (2006) Image-guidance for surgical procedures. Phys Med Biol 51:R505–R540
7. Lee SL, Darzi A, Yang GZ (2005) Subject specific finite element modelling of the levator ani. Med Image Comput Comput Assist Interv 8:360–367
8. Lee SL, Horkaew P, Caspersz W et al (2005) Assessment of shape variation of the levator ani with optimal scan planning and statistical shape modeling. J Comput Assist Tomogr 29:154–162
9. Chung AJ, Deligianni F, Shah P et al (2006) Enhancement of visual realism with BRDF for patient specific bronchoscopy simulation. In: Medical image computing and computer-assisted intervention – MICCAI 2004: 7th international conference, pp 486–493
10. Matsumae M, Koizumi J, Fukuyama H et al (2007) World's first magnetic resonance imaging/X-ray/operating room suite: a significant milestone in the improvement of neurosurgical diagnosis and treatment. J Neurosurg 107:266–273
11. Maintz JB, Viergever MA (1998) A survey of medical image registration. Med Image Anal 2:1–36
12. Lerotic M, Chung AJ, Mylonas G et al (2007) Pq-space based non-photorealistic rendering for augmented reality. Med Image Comput Comput Assist Interv 10:102–109
39
The Principles and Role of Medical Imaging in Surgery
13. Melina G, Horkaew P, Amrani M et al (2005) Threedimensional in vivo characterization of calcification in native valves and in freestyle versus homograft aortic valves. J Thorac Cardiovasc Surg 130:41–47 14. Fujimoto JG (2003) Optical coherence tomography for ultrahigh resolution in vivo imaging. Nat Biotechnol 21: 1361–1367 15. Fercher AF, Drexler W, Hitzenberger CK (2003) Optical coherence tomography – principles and applications. Rep Prog Phys 66:239–303
543 16. Hoffman RM (2005) The multiple uses of fluorescent proteins to visualize cancer in vivo. Nat Rev Cancer 5: 796–806 17. Vo-Dinh T (2003) Biomedical photonics handbook. CRC, New York 18 Elson D, Requejo-Isidro J, Munro I et al (2004) Timedomain fluorescence lifetime imaging applied to biological tissue. Photochem Photobiol Sci 3 795–801
40 How to Read a Paper

Hutan Ashrafian and Thanos Athanasiou
Contents
40.1 Introduction ............................................................ 545
40.2 The Conceptual Basis of a Scientific Paper .......... 546
40.3 Reasons to Read a Research Journal .................... 546
40.4 The Psychology of Reading a Paper ..................... 546
40.5 Originality ............................................................... 547
40.6 Types of Paper and Quality of Evidence .............. 548
40.7 Core Components ................................................... 549
40.8 Title .......................................................................... 550
40.9 Authorship and Ancillary Information ................ 550
40.10 Abstract/Summary ................................................. 551
40.11 Introduction/Background ...................................... 551
40.12 Materials and Methods .......................................... 551
40.13 Results, Tables, Figures ......................................... 552
40.14 Discussion ................................................................ 552
40.15 Acknowledgements and Declarations ................... 553
40.16 References and Bibliography ................................ 553
40.17 Supplementary Files ............................................... 553
40.18 Conference Discussion ........................................... 553
40.19 Editorial .................................................................. 553
40.20 The Importance of Assessing a Paper .................. 553
40.21 Conclusions ............................................................. 554
References ........................................................................... 554
H. Ashrafian () Department of Biosurgery and Surgical Technology, Imperial College London, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, 10th Floor, St. Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract To adequately equip surgeons with the skills necessary to successfully read a paper, one needs not only time and energy but also a core level of experience. This chapter classifies the components that make up a scientific paper, with the goal of introducing the audience to some of the analytical concepts that enable the successful reading of a surgical paper.
40.1 Introduction

Scientific papers are the most favoured vehicles through which research is communicated. Each manuscript is specifically designed to allow the reader to understand why a research question was addressed, how it was investigated and what the implications of the newly discovered results are. Unlike the text of a novel, in which a story develops in sequential order, a scientific manuscript objectively states a problem that needs solving and then describes how the authors went about solving it. The process of reading a scientific text therefore varies significantly from reading normal prose, and requires a consistent application of both analytical and critical faculties. Although surgical papers in print will have undergone a process of peer review, the ultimate responsibility for assessing published material lies with the reader. To adequately equip surgeons with the skills necessary to successfully read a paper, one needs not only time and energy but also a core level of experience. This chapter classifies the components that make up a scientific paper, with the goal of introducing the audience to some of the analytical concepts that enable the successful reading of a surgical paper.
40.2 The Conceptual Basis of a Scientific Paper

Some authors suggest that research papers are not merely a simple method by which to communicate scientific discoveries; they can themselves be considered an innovation that permitted scientific progress to occur through the transmission and interaction of information [8]. This concept is not unlike the theory that human thought capacity increased following a developmental improvement in our speech ability (vocalisation) through the evolutionary migration of our larynx further down the neck when compared with other ape species [15]. Thus, the more we develop and read our scientific papers, the more advances we can make in research.

It can be considered that "all knowledge is the result of imposing some kind of order upon the reactions of the psychic system as they flow into our consciousness" [13]. Therefore, an imposition of order and a decrease in chaos in our sensory perception is what leads to meaningful information. In keeping with this concept, we read scientific literature in order to decrease the uncertainty and chaos inherent in our current state of knowledge and thereby increase our personal information. Claude E. Shannon, in his 1948 paper "A Mathematical Theory of Communication" [19], introduced the concept of information entropy (a measure of chaos), summarised in the equation below:

H_s = -K \sum_{j=1}^{n} p_j \log_2 p_j

where H_s provides a mathematical measure of the disorder that may exist in a quantity of given information, K is a constant, and p_j is the probability of finding one particular piece of information from among a subset of data. The role of a research paper is to reduce this entropy: to minimise the uncertainty and chaos in our current state of understanding and move us to a better, "more meaningful" understanding.
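For readers who want to see the entropy formula in action, here is a minimal sketch in Python; the probability values are invented for illustration, and K is simply set to 1 so that the result is in bits.

```python
import math

def shannon_entropy(probabilities, k=1.0):
    """Information entropy H_s = -K * sum(p_j * log2(p_j))."""
    return -k * sum(p * math.log2(p) for p in probabilities if p > 0)

# Four equally likely findings: maximal uncertainty (2 bits)
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# A skewed distribution: the outcome is more predictable (about 1.36 bits)
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))
```

On this view, a good paper moves the reader from the high-entropy state towards the low-entropy one.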
40.3 Reasons to Read a Research Journal Although research journals are a means of communicating scientific papers, and thereby information, we read these in surgery not for academic interest alone, but more importantly for how we can apply these findings to treat our patients. These reasons can be classified into the following (Fig. 40.1):
Fig. 40.1 Reasons for reading a surgical journal: to understand disease mechanisms (molecular biology, pathophysiology, disease aetiology); improving patient management (translational medicine, technical); to improve clinical treatment (identifying "best care", identifying "gold standard" procedures, new surgical procedures, new diagnostic procedures, treatments); comparing (diagnostic modalities; individuals and units from top units); learning (from experienced individuals; from the mistakes of others); academic vanity; other (as a news source within a subspeciality; advertising/applying for employment)

40.4 The Psychology of Reading a Paper

In his seminal work on how scientists read scientific papers, the educationalist Charles Bazerman closely studied and interviewed seven physicists on how they discerned information from the academic papers that they read [4]. His work was based on the premise that scientific reading habits are affected by psychological and sociological variables, and he developed the concept that all scientists have a dynamic, knowledge-based mind-map (or schema) that can be built upon and expanded by information and data from new papers. He analysed each individual's choice of paper (Fig. 40.2), identifying that papers are picked according to personal research needs and the necessary self-updating for each scientist's own particular speciality. "Must reading" was found to be proportional to the amount of research available in the relevant field.

Fig. 40.2 Choosing a scientific paper: reasons to read manuscripts (directed by the scientist's own research needs; necessary periodic scanning of relevant sources) and mechanisms of choice (trigger words noticed during "scanning" of manuscripts, with 25% of choices triggered by a single word in the title and 75% by other triggers in the manuscript; word of mouth; papers found directly in a relevant database, e.g. PubMed; the individuals or institution who carried out the research)

Understanding a paper (Fig. 40.3) depends on whether the manuscript's subject is close to the reader's own speciality. Increased familiarity with a subject allows faster information gathering and a more complete understanding of the paper. However, if the paper is poorly written, then an increased effort of reading will clearly be required, along with an increased time requirement to assimilate the information presented.

Fig. 40.3 Mechanisms of understanding a scientific paper: for research close to the researcher's own field, speedy selective reading reveals new facts, reading relies heavily on personal methodological experience, and an unexpected finding triggers more in-depth reading of the whole paper; for research not directly related to the researcher's own, in-depth reading of the whole paper is required; if a paper is poorly written, the effort of reading increases, with numerous re-readings of the manuscript
40.5 Originality Important scientific discoveries are frequently a result of their originality, although this is a notoriously
difficult concept to measure. To help quantify the degree of originality of the data from a new publication, Lynn Dirk, a specialist in science communication, has proposed a method to measure and score originality in a scientific manuscript [9]. Using this technique, each paper is broken down into three component units: hypothesis, methods and results. Each of the three components is assigned an originality value of "P" (previously reported) or "N" (new). Each paper thus has an originality score for each of the hypothesis–methods–results subsections which, put together, concisely reflects the originality of the paper. For example, if all three components were new, the paper would be scored as N–N–N, whereas if all three components were previously known, it would be scored as P–P–P. Using this typology, eight combinations of originality can be assigned to a scientific paper. Dirk went on to perform a mail survey of 301 scientists, 68% of whom responded. They rated papers selected from the "Citation Classics" in Current Contents® – Life Sciences over a 5-year period (Table 40.1), demonstrating that this technique can be used to attain useful insights into assessing the originality of scientific papers.
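Dirk's typology is simple enough to express in a few lines of code. The following Python sketch (the function name is illustrative, not from the paper) enumerates the eight possible combinations and scores a single paper:

```python
from itertools import product

def originality_score(hypothesis_new, methods_new, results_new):
    """Dirk-style score, e.g. 'N-P-N' (N = new, P = previously reported)."""
    return "-".join("N" if new else "P"
                    for new in (hypothesis_new, methods_new, results_new))

# The eight possible hypothesis-methods-results combinations
print(["-".join(combo) for combo in product("NP", repeat=3)])

# A new hypothesis tested with previously reported methods, giving new
# results: scored N-P-N, the most frequent class in Table 40.1 (43%)
print(originality_score(True, False, True))
```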
Table 40.1 Originality scoring and frequency of highly cited papers from Current Contents® – Life Sciences

Originality score (hypothesis + methods + results) | Frequency (%)
N+N+N | 15
N+N+P | 1
N+P+P | 4
N+P+N | 43
P+N+P | 3
P+N+N | 11
P+P+N | 11
P+P+P | 13

N, new; P, previous (adapted from Dirk [9])
40.6 Types of Paper and Quality of Evidence

Scientific research is not a homogeneous entity and can be broadly categorised into four main types (Fig. 40.4).

Fig. 40.4 Types of scientific research: analytical (testing a specific hypothesis; breaking down a postulation into its component parts), methodological (improving or introducing research techniques), descriptive (listing the findings of a study; formulating a hypothesis) and comparative (statistical analysis; defining an "effect size")

Many papers combine these four research elements to varying extents, and as a result, a number of research paper types are used to communicate this varied data. Types of surgical research include topics that can:

• Assess or improve upon surgical treatments
• Assess or improve upon surgical disease diagnosis and screening
• Elucidate underlying surgical disease aetiology and pathophysiology
• Assess or improve upon surgical skills and training
• Assess or reduce surgical errors

Typical types of surgical paper are catalogued in Fig. 40.5.

Fig. 40.5 Catalogue of surgical research papers, spanning prospective clinical trials (randomised controlled trials with parallel-group, matched or within-participant comparison; single- or double-blinded; crossover; placebo-controlled; factorial designs), retrospective surveys (guidelines, case-control studies, cohort studies), non-comparative studies (case reports, case series, cross-sectional surveys), comparative studies with two or more groups (local, national, international), clinical safety studies (Failure Mode and Effect Analysis (FMEA) and other safety assessment tools), decision studies (interview-based, observational, behavioural/psychological), error and economic studies, clinical skills assessment (simulator-based assessment, behavioural/psychological studies), qualitative and quantitative analyses, reviews (systematic, non-systematic, meta-analysis) and experimental work (molecular biology, physiology, new surgical techniques, novel surgical technology such as robotics, animal experimentation)

In clinical research, scientific papers can be assessed by their "quality of evidence", which improves with increased subject numbers and with randomisation of both patients and treatments (thereby decreasing the likelihood of false results). The traditional hierarchy of evidence in clinical papers has been (in descending order, with the most important first) [20]:
1. Systematic reviews and meta-analyses
2. Randomised controlled trials (RCTs) with definitive results (confidence intervals that do not overlap the threshold clinically significant effect)
3. RCTs with non-definitive results (confidence intervals that overlap the threshold clinically significant effect)
4. Cohort studies
5. Case-control studies
6. Cross-sectional surveys
7. Case reports (only one or two patients)

To further reveal the level of quality of a scientific paper, the Centre for Evidence-Based Medicine at Oxford has come up with a classification that grades papers according to their level of evidence (Table 40.2) and subsequent grade of recommendation.
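The descending list above behaves like a simple lookup. A minimal sketch in Python (the design labels are informal shorthand for the traditional hierarchy, not official CEBM terminology) shows how such a ranking can be used to compare two studies:

```python
# Informal ranking of the traditional hierarchy of evidence (1 = highest);
# the labels are simplified shorthand, not official CEBM vocabulary.
HIERARCHY = {
    "systematic review / meta-analysis": 1,
    "rct (definitive results)": 2,
    "rct (non-definitive results)": 3,
    "cohort study": 4,
    "case-control study": 5,
    "cross-sectional survey": 6,
    "case report": 7,
}

def stronger_evidence(design_a, design_b):
    """Return whichever study design sits higher in the hierarchy."""
    return min(design_a, design_b, key=HIERARCHY.__getitem__)

print(stronger_evidence("cohort study", "case report"))  # cohort study
```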
Table 40.2 Levels of evidence and grades of recommendation, modified from the Oxford Centre for Evidence-based Medicine (May 2001) [17]

Level | Therapy/prevention, aetiology/harm | Prognosis | Diagnosis
1a | Systematic review (with homogeneity) of RCTs | Systematic review (with homogeneity) of inception cohort studies | Systematic review (with homogeneity) of level 1 diagnostic studies
1b | Individual RCT (with narrow confidence interval) | Individual inception cohort study with >80% follow-up | Validating cohort study with good reference standards
1c | All or none studies | All or none case-series | Absolute SpPins and SnNouts
2a | Systematic review (with homogeneity) of cohort studies | Systematic review (with homogeneity) of either retrospective cohort studies or untreated control groups in RCTs | Systematic review (with homogeneity) of level greater than two diagnostic studies
2b | Individual cohort study (including low-quality RCT; e.g. <80% follow-up) | Retrospective cohort study or follow-up of untreated control patients in an RCT | Exploratory cohort study with good reference standards
2c | "Outcomes" research; ecological studies | "Outcomes" research | –
3a | Systematic review (with homogeneity) of case-control studies | – | Systematic review (with homogeneity) of 3b and better studies
3b | Individual case-control study | – | Non-consecutive study, or without consistently applied reference standards
4 | Case-series (and poor-quality cohort and case-control studies) | Case-series (and poor-quality prognostic cohort studies) | Case-control study, poor or non-independent reference standard
5 | Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles" | Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles" | Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

Grades of recommendation
A | Consistent level 1 studies
B | Consistent level 2 or 3 studies or extrapolations from level 1 studies
C | Level 4 studies or extrapolations from level 2 or 3 studies
D | Level 5 evidence or troublingly inconsistent or inconclusive studies of any level

40.7 Core Components

Before starting to read a paper, it is important not to miss vital "self-evident" information. This includes the journal or internet site in which the paper is published, and the audience for which it is intended. Is it, for instance, a purely surgical journal whose contents are intended
to be read only by one specialist group, or is it published in one of the internationally renowned medical journals such as The Lancet or The New England Journal of Medicine, whereby it might deliver a research message that carries a broader scope of interest? Furthermore, the section under which the paper is published alludes to the type of research being presented. Common sections include case reports, clinical trials, reviews and meta-analyses. There are, however, some exceptions: in the journal Nature, for example, original research papers are divided into the categories of Articles or Letters, although the latter should not be confused with the totally separate Correspondence section.

Scientific papers reporting empirical findings are traditionally structured by the IMRD system: Introduction, Methods, Results and Discussion [1]. This system is still in use, but has been expanded on to add a variety of extra information for the scientific reader. To assess international reading strategies for IMRD articles, delegates at the 6th General Assembly and Conference of the European Association of Science Editors (EASE) were surveyed on the order in which they read paper subsections [7]. It was demonstrated that people rarely followed IMRD when reading as scientists (15%), but were more likely to use it when reading as reviewers (42%), and even more likely when reading as editors (56%). "Hard" scientists (physicists and chemists) used IMRD the most, incorporating the sequence in 48% of their reading strategies, biomedical scientists in 33.3% and social scientists the least, at 17.8%. Although native-speakerhood can affect reading strategy, age does not seem to be a significant factor.

A typical surgical paper is broken down into the following subsections:

• Title
• Author(s)
• List of departments and institutions involved in the project
• Abstract
• Introduction or Background
• Materials and Methods
• Results
• Figures and tables
• Conclusion or Discussion
• Acknowledgements
• References or Bibliography
• Declaration of conflicts of interest or sources of funding
• Supplementary data/documents/files
• Conference discussion (at a meeting where the paper may have been presented)

The majority of surgical journals use the above format in the sequence listed, though some variation does exist. For example, the Results and Conclusion sections are sometimes combined, the declaration of conflicts of interest may appear earlier in the manuscript, or an overall summary may appear at the end of the paper. Furthermore, each journal is characterised by its own unique use of fonts, printing style and reference format.

Once the identifying details of a paper have been elicited, one can discern a fair amount about the information that can be derived from that paper. For example, is it original research, has it been written by invitation only, is it in a high-impact journal, and what is the reputation of the author or institution behind the manuscript? To equip themselves with a scientific framework in which to appraise a paper, readers can apply the well-known SQ3R (Survey, Question, Read, Recite and Review) [18] and PQRST (Preview, Question, Read, Summary, Test) [21] reading strategies.
40.8 Title

The title is a carefully chosen, succinct "sound-bite" that has the dual purpose of attracting the reader's attention to the manuscript's topic and stating, with the fewest possible words, the contents of the paper and the type of research carried out.
40.9 Authorship and Ancillary Information

Each manuscript clearly identifies the contributing authors. The ancillary information further specifies the time at which, and the academic unit from which, the research was conducted and written. This helps the reader discern a number of factors. The date gives the reader a perspective on the modernity of the research. Furthermore, some authors and units may have built up a reputation in a particular field, and thus their research can be considered in the context of their previous academic work, whilst also standing as an independent source of data.
For surgical authorship, the most widely used author order is the classical "sequence-determines-credit" approach, where the first author is the individual who has participated most in the compilation and composition of the paper, whereas the last author is the person with the most supervisory role over the work. However, the literature is suffused with controversy over authorship, as it has increasingly become synonymous with de facto "academic ownership" of the ideas expressed in the research paper. A recent study examined the instructions to contributors from 234 biomedical journals and revealed that only 21 (9%) journals required individual authorship contributions to be described [24]. The International Committee of Medical Journal Editors (ICMJE) requests authors to disclose their contributions, and lists three conditions on which authorship credit should be based, all of which authors should meet [12]:

1. Substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data
2. Drafting the article or revising it critically for important intellectual content
3. Final approval of the version to be published

However, many manuscripts in the medical literature list "honorary authors", who do not fulfil all the ICMJE criteria for authorship; they are cited in up to 60% of articles in some journals, accounting for up to 21.5% of all authors listed [3]. This has led to some confusion as to how to credit each individual author for the work presented. Other factors to take into account include the communicating (or "corresponding") author often, but not always, taking the role of the lead author, and the fact that the author sequence might be allocated on the basis of non-scientific values. For example, in the United Kingdom, the Research Assessment Exercise (RAE), which determines governmental research funding, gives more credit to the final author than to the second or sometimes co-author of a paper; this has led to further uncertainty in the listed author sequence, which may sometimes be changed so as to favour individual and unit funding values [14].

There are a number of alternatives to sequence-determines-credit authorship: these include "equal contribution", where all authors are listed alphabetically as equals; "percent-contribution-indicated", where each author has his/her contribution quantified; and "first-last-author-emphasis", where the first author gets the whole impact, the last gets 50%, and the remainder
divide the remaining impact [22]. There is currently no universally accepted authorship sequence in the scientific literature, and this has led to numerous propositions [23] for a unanimously accepted ranked sequence that would specify each individual's contribution to each piece of work.
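As an illustration of how differently these schemes reward the same byline, consider the hedged Python sketch below. The "equal contribution" split is unambiguous; for "first-last-author-emphasis" the text does not specify how the middle authors' remainder is sized, so the sketch assumes a pool equal to half of the first author's credit, shared equally, which is only one possible reading.

```python
def author_credit(n_authors, scheme="equal"):
    """Illustrative credit shares under two authorship schemes.

    'equal' - equal-contribution: all authors receive the same share.
    'flae'  - first-last-author-emphasis: the first author gets the whole
              impact (1.0), the last author 50%, and the middle authors
              divide a remainder; the size of that remainder (0.5 here)
              is an assumption, since the text leaves it unspecified.
    """
    if scheme == "equal":
        return [1.0 / n_authors] * n_authors
    if scheme == "flae":
        if n_authors == 1:
            return [1.0]
        middle = n_authors - 2
        shares = [1.0]
        if middle > 0:
            shares += [0.5 / middle] * middle
        shares.append(0.5)
        return shares
    raise ValueError(f"unknown scheme: {scheme}")

print(author_credit(5, "equal"))  # [0.2, 0.2, 0.2, 0.2, 0.2]
print(author_credit(5, "flae"))   # [1.0, 0.166..., 0.166..., 0.166..., 0.5]
```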
40.10 Abstract/Summary

The abstract contains a succinct (one or two paragraph) summary of the paper. It leaves out much of the paper's technical detail, but identifies the research background and hypothesis, going on to highlight the most prominent results and conclusions. It therefore communicates the major points of a manuscript and is helpful in determining whether a paper is relevant to one's field; it is also a good summary to re-read after having read the whole manuscript.
40.11 Introduction/Background

This section places the research question posed by the paper into context. In the first few lines, it identifies the fundamental knowledge in the particular field, and then focuses on the particular area in which the research is being performed. It essentially states the broad current knowledge to date in that field, then goes on to specify what still needs to be known and how the subsequent research of the paper is relevant to this. It is in the latter part of the Introduction that the research hypothesis should be stated, and this should be a logical progression from the prerequisite facts detailed earlier in the introduction. Important questions to ask are:

1. Do the stated facts follow a logical sequence?
2. Has a hypothesis or research question been clearly stated?
40.12 Materials and Methods

Once the research question has been posed in the Introduction, the Materials and Methods go on to specify in detail how the question was answered. Depending on the research, whether scientific or statistical, each
step needs to be clearly outlined, detailing all the tools, techniques and instruments used. Typical examples include listing operative manoeuvres, methods of patient randomisation, power calculations, literature searches employed, laboratory equipment utilised or statistical software applied. As a result, this section explains what was done and how it was achieved. It also gives an idea of the time period taken, and should give the reader enough data with which to repeat and replicate the exact experiments, should the reader choose to. Reproducibility of data is one of the cornerstones of modern science, and it is therefore vital that the Materials and Methods section be as accurate as possible in communicating how the experiments were performed. Important questions to ask are as follows:
1. Are the methods and experiments salient to the questions asked?
2. How good are the chosen methods at answering the questions posed by the authors?
3. Are there better methods that the authors could have considered?

40.13 Results, Tables, Figures

This section reveals the results of the questions posed in the Introduction. These data are communicated in the form of words, tables and figures. The information is only stated in this section; it will be analysed in more depth in the Discussion section that follows. The results are listed in the order in which the experiments were performed, and if the paper is clinical, they begin by stating the patient demographics and baseline features. Text, tables and figures need to be carefully scrutinised and considered in light of the methods used to achieve the results. All tables and figures should be put into a contextual framework in the corresponding text. Important questions to ask are listed below:

1. Is the data clearly presented?
2. Is the data relevant to the study question?
3. What are the major findings of the study?
4. Are the results expected?
5. Does the data support or conflict with the authors' claims?
6. Was adequate data presented to answer the research question?
7. What is the "quality" of the data/evidence?
8. Are the results valid?
9. Are there adequate controls?
10. How do the results compare with other similar studies?
11. Are there trends in the data that are not mentioned by the author?
12. Are there data that were not presented?
13. What are the practical implications?
14. Do the results offer areas for future research?

40.14 Discussion

This section performs several functions. It allows the authors to analyse and interpret their results for the audience while also openly specifying any limitations to their study. The results can be placed in the context of the existing field, allowing specific conclusions to be drawn from the study. The work can also be compared with previously published work in the same field, so as to assess whether it agrees or contrasts with it. A broader meaning can be drawn from the stated conclusions, and any further necessary research can be listed as a result of the study's findings. For readers, however, the following questions need to be asked:

1. What are the "real" conclusions of the work?
2. Is the study significant?
3. Are the conclusions logical and adequately stated?
4. Do the results stated support the conclusions?
5. Do the conclusions, methods and results all make sense when assessed together?
6. Do the results support or refute the stated hypothesis?
7. Are there any limitations to the study that are not listed?
8. Is the study relevant to other populations?
9. How do the conclusions from this study fit in with previous studies?
10. Is there sufficient data to back up the interpretation in the discussion section?
11. Are there any inconsistencies or unsubstantiated claims?
12. How reliable is the evidence in support of the discussion?
13. What are the conclusions that can be applied to one's own surgical practice?
14. How could the work have been done better?
15. What new hypotheses and further experiments can be suggested following what was found?
40.15 Acknowledgements and Declarations

Many papers have a short Acknowledgements section, where the various contributions of colleagues and assistants are recognised without having to list them as formal authors. This section is usually optional and typically acknowledges individuals who have helped the research with technical advice or even artistic help. However, it is also a location where one can specify sources of monetary support, and increasingly this section specifies the source of funding of a particular area of work. Sources of financial support can also be listed in the Declarations section, where the declaration of any potential conflicts of interest is also mandatory. Thus, for example, one can openly discern whether a particular researcher benefited from one particular source, such as a surgical equipment company, whilst also publishing positive results as to the efficacy of that equipment.

40.16 References and Bibliography

This section lists the sources of information that were used in the paper. These are generally other published manuscripts, but occasionally can also be "unpublished data" or "personal communications". The majority of these will have been used in the Introduction and Discussion sections, although they can also be mentioned in the Materials and Methods. They are rarely used in the Results section, as this part of the paper is by definition new. The references are typically listed at the end of a paper, and are usually displayed alphabetically or numerically in a format set by each individual journal.

40.17 Supplementary Files

This section allows researchers to communicate information and data that may be relevant to the paper, but may be too lengthy or complex for placement in the results, or may be interpretable only by the super-specialist. These files are put at the end of the paper, and typically include long records of statistical analysis or long lists of primary data.

40.18 Conference Discussion

Some papers are presented at national or international meetings before publication; once the paper is published, a short précis of the discussion by an expert panel that followed the presentation is published alongside the paper itself. This allows the reader to gain insights from the thoughts and comments on the paper by experts in the field.

40.19 Editorial

Occasionally, a paper communicates results that are of notable importance or carry significant medical implications. In these cases, the editors of a journal may place a short accompanying piece alongside the paper, placing it in an appropriate context for the readers, whilst also highlighting its salient features. This can be written by the editors themselves, or may be written following an invitation by the editors for an independent expert in the field to comment on the manuscript.

40.20 The Importance of Assessing a Paper

Not all journals are considered equal in the scientific literature, as they do not all publish papers with the same scientific rigour. While some journals set stringent rules for scientific precision, others have less demanding regulations. This is reflected by the general rule that papers in stringent journals are cited more often than papers in less demanding journals. This has led to journals being ranked according to their citation numbers per article, otherwise known as the "impact factor". Since its introduction in the 1960s, this ranking has to some extent modified editorship, as journals with higher impact factors are generally considered to be of higher academic standing [6].
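The chapter does not define the calculation behind the impact factor; for orientation, the standard two-year impact factor divides the citations a journal receives in a given year to its articles from the previous two years by the number of citable items it published in those two years. A minimal sketch with invented numbers:

```python
def impact_factor(citations_this_year, citable_items_prev_two_years):
    """Two-year journal impact factor: citations received this year to
    material published in the previous two years, divided by the number
    of citable items published in those two years."""
    return citations_this_year / citable_items_prev_two_years

# Invented illustration: 8,000 citations in 2005 to articles published
# in 2003-2004, out of 1,200 citable items in those two years
print(round(impact_factor(8000, 1200), 2))  # 6.67
```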
In 2005, CA was the journal with the highest impact factor, at 49.74; The New England Journal of Medicine had a score of 44.016; Science and Nature had scores between 29 and 31; and the highest-ranked purely surgical journal was the Annals of Surgery, at 6.33. In essence, therefore, most of the papers in these journals have been scrutinised before publication by the editorial staff so as to ensure high levels of academic accuracy and integrity. Coupled with the journals' desire to achieve high impact factors, this means that many of the papers submitted go through an arduous process of review, assessment and modification before they are available for us to read.

Nevertheless, the responsibility of reading a paper and drawing important conclusions from its results lies firmly with the individual reader. Some papers can give misleading information that has serious medical consequences. For example, a paper published in the high-impact journal The Lancet led to significant confusion and a subsequent decrease in the use of the MMR (measles, mumps and rubella) vaccine; as a result of the subsequent disarray, the Editorial Board of The Lancet apologised for publishing a paper that misled its readers [10]. Although rare, other such cases may occur, and thus it is incumbent on the readers of any paper not to lose an air of objectivity and scientific precision when reading manuscripts. Indeed, if flaws are picked up, or there are any potential objections, these can be communicated through the medical literature (such as in the correspondence section of journals) or even the non-medical media, should the need arise.

When reading a paper, it is important to take into consideration that the manuscript may have been subjected to bias. This can be of three main types:

• Methodological bias – papers are published where the data are flawed due to inaccuracies or mistakes in the research methods applied.
• Submission bias – papers are only submitted when the scientists discern "positive" results from their work. For example, a surgical unit may only publish its data if there was a statistical difference between the end points of two compared procedures.
• Publication bias – papers are only accepted by editors when there are "positive" results from the research presented. This results in a wider readership, increased citations and therefore a higher impact for the journal.
Finally, when assessing a paper, it is important to consider whether the results presented as "significant" are truly so. This requires knowledge of statistics, and a deeper understanding of the p value and confidence intervals (beyond the scope of this chapter). If these values in a paper are not fully understood, then it is important for surgeons to make further endeavours to understand them, so as to better grasp the results of the paper.
40.21 Conclusions

It has sadly become increasingly common for some doctors not to read research papers [2]. They cite difficult statistics, editorial inadequacy [11] and poorly written manuscripts [16] as the reasons behind this. Indeed, one former editor of the British Medical Journal is famed for stating that only 5% of published research papers attain adequate standards of scientific soundness, and that in most journals the figure is less than 1% [5]. For scientific information to be adequately disseminated and used, an indispensable equilibrium requires both readers and authors to participate in the communicative process. Each group needs insight into the other's role, but readers specifically need to place concerted effort into attaining the necessary skills with which to read, and successfully discern information from, scientific papers in the ever-expanding fields of medicine and surgery. These skills can be acquired by the regular reading of scientific articles, the attendance of local and occasionally international conferences, and the prodigious application of objective thought and scientific reason.
References

1. Albert T (1995) How "Imradiation" has ruined the writing of scientists. Eur Sci
2. Barraclough K (2004) Why doctors don't read. Br Med J 329:1411
3. Bates T, Anic A, Marusic M et al (2004) Authorship criteria and disclosure of contributions: comparison of 3 general medical journals with different author contribution forms. JAMA 292:86–88
4. Bazerman C (1985) Physicists reading physics: schema-laden purposes and purpose-laden schema. Written Commun 2:3–24
5. Boseley S (1998) Medical studies 'rubbish'. The Guardian, 24 June, p 5
6. Brown H (2007) How impact factors changed medical publishing – and science. Br Med J 334:561–564
7. Burrough-Boenisch J (1999) International reading strategies for IMRD articles. Written Commun 16:296–315
8. de Solla Price D (1981) The development and structure of the biomedical literature. In: Warren KS (ed) Coping with the biomedical literature: a primer for the scientist and the clinician. Praeger, New York, pp 3–16
9. Dirk L (1999) A measure of originality: the elements of science. Soc Stud Sci 29:765–776
10. Horton R (2004) A statement by the editors of The Lancet. Lancet 363:820–821
11. Ide CW (2005) Why doctors don't read research papers: editors' behaviour might have something to do with it. Br Med J 330:256
12. International Committee of Medical Journal Editors (ICMJE) (2007) Publication ethics: sponsorship, authorship, and accountability. Available at: http://www.icmje.org/sponsor.htm
13. Jung CG (1981) The structure and dynamics of the psyche. Princeton University Press, Princeton
14. Laurance WF (2006) Second thoughts on who goes where in author lists. Nature 442:26
15. Lieberman P, Laitman JT, Reidenberg JS et al (1992) The anatomy, physiology, acoustics and perception of speech: essential elements in analysis of the evolution of human speech. J Hum Evol 23:447–467
16. O'Donnell M (2005) Why doctors don't read research papers: scientific papers are not written to disseminate information. Br Med J 330:256
17. Oxford Centre for Evidence-based Medicine (2001) Levels of evidence. Available at: http://www.cebm.net/index.aspx?o=1025
18. Robinson FP (1970) Effective study. Harper & Row, New York
19. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
20. Slade M, Priebe S (2001) Are randomised controlled trials the only gold that glitters? Br J Psychiatry 179:286–287
21. Spache GD, Berg PC (1966) The art of efficient reading. Macmillan, New York
22. Tscharntke T, Hochberg ME, Rand TA et al (2007) Author sequence and credit for contributions in multiauthored publications. PLoS Biol 5:e18
23. Vital MV (2006) Author lists: specify who did what to aid assessment. Nature 443:26
24. Wager E (2007) Do medical journals provide clear and consistent guidelines on authorship? MedGenMed 9:16
41 How to Evaluate the Quality of the Published Literature

Andre Chow, Sanjay Purkayastha, and Thanos Athanasiou

Contents
41.1 Introduction ............................................................... 557
41.2 The Meaning of Quality ............................................ 558
41.3 The Traditional Hierarchy of Evidence ................... 558
41.4 Measuring Quality .................................................... 559
41.5 Systematic Review and Meta-analysis ..................... 559
41.6 Randomised Controlled Trials ................................. 560
41.7 Non-Randomised Trials ............................................ 560
41.8 Studies of Diagnostic Accuracy ................................ 560
41.9 The Problems with Quality Assessment Tools ........ 564
41.10 Summary and Conclusions ...................................... 565
References ........................................................................... 566

Abstract There is now an immense amount of literature available to clinicians to advise their practice. It is impossible to read and assimilate this information, and clinicians must know how to selectively pick and choose the evidence that they follow. Inherent to the practice of evidence-based medicine (EBM) is the ability to assess the quality of evidence, recognise high-quality evidence and act upon it. Not all EBM is of equal quality, and blindly following poor-quality evidence may in fact be detrimental to the care of our patients. This chapter describes the tools that clinicians can use to assess the quality of the available evidence.
41.1 Introduction

Modern surgery is rapidly evolving. Research is continually pushing forward the boundaries of our knowledge of the pathology, prevention, diagnosis and treatment of disease. With an ever more demanding and knowledgeable public, the demands on clinicians to be up to date with current practice are greater than ever. It is no longer acceptable simply to practice surgery based on "the way I learned it". The practice of evidence-based medicine (EBM) is growing at an exceptional rate, and it is a practice that all clinicians must be aware of. EBM has been defined as [1]:

"….the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients."
A. Chow () Department of Biosurgery and Surgical Technology, Imperial College London, QEQM Building, St. Mary’s Hospital Campus, 10th Floor, Praed Street, London W2 1NY, UK e-mail: [email protected]
There is now an immense amount of literature available to clinicians to advise their practice. It is nigh on impossible to read and assimilate this information, and clinicians must know how to selectively pick and choose the evidence that they follow. Not all EBM is of equal
quality, and blindly following poor-quality evidence may in fact be detrimental to the care of our patients.
41.2 The Meaning of Quality

Inherent to the practice of EBM is the ability to assess the quality of evidence, recognise high-quality evidence and act upon it. High-quality evidence should in theory provide better and more realistic estimates of treatment effects, as well as generating greater acceptance of its findings in the surgical community. The quality of research relates to the design, conduct or analysis of a trial. It may also relate to its clinical relevance, or the quality of reporting [2]. Thus, when assessing trial quality, attention is paid to internal validity and external validity, as well as statistical analysis [3–5]. The quality of trials has previously been defined as [6]:

"…the likelihood of the trial design to generate unbiased results"

However, this definition only covers the concept of internal validity. A new definition of quality was proposed by Verhagen et al. [7], which states that quality is:

"the likelihood of a trial design to generate unbiased results, that are sufficiently precise and allow application in clinical practice"

41.3 The Traditional Hierarchy of Evidence

Research is presented in numerous ways, from published hypotheses, through laboratory-based in vitro experiments and animal studies, to clinical experiments and trials on human subjects. Of course, it is the results from human trials that generate the most interest and that have the greatest chance of being integrated into our daily practice. There are multitudes of ways in which research involving human subjects can occur, ranging from the simplest observational studies of individual cases to the most complex randomised controlled trials (RCTs) and meta-analyses. Based on the subtype of study design, a traditional hierarchy of evidence has emerged that is used almost unanimously as a rudimentary way of ranking the quality of evidence. The hierarchy orders quality of evidence in the following fashion, from highest to lowest:

• Systematic reviews and meta-analysis of RCTs
• Systematic reviews and meta-analysis of other clinical trials
• RCTs with definitive results
• RCTs with non-definitive results (i.e. results that suggest a clinical difference but without reaching statistical significance)
• Cohort studies
• Case-control studies
• Cross-sectional surveys
• Case reports
• "Expert" opinion and letters, etc.

Variations of this hierarchy are widespread, including those published by the US Preventive Services Task Force (USPSTF) [8] and the Centre of EBM (CEBM) in Oxford, England [9]. Commonly, this hierarchy is also presented as a pyramid of evidence (Fig. 41.1). Of course, this hierarchical model is not absolute. A poorly conducted meta-analysis, for example, should not be considered to be of superior quality when compared with a well-conducted, large, double-blind RCT. However, the RCT and the meta-analysis or systematic review are still widely held to be the "gold standard" of surgical research.

Fig. 41.1 The pyramid of evidence (from apex to base: systematic reviews and meta-analyses; randomised controlled double-blind studies; cohort studies; case-control studies; case series; case reports; ideas, editorials, opinions; animal research; in vitro ("test tube") research). Adapted from the Medical Research Library of Brooklyn – http://library.downstate.edu/ebm/2100.htm
41.4 Measuring Quality

Quality assessment of the published literature can be a subjective process. The process of critical appraisal entails careful reading and interpretation of a study, including aspects of its design, analysis and conclusions. The final judgement on quality is based on prior knowledge and experience, and thus may differ between individuals. An objective measure of quality would therefore be useful.

Beyond the basic hierarchy of evidence, many more detailed assessments of quality have been created. These typically come in the form of quality assessment tools, which examine in detail varying aspects of a research publication to objectively assess its quality. The majority of tools created so far have assessed either the study methodology or the quality of reporting of the trial. The use of these assessment tools is becoming more common in demonstrating high-quality research.

The tools used to assess quality can generally be split into two types, the first of which assesses the presence or absence of key components. Examples of such key components include the process of randomisation in clinical trials. By including an effective randomising process within a trial, such as computer-generated random numbers as opposed to dates of birth, a study can avoid selection bias. Similarly, the presence of allocation concealment, or blinding, further substantiates findings by reducing the effects of reporting bias and assessment bias. Assessing a trial for the presence of these two key components allows assessment of the methodological quality of the trial, and thus the validity of its findings.

Other quality assessment tools use scores or scales. In these tools, various items from the reporting or methodology of a trial are allocated a numerical score; these are then totalled, giving an overall quality score for the trial. Quality scoring is especially important in the fields of systematic review and meta-analysis. As these methods involve combining results from many separate primary studies to provide an overall estimate of effect, they are very susceptible to the quality of the studies included in their analysis. Differences in study quality may lead to the introduction of bias. This has led to the widespread practice of quality adjustment, where the weight attributed to a study correlates with its quality. In many meta-analyses, studies of "low quality" are actually excluded from
analysis to avoid the introduction of bias. Often, the use of quality scores involves two separate assessors to score each included study. Differences in opinion on any particular study are then resolved by consensus to increase validity of the quality score.
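To make the idea of quality adjustment concrete, the sketch below shows one hypothetical weighting scheme in Python: inverse-variance weights are scaled by a 0–1 quality score, and studies below a quality threshold are excluded outright. This is an illustration of the principle only, not a validated meta-analytic model, and all numbers are invented.

```python
def quality_adjusted_pool(estimates, variances, quality_scores, min_quality=0.5):
    """Hypothetical quality-adjusted pooling: inverse-variance weights are
    scaled by a 0-1 quality score, and studies below a quality threshold
    are excluded outright. A sketch of the principle, not a validated
    meta-analytic model."""
    weighted_sum = 0.0
    total_weight = 0.0
    for estimate, variance, quality in zip(estimates, variances, quality_scores):
        if quality < min_quality:
            continue  # exclude "low-quality" studies from the analysis
        weight = quality / variance  # inverse-variance weight scaled by quality
        weighted_sum += weight * estimate
        total_weight += weight
    return weighted_sum / total_weight

# Invented example: three studies; the third is dropped for low quality
print(quality_adjusted_pool([0.8, 1.2, 2.5], [0.04, 0.09, 0.25], [0.9, 0.7, 0.3]))
```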
41.5 Systematic Review and Meta-analysis

Systematic reviews and meta-analyses are useful tools for the modern clinician. They are convenient ways for us to gain an overview of the current available evidence on a topic. A systematic review is a systematic summary of all the available evidence, with the intent of obtaining an unbiased and precise measure of the true magnitude and direction of the association between events that would be widely applicable [10]. The quantitative (statistical) pooling of estimates from individual studies is called a meta-analysis [10].

However, as with any research publication, systematic reviews and meta-analyses may still vary in quality, leading to bias and different estimates of effect for the same intervention. It has been noted that meta-analyses often report different effect estimates than those from subsequent large clinical trials [11–13]. Thus, readers of these publications must be aware that critical appraisal of systematic reviews and meta-analyses is still required, even if they sit at the peak of the hierarchy of evidence.

In 1987, Sacks et al. [14] examined the process of meta-analysis by examining 86 published meta-analyses. They judged each publication on 23 separate items from six domains that were thought to be important in the conduct and reporting of a meta-analysis of RCTs. These included study design, combinability, control of bias, statistical analysis, sensitivity analysis and problems of applicability. The group found that only 28% of the included meta-analyses addressed all six of these domains. More recently, Dixon et al. [15] assessed the methodological quality of 51 surgical meta-analyses using the Overview Quality Assessment Questionnaire (OQAQ). Overall, the group found that the quality of the included surgical meta-analyses was low. The studies frequently had major methodological flaws, especially in validity assessment, selection bias, reporting of search strategies, and pooling of data. They found that as the quality of the meta-analysis decreased, the estimation of treatment
effect increased, and concluded that studies of low quality therefore had results that may not be valid and that they may have a bias favouring the treatment in question. In 1999, Moher et al. [16] released the QUOROM (Quality of Reporting of Meta-analyses) statement. The QUOROM conference had been convened to address the issue of quality in meta-analysis of RCTs. The consensus statement from the conference included a checklist of standards for reporting of meta-analyses (Figs. 41.2, 41.3). The checklist described the preferred way to present the abstract, introduction, methods, results and discussion sections of a meta-analysis report. Under 21 subheadings, authors were encouraged to provide information on searches, selection, validity assessment, data abstraction, study characteristics, quantitative data synthesis and trial flow. Also included in the final statement was the need for a flow diagram providing information about the number of RCTs identified, included and excluded, along with reasons for exclusion. Although many meta-analysis quality assessment tools have been released, the QUOROM statement was the first to be agreed upon by the process of consensus. It is still widely used today by both authors and editors of journals to help ensure high-quality reporting of meta-analyses.
41.6 Randomised Controlled Trials

The RCT is the design of choice for studies evaluating the effectiveness of healthcare interventions [17]. However, even these trials are not immune to bias, and strict attention to the trial methodology and its reporting is required to determine quality. Previous studies have demonstrated that reports of low-quality RCTs, compared with higher-quality ones, tend to overestimate the effectiveness of interventions by approximately 30% [18]. There have been numerous attempts at quality assessment tools, with Moher et al. [19] in 1995 identifying at least 24 criteria lists and Verhagen et al. [7] in 2001 estimating between 50 and 60. Examples of these assessment tools can be seen in Table 41.1. These assessments vary widely in terms of size, complexity, dimensions covered, and the weighting assigned to key domains such as randomisation and blinding.

In 1996, Begg et al. [29] released the CONSORT (Consolidated Standards of Reporting Trials) statement, which was subsequently revised in 1999 [30]. The
CONSORT statement was developed by an international group of clinical trialists, statisticians, epidemiologists, and biomedical editors. It has become a landmark statement on the quality of RCTs and is supported by many medical journals and editorial groups [30]. The statement itself comprises a checklist and flow diagram for the reporting of an RCT (Figs. 41.4, 41.5). Since publication of this statement, the quality of RCT reporting has appeared to improve [20], although reporting of several recommendations is still suboptimal [31].
41.7 Non-Randomised Trials

Although it is widely accepted that the best experimental evidence is gleaned from RCTs, most of the available evidence in surgical practice is still obtained from non-randomised trials [21]. There are several scenarios in which an RCT is unnecessary, inappropriate, impossible or even inadequate [32]. Surgical research remains an area where an RCT is often neither possible nor feasible [33]. However, the results of non-randomised trials are subject to increased bias, and thus attention must be paid to aspects of trial quality.

The Cochrane Collaboration handbook identified the four major sources of systematic bias in non-randomised trials as selection bias, performance bias, attrition bias and detection bias [34]. However, it is the presence of selection biases that most differentiates randomised from non-randomised studies. The effects of this bias can be unpredictable, with either an overestimation or an underestimation of effect, meaning that non-randomised studies may even observe statistically significant effects acting in the wrong direction [34]. Although the effect of selection bias can be partially remedied by statistical methods such as case-mix (or risk) adjustment, failure to recognise this failing of non-randomised studies may potentially lead to ill-judged application of evidence to everyday practice.
Fig. 41.2 The QUOROM (Quality of Reporting of Meta-Analyses) checklist

41.8 Studies of Diagnostic Accuracy

One of the major driving forces of surgical progress is advancement in technological innovation. New techniques and technologies have allowed the creation of numerous diagnostic testing modalities, which have the potential to revolutionise the diagnosis and investigation of disease.
Fig. 41.3 The QUOROM Flowchart:
Potentially relevant RCTs identified and screened for retrieval (n=...) → RCTs excluded, with reasons (n=...)
RCTs retrieved for more detailed evaluation (n=...) → RCTs excluded, with reasons (n=...)
Potentially appropriate RCTs to be included in the meta-analysis (n=...) → RCTs excluded from meta-analysis, with reasons (n=...)
RCTs included in meta-analysis (n=...) → RCTs withdrawn, by outcome, with reasons (n=...)
RCTs with usable information, by outcome (n=...)

Studies of diagnostic accuracy allow us to compare these new techniques with the current gold standards, including statistics such as sensitivity, specificity,
positive and negative predictive values, positive and negative likelihood ratios, diagnostic odds ratios, and receiver operating characteristic (ROC) curves [22]. The differing design characteristics of these studies meant that different quality assessment tools were required. Two quality assessment tools are frequently cited in the literature for diagnostic accuracy studies, namely the STARD initiative [23] and QUADAS [22]. The STARD (Standards for Reporting of Diagnostic Accuracy) initiative was aimed at improving the accuracy and completeness of reporting of studies of diagnostic accuracy. The initiative was completed in 2003 following a consensus meeting, and consists of a checklist and flow diagram. QUADAS (Quality Assessment of studies of Diagnostic Accuracy included in Systematic reviews) was also developed in 2003 via a process of consensus. It consists of a 14-point checklist assessing both the reporting and methodological quality of a study of diagnostic accuracy. Interestingly, the authors of QUADAS deliberately did not include a scoring system in the QUADAS framework, because they felt that the application of quality scores without adequate consideration of individual quality items may dilute or miss potential associations [23].
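As a point of reference (these are the standard definitions, not specific to STARD or QUADAS), all of the above measures can be derived from the 2 × 2 table of index-test result against the reference ("gold") standard, with true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN):

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &\quad \text{Specificity} &= \frac{TN}{TN + FP},\\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &\quad \text{NPV} &= \frac{TN}{TN + FN},\\[4pt]
LR^{+} &= \frac{\text{Sensitivity}}{1 - \text{Specificity}}, &\quad LR^{-} &= \frac{1 - \text{Sensitivity}}{\text{Specificity}},\\[4pt]
\text{DOR} &= \frac{LR^{+}}{LR^{-}} = \frac{TP \times TN}{FP \times FN}. &&
\end{aligned}
```

The ROC curve then plots sensitivity against 1 − specificity as the positivity threshold of the test is varied, with the area under the curve summarising discrimination across all thresholds.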
Table 41.1 Commonly used quality assessment tools

Author | Name of assessment tool | Year | Target study
Jadad et al. [6] | N/A | 1996 | RCTs
Moher et al. [16] | QUOROM (Quality of Reporting of Meta-Analyses) | 2000 | Meta-analyses of RCTs
Moher et al. [20] | CONSORT (Consolidated Standards for Reporting of Trials) | 2001 | RCTs
Slim et al. [21] | MINORS (Methodological Index for Non-Randomised Studies) | 2003 | Observational or non-randomised studies
Whiting et al. [22] | QUADAS (Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews) | 2003 | Studies of diagnostic accuracy
Bossuyt et al. [23] | STARD (Standards for Reporting of Diagnostic Accuracy) | 2003 | Studies of diagnostic accuracy
Verhagen et al. [24] | The Delphi List | 1998 | RCTs
Shea et al. [25] | AMSTAR (Assessment of Multiple Systematic Reviews) | 2007 | Systematic reviews
Oxman et al. [26] | OQAQ (Overview Quality Assessment Questionnaire) | 1991 | Review articles
Wells et al. [27] | Newcastle-Ottawa Scale (NOS) | N/A | Non-randomised studies
Timmer et al. [28] | N/A | 2003 | Abstracts
Fig. 41.4 The CONSORT (Consolidated Standards for Reporting of Trials) Checklist (for each item, the page number on which it is reported should be recorded)

TITLE & ABSTRACT
1. How participants were allocated to interventions (e.g., "random allocation", "randomized", or "randomly assigned").
INTRODUCTION – Background
2. Scientific background and explanation of rationale.
METHODS – Participants
3. Eligibility criteria for participants and the settings and locations where the data were collected.
Interventions
4. Precise details of the interventions intended for each group and how and when they were actually administered.
Objectives
5. Specific objectives and hypotheses.
Outcomes
6. Clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of measurements (e.g., multiple observations, training of assessors).
Sample size
7. How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules.
Randomization – Sequence generation
8. Method used to generate the random allocation sequence, including details of any restrictions (e.g., blocking, stratification).
Randomization – Allocation concealment
9. Methods used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
Randomization – Implementation
10. Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
Blinding (masking)
11. Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. If done, how the success of blinding was evaluated.
Statistical methods
12. Statistical methods used to compare groups for primary outcome(s); methods for additional analyses, such as subgroup analyses and adjusted analyses.
RESULTS – Participant flow
13. Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe protocol deviations from study as planned, together with reasons.
Recruitment
14. Dates defining the periods of recruitment and follow-up.
Baseline data
15. Baseline demographic and clinical characteristics of each group.
Numbers analysed
16. Number of participants (denominator) in each group included in each analysis and whether the analysis was by "intention-to-treat". State the results in absolute numbers when feasible (e.g., 10/20, not 50%).
Outcomes and estimation
17. For each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g., 95% confidence interval).
Ancillary analyses
18. Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those pre-specified and those exploratory.
Adverse events
19. All important adverse events or side effects in each intervention group.
DISCUSSION – Interpretation
20. Interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision and the dangers associated with multiplicity of analyses and outcomes.
Generalizability
21. Generalizability (external validity) of the trial findings.
Overall evidence
22. General interpretation of the results in the context of current evidence.
Fig. 41.5 The CONSORT Flowchart:
Enrollment – Assessed for eligibility (n= ); Excluded (n= ): not meeting inclusion criteria (n= ), refused to participate (n= ), other reasons (n= ); Is it randomized?
Allocation – for each arm: Allocated to intervention (n= ); Received allocated intervention (n= ); Did not receive allocated intervention (n= ), give reasons
Follow-Up – for each arm: Lost to follow-up (n= ), give reasons; Discontinued intervention (n= ), give reasons
Analysis – for each arm: Analysed (n= ); Excluded from analysis (n= ), give reasons
41.9 The Problems with Quality Assessment Tools

The assessment of the methodological quality of a trial is linked with the quality of its reporting, and many quality assessment tools use reporting as a proxy for methodological quality. However, it is important to realise that high-quality reporting and high-quality methodology do not always go hand in hand. A well-conducted but poorly reported trial, or a poorly conducted but well-reported trial, may, for example, be misclassified as low and high quality, respectively. A paper by Huwiler-Muntener et al. [17] examined 60 RCTs and compared results of reporting quality, as calculated using the CONSORT statement, with methodologic quality, calculated by the presence
of allocation concealment, blinding, and intention-to-treat analysis. They demonstrated that similar quality of reporting may hide important differences in methodologic quality, and that well-conducted trials may be reported badly. They urged that a clear distinction be made between the reporting quality and the methodology of RCTs. Similar findings were described by Hill et al., who examined 40 RCTs to compare publications with actual investigator reports of trial practice. They demonstrated that poor reporting of the methods of randomisation and allocation concealment misclassified almost 80% of trials as inadequate, when actual trial methodology was proven to be adequate. They concluded that assessment of RCTs on the basis of their reports alone is quite likely to be inappropriate.
Also, whether these assessment tools truly provide an accurate picture of quality is often unproven. Unfortunately, to date there is no accepted "gold standard" of quality assessment against which new methods can be compared [7]. Thus, new assessment methods can only be judged against "theoretical gold standards". Simply put, new assessment tools must have validity and reliability. The most important aspects of validity in these cases are face validity and construct validity. Face validity determines whether the quality assessment tool appears to be valid when examined by a panel of experts. So far, the majority of assessment tools contain a number of accepted criteria from clinical trial textbooks [19]; thus, the face validity of the majority of assessments is reasonable. The application of formal methods such as consensus may help to increase the face validity of assessment tools [7]. So far, only a few assessment tools have been created via a process of consensus, including the tool developed by Jadad et al. [6] and the Delphi list [24]. However, even the process of consensus may not ensure validity, as it did not prevent marked differences appearing between the Jadad and Delphi assessments [7]. Construct validity also plays an important part. This determines whether the quality measurement correlates with other methods of quality measurement, i.e. do the differing quality assessments identify the same high-quality publications? Work by Herbison et al. [35] suggests that this may not be the case. They examined 65 meta-analyses with binary outcomes, taken from the Cochrane database. The trials included in these meta-analyses were scored for quality using 43 different quality scoring systems. They demonstrated that the quality scales differentiated studies into high and low quality in an inconsistent manner. This further confirmed earlier work by Juni et al. [36], whose group re-analysed a meta-analysis of 17 trials comparing the effects of low molecular weight heparin (LMWH) with standard unfractionated heparin (UFH) for the prevention of postoperative thrombosis. To the included studies, they applied 25 different quality scales. The outcome reached by the meta-analysis was found to depend on the type of quality scale used. For some quality scales, the high-quality studies demonstrated no significant difference between LMWH and UFH, while low-quality studies demonstrated benefits from LMWH. The use of other scales found high-quality studies showing significant benefits of LMWH, whilst low-quality studies found no significant difference.
The remaining scales found no difference in benefits between high- and low-quality studies. The relative weighting allocated to each aspect of a quality scale may also influence results. Whiting et al. [37] used five separate weighting strategies for the QUADAS quality assessment tool in the evaluation of 28 studies examining the use of ultrasound scanning for the diagnosis of vesico-ureteral reflux in children. They found that different weighting schemes for the same quality assessment tool produced different quality scores. These differing schemes ranked studies in different orders, resulting in different estimates of diagnostic accuracy. The group concluded that quality scores should not be incorporated into diagnostic systematic reviews, although investigation of the association of individual quality items with estimates of diagnostic accuracy was recommended. The reliability of quality assessment tools is also relatively unknown. Different studies have demonstrated either high [6] or low [38] inter-rater agreement when using the Jadad quality assessment tool. This potential lack of reliability brings into doubt the consistency of quality assessment using these tools. Thus, at best, these studies would indicate that the use of quality scales in the assessment of clinical trials is unhelpful; at worst, the use of an inappropriate scale may lead to imprecision and invalidate the findings of a meta-analysis. Users must be aware of the potential pitfalls of quality scores. There are those who dispute the use of quality scores in any situation, with Greenland stating that quality scoring is the most insidious form of bias in meta-analysis as it "subjectively merges objective information with arbitrary judgements in a manner that can obscure important sources of heterogeneity among study results" [39]. He argues that quality scoring in meta-analyses should be abandoned and replaced by meta-regression analysis on quality items.
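To see concretely how easily weighting alone can reorder studies, consider the following sketch. It is entirely hypothetical: the quality items, weights and trials are invented for illustration and do not correspond to QUADAS, the Jadad scale or any published instrument:

```python
# Two hypothetical trials scored 0/1 on three common quality items.
trials = {
    "Trial A": {"randomisation": 1, "blinding": 0, "attrition": 1},
    "Trial B": {"randomisation": 0, "blinding": 1, "attrition": 1},
}

# Two equally plausible weighting schemes over the same items.
scheme_1 = {"randomisation": 3, "blinding": 1, "attrition": 1}  # emphasises randomisation
scheme_2 = {"randomisation": 1, "blinding": 3, "attrition": 1}  # emphasises blinding

def quality_score(items: dict, weights: dict) -> int:
    """Composite quality score: weighted sum of the individual items."""
    return sum(weights[item] * present for item, present in items.items())

for name, items in trials.items():
    print(name, quality_score(items, scheme_1), quality_score(items, scheme_2))
# Trial A scores 4 under scheme 1 but 2 under scheme 2; Trial B scores
# 2 under scheme 1 but 4 under scheme 2. Which trial counts as "high
# quality" is decided entirely by the weighting, not by the trials.
```

Because each composite score looks equally defensible in isolation, nothing in the scores themselves signals that the resulting ranking of studies, and hence any quality-stratified pooled estimate, is partly an artefact of the chosen weights.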
41.10 Summary and Conclusions

The trend towards EBM is encouraging clinicians to incorporate the best available evidence into their daily practice. The ability to identify high-quality evidence is paramount if we are to ensure the best patient care. The use of quality assessment tools can be a major aid not only to the readers of scientific journals, but also to the authors of publications. Quality assessment
tools can help to stratify evidence according to quality, allowing selective reading, as well as subgroup analysis of high-quality evidence in meta-analyses. However, these tools also have their share of problems, and users must be careful in applying them. This is especially important in the field of meta-analysis, where the choice of a quality assessment tool must be well considered. As estimates of treatment effects may ultimately depend on the exact quality score used, it is important to choose a tool that is appropriate for the area of interest and the type of study being assessed. The use of an inappropriate quality assessment tool may well invalidate the results of an analysis. Ultimately, quality assessment tools will never replace careful critical appraisal of evidence. This can be a complicated process, and requires a basic understanding of scientific method, trial design, statistical analysis, and the potential for bias in all these areas. Only when clinicians are competent in assessing the quality of evidence can we truly embrace the practice of EBM.
References
1. Sackett DL, Rosenberg WM, Gray JA et al (1996) Evidence based medicine: what it is and what it isn't. Br Med J 312:71–72
2. Juni P, Altman DG, Egger M (2001) Systematic reviews in health care: assessing the quality of controlled clinical trials. Br Med J 323:42–46
3. The Standards of Reporting Trials Group (1994) A proposal for structured reporting of randomized controlled trials. JAMA 272:1926–1931
4. Chalmers TC, Smith H Jr, Blackburn B et al (1981) A method for assessing the quality of a randomized control trial. Control Clin Trials 2:31–49
5. Colditz GA, Miller JN, Mosteller F (1989) How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med 8:441–454
6. Jadad AR, Moore RA, Carroll D et al (1996) Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 17:1–12
7. Verhagen AP, de Vet HC, de Bie RA et al (2001) The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol 54:651–654
8. Harris RP, Helfand M, Woolf SH et al (2001) Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med 20:21–35
9. Oxford Centre for Evidence-based Medicine (2001) Levels of evidence. Available at: http://www.cebm.net/index.aspx?o=1025
10. Montori VM, Swiontkowski MF, Cook DJ (2003) Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res 43–54
11. Cappelleri JC, Ioannidis JP, Schmid CH et al (1996) Large trials vs meta-analysis of smaller trials: how do their results compare? JAMA 276:1332–1338
12. LeLorier J, Gregoire G, Benhaddad A et al (1997) Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 337:536–542
13. Villar J, Carroli G, Belizan JM (1995) Predictive ability of meta-analyses of randomised controlled trials. Lancet 345:772–776
14. Sacks HS, Berrier J, Reitman D et al (1987) Meta-analyses of randomized controlled trials. N Engl J Med 316:450–455
15. Dixon E, Hameed M, Sutherland F et al (2005) Evaluating meta-analyses in the general surgical literature: a critical appraisal. Ann Surg 241:450–459
16. Moher D, Cook DJ, Eastwood S et al (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354:1896–1900
17. Huwiler-Muntener K, Juni P, Junker C et al (2002) Quality of reporting of randomized trials as a measure of methodologic quality. JAMA 287:2801–2804
18. Moher D, Pham B, Jones A et al (1998) Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352:609–613
19. Moher D, Jadad AR, Nichol G et al (1995) Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials 16:62–73
20. Moher D, Jones A, Lepage L (2001) Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 285:1992–1995
21. Slim K, Nini E, Forestier D et al (2003) Methodological index for non-randomized studies (minors): development and validation of a new instrument. ANZ J Surg 73:712–716
22. Whiting P, Rutjes AW, Reitsma JB et al (2003) The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 3:25
23. Bossuyt PM, Reitsma JB, Bruns DE et al (2003) Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Br Med J 326:41–44
24. Verhagen AP, de Vet HC, de Bie RA et al (1998) The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol 51:1235–1241
25. Shea BJ, Grimshaw JM, Wells GA et al (2007) Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 7:10
26. Oxman AD, Guyatt GH (1991) Validation of an index of the quality of review articles. J Clin Epidemiol 44:1271–1278
27. Wells GA, Shea B, O'Connell D et al (2008) The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Available at: http://www.ohri.ca/programs/clinical_epidemiology/oxford.htm
28. Timmer A, Sutherland LR, Hilsden RJ (2003) Development and evaluation of a quality score for abstracts. BMC Med Res Methodol 3:2
29. Begg C, Cho M, Eastwood S et al (1996) Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 276:637–639
30. Moher D, Schulz KF, Altman DG (2001) The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials. BMC Med Res Methodol 1:2
31. Mills EJ, Wu P, Gagnier J et al (2005) The quality of randomized trial reporting in leading medical journals since the revised CONSORT statement. Contemp Clin Trials 26:480–487
32. Black N (1996) Why we need observational studies to evaluate the effectiveness of health care. Br Med J 312:1215–1218
33. McCulloch P, Taylor I, Sasako M et al (2002) Randomised trials in surgery: problems and possible solutions. Br Med J 324:1448–1451
34. Deeks JJ, Dinnes J, D'Amico R et al (2003) Evaluating non-randomised intervention studies. Health Technol Assess 7:iii–x, 1–173
35. Herbison P, Hay-Smith J, Gillespie WJ (2006) Adjustment of meta-analyses on the basis of quality scores should be abandoned. J Clin Epidemiol 59:1249–1256
36. Juni P, Witschi A, Bloch R et al (1999) The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 282:1054–1060
37. Whiting P, Harbord R, Kleijnen J (2005) No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Med Res Methodol 5:19
38. Clark HD, Wells GA, Huet C et al (1999) Assessing the quality of randomized trials: reliability of the Jadad scale. Control Clin Trials 20:448–452
39. Greenland S (1994) Invited commentary: a critical look at some popular meta-analytic methods. Am J Epidemiol 140:290–296
42 How to Write a Surgical Paper

Sanjay Purkayastha
Contents
42.1 Introduction
42.2 How to Find a Successful Title?
42.3 Getting the Salient Point of the Publication Across
42.4 Writing for a Particular Journal
42.5 Following a Recognised Structure for Surgical Journals
42.6 Being Prepared to Write, Rewrite and Rewrite Again and Again and Again
42.7 An Efficient Utilisation of Tables, Diagrams, Images and Flowcharts
42.8 Careful Formatting and Referencing as Appropriate
42.9 A Clear Understanding of How to Submit the Manuscript Appropriately
42.10 Being Open Minded to Review Criticism and Answering Reviewer Comments Well
42.11 Working as a Team – The Best Papers are Never Written by Individuals
42.12 Case Reports, Letters, Techniques and Images in Surgery
42.13 Summary
Further reading
S. Purkayastha Department of Biosurgery and Surgical Technology, Imperial College London, QEQM Building, St. Mary’s Hospital, 10th Floor, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract This chapter focuses on the art of writing a paper once data collection and analysis are complete, and endeavours to highlight the points that reviewers consider when evaluating papers. It is important to research the question at the beginning of the journey to ensure that another author or team has not recently published what you are looking at in a similar style or with a similar data set, unless your study is part of a prospective study or randomised trial. A genuine understanding and interest in the speciality of the question to be answered must be apparent to the authors for the work to be useful and successful.
42.1 Introduction

Surgery is becoming more academically minded. In the UK, there is a renaissance in academic surgery. "Publish or perish" is a motto to which many surgical trainees have been exposed. Training systems all over the world attempt to distinguish between candidates on many levels. Although peer-reviewed publications are not the only factor that surgeons look for in potential juniors, the skill set required to publish successfully demonstrates other positive attributes as well. These include self-motivation, an interest in the field in question, drive, questioning of common teachings, systematic thought processes, an understanding of publishing methodology, an ability to write well, perseverance and being part of a team with an academic interest. First, it is important to question the reason for embarking on publishing a paper in surgery. It is safe to say that, although purists would say it should be to answer a pertinent and useful question in research or clinical practice that may change the way we understand processes or carry out research or clinical practice,
commonly it is to add lines to individual CVs or resumes. Usually, if it is purely for the latter reason, the quality and utility of the work are questionable; without a genuine interest in the subject and, especially, in the question that is to be answered, attempting to publish for the sake of it is usually futile.
What can be published as a surgical paper? There are a multitude of avenues through which papers can be written. These include basic sciences relevant to surgery, case reports and case series, technical reports, clinical research, systematic reviews, meta-analyses, new technology, management issues in surgery, cost analysis, surgical education and surgical skills. Publishing successfully in any of the above categories needs careful preparation, planning and good data sets. Studies with formalised research backing, including long-term funding, usually mean that the methodology has been thought out carefully years in advance, yet publishing the results in a high-impact peer-reviewed journal still takes thought, time, precise writing and structured methods and techniques. The topic chosen for publication is crucial with regard to the timing of submission and the journal to which the manuscript is submitted for consideration. Certain subjects are topical because of recent events, other new publications and media coverage. The ability to recognise such circumstances and construct a manuscript that gains acceptance in a respected journal is very hit-and-miss, and carefully planned, well-structured projects are more useful in the long term. Whichever path is chosen, the following are the essential components for writing a good surgical paper:
• Formulating the question that needs to be answered.
• A useful, snappy title that is memorable and eye-catching.
• Getting the salient point of the publication across.
• An understanding of which type of journal the submission will be put forward to, so that the manuscript's requirements can be planned and structured appropriately.
• Following a recognised structure for surgical journals.
• Being prepared to write, rewrite and rewrite again and again.
• An efficient utilisation of tables, diagrams, images and flowcharts.
• Careful formatting and referencing as appropriate.
• A clear understanding of how to submit the manuscript appropriately.
• Being open minded to review criticism and answering reviewer comments well.
• Working as a team – the best papers are never written by individuals.
This chapter focuses on writing the paper once data collection and analysis are complete, and endeavours to highlight the points that reviewers consider when evaluating papers. Formulating a question for a manuscript is a key element; although this is not directly evaluated by readers, without a relevant and interesting question the peer-reviewed journals will not want a manuscript. It is important to research the question at the beginning of the journey to ensure that another author or team has not recently published what you are looking at in a similar style or with a similar data set, unless your study is part of a prospective study or randomised trial. A genuine understanding of, and interest in, the speciality of the question to be answered must be apparent in the authors for the work to be useful and successful. Finally, an appreciation of the scale of the work to be undertaken is essential, as submission to a journal of appropriate impact factor once the manuscript is finished is very important for acceptance.
42.2 How to Find a Successful Title?

An eye-catching title, which is informative and makes the reader inquisitive, is a useful start to any manuscript. Incorporating well-known phrases and quotes is one way of generating interest, but make sure the article does not become clichéd or comical by doing so, as this often detracts from the work to follow. Many journals also request a short header, which is used to present the work in the contents if it is published. This should usually be three to five words and should be thought of alongside the title. Try to avoid very long titles with many abbreviations and technical words, as most high-impact journals are read by the general public and general medical readers who may not be aware of the abbreviations and technical terms. Finding a unique angle or perspective on your results or conclusions is a good point from which to start formulating a useful title. Do not forget to include the authors' details, institution and corresponding author on the title page, along with the word count, short header (if required), type of paper and any conflict of interest from any of the mentioned authors.
42.3 Getting the Salient Point of the Publication Across

Other than the title, a very carefully worded abstract is imperative to getting the whole point of the manuscript across. It is important to select the journal for submission prior to writing the abstract, as requirements vary greatly from journal to journal. Usually, the structure of the abstract mirrors the structure of the manuscript, and therefore the instructions to authors on the journal web site should be carefully consulted. The results and conclusion sections of the abstract are the most important, as these are the parts readers glance at most when flicking through journals and scouring literature databases for papers. Take care over the presentation of the numerical results, and keep the statistical outcomes simple so that all readers can understand them, not just those with a mathematical background. Also, be careful about making strong inferences in your conclusions: claiming that your paper describes a cure for cancer when it clearly does not is a simple guarantee of rejection from a reviewer. A succinct and eloquent conclusion that follows realistically from your data and results is much more appreciated when understated than when too elaborate. Using the title, abstract and conclusions to reiterate the main points of a manuscript reinforces the salient take-home messages.
42.4 Writing for a Particular Journal

Every journal has a different style and particular approach to getting information out to its readers. Some even use more than one route for publication, such as separate versions for the electronic and paper releases and multimedia sections. It is crucial that a manuscript sent to a journal, especially a high-impact journal, looks like something already out of that journal. Therefore, looking at and reading several papers of a similar type that have been published recently in the journal in question is important prior to creating the skeleton and actually writing the paper. Going through the "instructions to authors" section of the journal's web site is also fundamental for preparing any individual manuscript, so that all the rules are adhered to. Try to find some papers in the
journal you are proposing to submit to that are relevant to your manuscript, and include them at least in the references for your introduction or discussion. This will mean that the journal will cite itself more through publishing your manuscript. This may seem a little mercenary; however, many editors of journals (especially up-and-coming ones) do look for ways to improve their own citation index. Be sure to look at the preferred style of writing of the journal to which you are going to submit. Some tend to be very factual and technical; others are less so and include more colloquial language and phrases. Ideally, try to submit to a journal that you read regularly. Do not forget that spelling and grammar differ between US- and UK-based journals. Carry out the appropriate spelling and grammar check using your software tool prior to submission. For example, an American editor will be frustrated at having to change all the "randomised" trials to "randomized" trials in your paper.
42.5 Following a Recognised Structure for Surgical Journals

Title, abstract, introduction, methods (e.g. type of study, data collection and databases used, samples, patients, statistics, software, surgical techniques and literature searches), results, discussion, conclusion and references are the common headings, in order, for most surgical journals. It is commonplace to include tables, figures and legends at the end of the manuscript. Usually, most journals have a 3000–4000 word count excluding the abstract and references. Most abstracts are limited to approximately 300 words. Adhere to the word count of the journal in question; word limits are instituted for a reason, and they will stop the author (you) rambling on. The introduction should be approximately 400 words and written so as to set the scene and make the reader want to carry on reading the rest of the paper. Try not to use more than six or seven references in the introduction, especially if writing a systematic review, as this may limit your use of references later. Try to use journal-specific or high-impact references in the introduction. The methods section is usually the most technically demanding section to write. It is best to look at three or four different papers, by different authors in the same journal on similar topics, to get a feel for how best to lay this section out. The order of the methods section is
572
important, and it is usual to first explain the inception of the study from creation and ethical review board clearance for example, and finish with the statistical techniques used. If a literature search is to be carried out for a systematic review or meta-analysis, it is best to keep an active database and flowchart of the references so that this can be included as a figure to help the reader to follow the search criteria when the manuscript is completed. Remember that when discussing surgical technology, equipment, software packages, companies and statistical packages, most require a registered (®) or trademark (™) symbol as well as the location of the company (e.g. town, state/county and country). With regard to statistics, write the methodology, but keep it concise. It should be readable and understandable to clinicians with some statistical knowledge but not solely mathematicians. If in doubt find a statistician and ask them if it makes mathematical sense. Then, rewrite the section, and give it to a trainee or consultant surgeon and see if they understand it. If they don’t, then rewrite it until the mathematical understanding is carried across at a clinical level. Again looking at several papers from the journal you are going to submit may be helpful here. The results section is obviously very important and a clean crisp start here is necessary, as after the abstract this is where most readers flick to when browsing through journals. Try not to repeat findings and do not start to discuss the results here – that is for the discussion section! Write the most important results and briefly their relevance. There rest of the results can be presented in tables or figures (see below). If you find yourself writing out repeated sentences regarding the statistical significance, or numerous values, usually it means that these sections can be tabulated instead. Try to space out the results section into discrete paragraphs. Reading vast amounts of solid text and trying to pick out the important results is a difficult and annoying task for any reviewer. The discussion may start with a brief concise reminder of the main findings and then should go on to include the research or clinical significance of these findings. Try to compare your findings to those who have previously written on the same subject. Look in the recent literature for contentious issues that your results may corroborate or contradict. If clinically relevant, then try to discuss what the results mean from different perspectives, for example that of the patient, the surgeon, the general practitioner and even the
health provider. Are there any cost or management implications to the work presented? A limitations subsection in the discussion is always helpful. This allows the authors to criticise their own work and explain its potential weaknesses and flaws, and therefore maintain realism and humility in the work. Statistical and clinical issues should be discussed here, and realistic, non-correctable issues should be put first. Be careful not to include limitations that are fundamental flaws, as this will lead to certain rejection. If a major flaw is found at this stage, then it is best to go back and re-do the flawed portion, even if it means more data collection and repeat analysis, so as to submit high-quality work. What was known before and what has been added to surgical knowledge is a good way to round up your discussion. Future work may be discussed, and this should lead to a well-rounded and tidy conclusion. Some journals actually want the authors to create text boxes as summaries of "what was known before", "what was carried out in this study" and "what is known now" for individual papers; these are a useful exercise to see whether you have formulated your main point and can get it across to the reader easily. Try to build up momentum in your writing through the discussion until you reach the conclusion, where the pace should slow a little; finally, end on a sentence that flows well, leaves the reader with few questions and reiterates the major finding/outcome and conclusion.
42.6 Being Prepared to Write, Rewrite and Rewrite Again and Again and Again

Manuscripts are never written in one sitting and accepted without re-editing. Even invited articles are subject to careful scrutiny. When the authors have completed a first draft, it should be given to the most senior author for a further edit. Subsequent to this, all those involved in the manuscript should go through it line by line, checking the language, grammar and spelling again. The journal style and feel should be looked at and discussed, and sections may warrant further rewrites. Keep every re-edited version of the manuscript in a folder with the date of that edit so that you can go back to it if necessary. Sometimes,
however, it may be useful to give the manuscript to a colleague, who has not had anything to do with the work, to read for external opinions and suggestions. They can always be acknowledged in the paper if necessary. Do not be satisfied until you can be proud of the manuscript; in other words it should look, read and feel like a quality piece of work.
42.7 An Efficient Utilisation of Tables, Diagrams, Images and Flowcharts

Good tables are important for many different reasons in writing a surgical paper. From the beginning of data collection, it is important to keep well-structured tables with headings that will be used in the final manuscript. It is more efficient to keep the initial tables in a spreadsheet format that can then be transferred easily to a word-processed format when writing the paper. When formatting the table for the actual paper, keep the rows and columns simple. If a large table must be constructed, then use carefully selected abbreviations and symbols, which can be explained in a suitable legend that accompanies the table (Fig. 42.1). Similarly, if results tables contain lots of numerical values, they should be presented clearly so that the
reader can follow the logic of the way that the data are presented (Fig. 42.2). Figures similarly should be kept simple if possible, but if complex, they should be explained with some brief text. A flowchart figure is a useful tool to explain literature searches but is best put together prospectively as the search is undertaken (Fig. 42.3). Photographs for clinical images and surgical techniques may be used, but please ensure that these adhere to the type of digital image and resolution that comply with the journal requirements. Also, it is important to check whether the journal accepts colour images and whether they charge to publish colour images prior to submission; otherwise, the authors may be billed for relatively large sums without recognising such issues. Finally, it is important to adhere to the number of tables and figures as set out in the instructions to authors, as journals have limited space for text, tables and images.
Fig. 42.1 Taken from: Purkayastha et al. [6] – an example of the use of abbreviations and symbols, explained in a legend, in a table of study characteristics

Fig. 42.2 Taken from: Purkayastha S, Constantinides VA, Tekkis PP, Athanasiou T, Aziz O, Tilney H, Darzi AW, Heriot AG. Laparoscopic vs. open surgery for diverticular disease: a meta-analysis of nonrandomized studies. Dis Colon Rectum 2006 Apr;49(4):446–463 – an example of the clear presentation of a results table containing many numerical values

42.8 Careful Formatting and Referencing as Appropriate

This chapter has already stressed the importance of adhering to the rules of the journal chosen for submission. Formatting is therefore crucial. This includes font type
and size, line spacing, page setup, headings, page numbers, headers and footers, the title page, word count, referencing and numbers of references, and the inclusion of tables, figures and legends. References in particular can be problematic. Avoid complicating the referencing procedure; instead, try to use a referencing software tool that links to your word processor. Common examples are Reference Manager® and Endnote®. Create your reference file as you search, either manually or through your referencing software package. Both techniques allow the author to "drop in" references and "cite while you write". This creates a linked manuscript, so that if you edit the paper and move sentences around, the references will automatically move with the text and be renumbered at the end of the
paper. It is useful to note, however, that if you are writing a manuscript with double line spacing, keeping the reference software package set to 1.5 line spacing is adequate. Also, the type of reference citation should be checked in the instructions to authors and included in the setup for that manuscript. This should include the referencing style (e.g. Vancouver, numbered) and the number of names to be included (e.g. all the authors, or the first two and last author only). It is important not to unlink your document from the reference software until you are completely ready to submit your manuscript. Once unlinked, it is impossible to electronically relink the references to the manuscript, and it will have to be linked manually, which can take many hours for a lengthy manuscript. Save the unlinked manuscript with a different name so that you are aware that this is the final version.

Fig. 42.3 Taken from: Purkayastha S, Tilney HS, Georgiou P, Athanasiou T, Tekkis PP, Darzi AW. Laparoscopic cholecystectomy versus mini-laparotomy cholecystectomy: a meta-analysis of randomised control trials. Surg Endosc 2007 Aug;21(8):1294–1300 [Epub 2007 May 22]:
124 studies identified by computerised search (last date 9th May 2006)
→ 102 studies excluded by title and abstract review: 36 case series; 12 reviews; 14 letters; 13 comparisons with traditional open surgery; 3 robotic studies; 25 other laparoscopic procedure studies
21 studies reviewed in full
→ 7 excluded: 5 studies with significant potential for overlapping cohorts (only the largest, highest-quality study was included); 2 studies that only reported outcomes on respiratory function; 1 study from which data were not extractable
14 comparative studies: 9 randomised control trials and 5 non-randomised trials
42.9 A Clear Understanding of How to Submit the Manuscript Appropriately

Manuscript submission can be a lengthy and complicated process. There are not many journals left that require paper submission, but authors should check that this is not the case prior to submission. Most journals now require electronic submission, and authors are asked to log on to a journal- or publisher-specific site and upload their manuscripts online. Usually, the sites are set up so as to break the manuscript into its basic sections. Therefore, if authors write the paper in the format above, electronic submission is relatively simple: the manuscript only needs to be separated into the title, abstract, main section of the paper (introduction, methods, results and discussion), references, and then tables, figures and legends. Do not forget to complete the competing interests section and word count, and to write a letter to the editor, which will also need to be uploaded on the journal web site. Occasionally, journals ask the authors to suggest names of reviewers for the manuscript, so be prepared to put forward the names of two or three individuals with an interest in the field that are
not known directly to the authors. Figures should be saved and uploaded in the correct format. Most journal sites request JPEG, GIF, TIFF or BMP formats. If these are not the formats of the saved images, then PDFs can often be used. Image types can also be changed using the "Save as" tool and changing the suffix of the saved file name. Keep a record of the manuscript number, save a copy of the created PDF and print out a copy of the copyright transfer, which should be signed by all authors and sent or faxed to the publisher. Some journals have a checklist for submission, but if not, it is best to create your own as you read through the instructions for authors.
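As an illustration of such a conversion, here is a minimal sketch using the Python Pillow imaging library. This is an assumed tool choice: any image editor's "Save as" dialogue achieves the same result, and the file names and 300 dpi setting below are hypothetical examples rather than the requirements of any particular journal:

```python
from PIL import Image  # Pillow imaging library

# Open the original figure and convert it to RGB, since JPEG cannot
# store transparency or palette-based colour modes.
figure = Image.open("figure1.bmp").convert("RGB")

# Save as JPEG at a print-oriented resolution.
figure.save("figure1.jpg", format="JPEG", quality=95, dpi=(300, 300))

# The same call produces TIFF; Pillow also infers the format from the
# file extension if the format argument is omitted.
figure.save("figure1.tif", format="TIFF", dpi=(300, 300))
```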
42.10 Being Open Minded to Review Criticism and Answering Reviewer Comments Well

Reviewer comments are made so as to improve the quality of the paper. Do not take them personally, as this is not constructive to getting your paper accepted. Address each comment individually, and take the information on board unless you feel particularly strongly that a reviewer's comment is incorrect, based on a misunderstanding, or makes the paper weaker. Outline how you have changed the manuscript accordingly, and highlight the changes in the manuscript so that there are three versions (one previously submitted, one with
changes highlighted and one with the changes but not highlighted). Thank the reviewers and the editor for their comments and time, and resubmit the manuscript. Keep a copy of all the resubmitted materials in a separate subfolder so that they can be easily accessed if more changes are requested.
42.11 Working as a Team – The Best Papers are Never Written by Individuals

Embarking on writing a paper solo is a daunting and usually very difficult process. Having senior co-authors provides guidance, support and suggestions for editing that make a paper more readable and, hopefully, more precise. Teamwork is also important for literature searches, data collection and extraction, quality scoring (if a meta-analysis is being carried out), writing and submission. The team can also be responsible for checking on the submission, making changes and sending in forms and so on, if others are away. Usually, most journals do not want more than six authors; however, for large studies and multicentre trials, they will make exceptions. Whichever author writes the paper should go first, with the most senior author and guarantor of the work as last author. It is imperative that each author's input is documented and noted upon submission. Teamwork also enables more than one project to be underway at any one time, and creates an environment in which ideas can be discussed freely and consensus achieved on contentious issues. It also allows analysis and data extraction to be checked by more than one individual. Finally, it enables a variety of skill sets to be brought into completing a project.
42.12 Case Reports, Letters, Techniques and Images in Surgery

Short manuscripts such as these are often the first point of contact that many authors have in the journey to publish. Case reports can be individual cases with an interesting message, something unusual or rare, or a series reporting a new technique or technology. It is usual to include interesting images/photographs of the case(s) in question as well as histological or technological
images. These are usually short reports, and planning a submission for such manuscripts should entail the same details as above. It is imperative that for such cases the patient's consent is taken and documented using a formal consent form downloaded from the journal's web site. The same applies to clinical and radiological images that may be published as individual interesting case notes in some sections of journals. Surgical techniques are also usually accompanied by multiple figures and/or photographs. In some journals, there are sections where video footage can be uploaded. For such submissions, carefully edited, high-resolution footage should be used, ideally formatted by a professional audio-visual technician for the best results, including an edited commentary of the case(s) in question. Again, the patient's and unit's consent is needed and should be carefully documented, and every effort should be made to keep the patient's identity anonymous throughout the footage. Letters in response to publications can be published quickly, and most high-impact journals have an electronic section for letter responses to published articles. Most letters published are those submitted soon after the article in question appears, and they are more likely to be accepted if topical, thorough, fairly critical or if a mistake has been described (e.g. statistical or methodological). Be careful not to be too personal in any criticism of published material; if the letter is subsequently accepted, it may come across as discourteous! Letters, although they may be available on literature databases once published, do not count as part of the RAE (Research Assessment Exercise).
42.13 Summary

Writing a surgical paper requires a salient question to answer, careful preparation and structure. Writing as part of a team is useful, and taking particular care in adhering to journal requirements is essential for a successful publication. Accepted structure and methodology, alongside precise language, grammar and spelling, further increase the chances of having a manuscript accepted. Attention to detail, perseverance and being open minded to positive criticism are characteristics that lead to manuscripts that are of high quality and interesting to the reader, and that allow authors to generate a template for successful publication on a regular basis.
Further reading
1. Cetin S, Hackam DJ (2005) An approach to the writing of a scientific manuscript. J Surg Res 128:165–167
2. Dickersin K, Ssemanda E, Mansell C et al (2007) What do the JAMA editors say when they discuss manuscripts that they are considering for publication? Developing a schema for classifying the content of editorial discussion. BMC Med Res Methodol 7:44
3. Lin AE (2006) Writing for scientific publication: tips for getting started. Clin Pediatr 45:295–300
4. Grigg MJ, Rosenfeldt FL (1990) Writing a surgical paper: why and how? Aust N Z J Surg 60:661–664
5. Shukla S (2007) How to write a scientific paper. Indian J Surg 69:43–46
6. Purkayastha S, Tekkis PP, Athanasiou T, Aziz O, Negus R, Gedroyc W, Darzi AW (2005) Magnetic resonance colonography versus colonoscopy as a diagnostic investigation for colorectal cancer: a meta-analysis. Clin Radiol 60(9):980–989
43 A Primer for Grant Applications

Hutan Ashrafian, Alison Mortlock, and Thanos Athanasiou
Contents
43.1 Introduction
43.2 Reasons for Application
43.3 Sources of Funding
43.4 Who Should Apply to which Grant?
43.5 Practicalities of Applying
43.6 Costing
43.7 An Insight into Application Processing
43.8 Common Reasons for Failure
43.9 After Success or Failure
43.10 Conclusion
References

Abstract Application for grant funding is an essential component of modern surgical research. The majority of research studies, whether clinical, laboratory-based or technological, are fundamentally dependent on the successful accrual of research funds. These are acquired through successful grant applications. Within this chapter, some of the salient points involved in formulating and submitting a grant application will be addressed in order to familiarise the reader with the necessary steps to prepare a successful research grant application.

43.1 Introduction
H. Ashrafian () Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St. Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
In the early days of science, research was possible with only trivial amounts of funding. Over 2600 years ago, Sushruta was able to develop cosmetic surgical wound closure using insect body parts as surgical staples [1]; more recently, Michael DeBakey used his wife's sewing machine to construct the first Dacron artery graft [2]. Performing surgical research today, however, requires extensive financial support. Whether in clinics or the laboratory, there is currently an ever-increasing application of complex technologies to ascertain super-accurate endpoints in surgical disease processes and treatments. As a result, performing research in such an environment requires the accrual of adequate funds with which to carry out goal-oriented research tasks. Furthermore, the nature of research is impacted heavily by the financial circumstances of the research team and the costs of each experiment, with studies in the life sciences accounting for approximately 50% of all national research funding (Fig. 43.1). To accommodate these investigative costs, research groups apply for bursaries through grant funding bodies. The aim of this chapter is to introduce the reader to the nature and the
process of writing a grant application in order to secure financial support for their research.
43.2 Reasons for Application

One of the fundamental goals in applying for grants is to accrue financial resources with which to carry out research. However, this is not the only reason for, or benefit of, applying for such stipends. Applying for a research grant requires a rigorous process of preparing the researcher and the project in a manner that will convince an external body to fund the project. This, by definition, necessitates clarity in proposing:

• The project aims
• The nature of the research
• A clear breakdown of each experiment so as to justify the amount of funds requested
• A realistic timeline within which the proposed project is likely to be completed with the accrued funds

Identifying these targets will not only be beneficial for the grant funding body, but will also reveal important insights into the nature of the work the research team is about to undertake. For junior researchers, the writing of grants will also equip them with many of the skills required to communicate their scientific ideas and proposals to a wider professional audience: skills that are essential not only for their future research in academia, but also as a discipline that can be applied in their advancing careers as clinicians. Furthermore, research grants are no longer considered merely as "funds" with which to perform research work; indeed, they hold an almost consecrated position as they bolster and acknowledge scientific reputation and research mandate. In addition to the kudos that large amounts of funding bring with them, grants from national charities and institutions represent recognition of research work. This is largely because many of the national funding bodies, such as the National Institutes of Health (NIH) [4] in the United States or the Wellcome Trust [5] and Medical Research Council (MRC) [6] in the United Kingdom, assess and endorse each grant after a strict peer review process involving senior "establishment" scientists. Securing a grant from these reputable institutions signifies triumph in a competitive meritocracy over other applications and applicants, and thus carries a certain cachet in the academic world.

43.3 Sources of Funding

Once you have identified a research question, it is important to consider the sources of funding for which you and the project may be eligible (Fig. 43.2). This requires a considerable amount of "research" in itself, because if you can identify novel sources of funding, then you may take delivery of a larger overall stipend. Likewise, it is important to consider the remit of the funding organisation and committee to which you are applying, ensuring that your project fits strategically within their portfolio. Knowledge of funding sources is important and can be obtained from the internet, research papers in your field of study, and even by word of mouth. Furthermore, if you have been fortunate enough to secure funding from one organisation, you can consider reapplying to them for further support. In extremely rare circumstances, funding bodies themselves may approach an established researcher and invite him/her to apply for a sum of money. To begin with, you may consider applying for some funding from local and familiar sources. This would include departmental sources from your hospital, university or regional health authority.
[Fig. 43.2 Sources of research funding: local (university, hospital, health authority, self); national (governmental bodies such as the NIH in the USA and the Department of Health in the UK, national and specific charities, the military); industry (drug companies, engineering groups, venture capitalists); international (World Health Organization, European Commission); and other sources (disease-based and research-based groups)]
[Fig. 43.3 Federal vs. industrial funding in the United States of America, 1953–2005 (funding in $million). Data supplied by AAAS [3]]
If you are studying one specific surgical disease or a cluster of diseases, then you may apply to a charity that specialises in the disease process you are studying. For example, if you are researching the surgical management of Parkinson's disease, then an application to a Parkinson's disease charity might be a first port of call. The research may be worthy of funding from national funding bodies such as national research charities (e.g. the Wellcome Trust and Medical Research Council in the UK) and governmental funding bodies (the National Institutes of Health [NIH] in the USA and the Department of Health [DoH] in the UK). Some projects of international or worldwide significance may also be eligible for funding by international bodies such as the European Commission or the World Health Organization. These institutions have traditionally supported projects of a public health and epidemiological nature, but increasingly surgical projects with worldwide significance are considered [7]. If the research area of a project is relevant to the development of a new technology or drug, then application to industry might prove fruitful. This may include anything from a small stipend from a local drug company branch to a massive international project supported by the largest companies in the world [8]. Many of these companies have a vested interest in your research, and are therefore keen to contribute to its development. Industrial contribution has consequently grown to equal and exceed that of national funding in the USA (Fig. 43.3) and the United Kingdom but, importantly, it is often the only resource for research in many developing countries. Of the research groups
that are owned by industry, some do not necessarily publish all their research, in order to maintain an industrial advantage. Others protect their work through international patents. If a surgical research group, whether university- or hospital-based, is the beneficiary of industrial funding, then the onus is on that group to state explicitly the contribution of industrial funding to its work, and to be clear about revealing any conflicts of interest in any published work that develops through industrial patronage. Furthermore, it goes without saying that there are certain surgeons, scientists and entrepreneurs who have enough personal wealth to fund surgical research. They may do this directly, or indirectly, for example through a charity. As with money from industry, it is important that the source of funding for a research project should be clearly stated
in all documentation and publications that arise as a result of such stipends.
43.4 Who Should Apply to Which Grant?

In the majority of cases, the lead applicant on a grant has come up with a research hypothesis and will intellectually drive the research. There are a variety of funding mechanisms available, tailored to fit the nature of the research project and the career stage of the applicant. Some of the more common funding options are outlined below. Project grants are seen as the traditional method of supporting research projects, and are considered to be the mainstay of funding modern research. Project grants provide support for a defined piece of work with objectives that can be achieved within a specified time frame (often 3 years). The lead applicant, known as the Principal Investigator (PI), will be expected to have a track record within their field of research and is typically at least an assistant professor in the USA, or a lecturer or senior lecturer in the UK. Sometimes, surgical research requires more extensive funding to reflect sufficiently the complexity and/or multidisciplinary nature of the research. Programme grants are designed to support larger projects, incorporating a number of experiments designed to answer an interrelated set of questions. Programme grants are harder to obtain but are associated with the granting of much larger awards for significant lengths of time (often 4–5 years). In addition to the usual scientific
considerations, strategic issues are also likely to bear on the decision to award a programme grant. Programme grants are awarded to outstanding individuals with an established scientific track record, and applications are normally only considered from individuals who have a strong record in winning research funding. There is also scope to obtain funding for individual researchers through personal funding streams. These grants, known as fellowships, provide support at a variety of career stages and are often very competitive. For surgical trainees, personal funding to support research can be obtained through competitive Training Fellowships. These awards are prestigious and, if given by one of the well-known grant bodies, carry high standing in academic circles. Whether applying for a project grant, programme grant or fellowship, it is often necessary to work with another surgeon, scientist or group, naming these individuals as collaborators or co-applicants on your grant. Naming such collaborators is particularly important in strengthening the application if you do not have the relevant expertise for important aspects of your proposal.
43.5 Practicalities of Applying

Although funding for research has been gradually increasing over the past 35 years, particularly in the life sciences (Fig. 43.4), so has the competition. If one is therefore going to succeed in winning a grant, the application needs to address all the minimum requirements and ideally attain as many extra points as possible.
[Fig. 43.4 Comparison of federal funding by subject in the United States of America: life sciences; social and psychological sciences; mathematics, computing and engineering (funding in $million). Data supplied by AAAS [3]]
The most important thing is to prepare and to set aside time before writing the application; ideally, this should take 3–6 months. The grant application process requires more than filling in a form on a Friday afternoon. Rather, it is a process of writing and re-writing until the application has no flaws and can scientifically convince a grant funding body to support the research. Time is important, as there are many steps to submitting an application. These include:

• Writing the grant proposal
• Preparation of the finances
• Review of the application by the applicant (repeatedly)
• Review of the application by the co-applicants
• Review of the application by the departmental research coordinator
• Review of the application by the head of department
• Preparation of any special documentation for the grant (e.g. ethics and animal work licences, letters of support from collaborators)
• Review of the application by university research services
• Review of the application by university finance services
• Formal submission of the application (online, by mail, by hand)
Many grant funding bodies require grants to conform to a specific format. Whether submitting an online application or writing on a word processor, it is vital that the word count, spelling and grammar are flawless in all sections. A grant application will typically require a short Summary/Abstract, and longer sections on the background (with references) and the plan of investigation. Furthermore, most research grant funding bodies and charities also include lay people on their panels and committees, as representatives of various organisations or charities, and therefore many applications have a special segment for a lay summary. This needs to explain the concepts of the research work in a manner that is understandable by a member of the public; more importantly, it needs to convey to the lay member the relevance of the research and why it should be funded. The Background will typically provide sufficient information for a reviewer to appreciate the importance of your research and understand the exact question the research is proposing to answer. The Background should provide an overview of the relevant literature in
the field, placing your project and the rationale for funding it within this context.

The plan of investigation provides a description of how and where the research will be carried out. Make the questions you are trying to address clear and, where possible, include preliminary data to convince the funding committee that the research is feasible. Justify your choice of methodology, particularly if you have selected one experimental method over another, and discuss alternative approaches if your strategy is high-risk. Make sure that appropriate controls are included in the plan and show a clear timeline of your work plan.

Many grants are assessed by specialist reviewers already in the same research field as the grant being applied for, but the majority of the reviewers will have only a cursory acquaintance with the direct area of research. Therefore the background, the plan of investigation, and indeed the whole application, need to be aimed at explaining the research aims to a scientific, but not necessarily super-expert, audience. Thus, for example, the reasons for employing the various research methods need elaborating to some degree, as your application needs to equip any potential reviewer with all the tools with which to accept your application.

As mentioned above, pilot data can validly be used to strengthen the proposal. Although there is no formal section to describe results, as these do not yet exist, it is prudent to communicate what the expected results might be, or how one would apply any results, whether positive or negative. It is important to convey the likely conclusions derived from expected or unexpected results and how these would impact current and future research and medical work. Furthermore, it is vital to include a specific section within the plan of investigation laying out power calculations that allow you to approximate what size of experimental effect is needed to be able to answer the scientific question posed (a minimal sketch of such a calculation is given below). This should justify, and tie in with, how many subjects are needed in the experimental plan, and ultimately it will also support the subsequent costings of the grant.

Fellowship applications contain a section to be filled in by referees. As with all references, the onus is on the applicant to choose the referees most suited to his or her application. It is vital that applicants allow plenty of time to inform their referees of this request, and to give them the time required to complete these statements.
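For illustration, the sketch below shows how such a power calculation might look in Python using the statsmodels library; the standardised effect size, significance level and power shown are illustrative assumptions rather than recommendations, and the actual values should be agreed with a statistician for each study.

```python
# A minimal sketch: approximate per-group sample size for a two-arm study,
# assuming a standardised effect size (Cohen's d) of 0.5, a two-sided alpha
# of 0.05 and 80% power. All three figures are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                   power=0.8, alternative='two-sided')
print(f"Approximately {n_per_group:.0f} subjects per group")  # ~64 per group
```

Note how sharply the required numbers grow as the anticipated effect shrinks; this is why the power section must tie in with both the experimental plan and the costings.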
Before actually completing the application, it is always judicious to read the funding body's website or application documents. This will allow you to hone your application in line with the grant funding body, and will inform you of the specific requirements of each particular grant funding institution. The NIH [4], for example, has an in-depth document explaining the necessities of grant application to their institution, and they specify the following review criteria:

• Significance: ability of the project to improve health
• Approach: feasibility of the methods and appropriateness of the budget
• Innovation: originality of the approach
• Investigator: training and experience of investigators
• Environment: suitability of facilities and adequacy of support from your institution

Reviewers are asked to consider the above factors, but the final decision rests with their overall judgement of an application. Sometimes they are asked to select a winning entry from among many applications, whereas on occasion some funding bodies are happy to fund research applications as long as they gain the approval of the reviewing body. As with all scientific writing, the work has to be written clearly and lucidly, communicating concepts in an easy-to-digest manner. Diagrams and graphics are both advantageous and informative, and can help in expressing the most complex of concepts. Fundamentally, when writing an application, it is important to bear in mind that one is compiling a list of reasons to compel a funding body to support the proposed research. Any legitimate tools that will facilitate this process should be considered and taken full advantage of.
43.6 Costing

Quoting the figure one requires for research initially seems easy, but it is amongst the most meticulous elements of applying for a grant. Contrary to common belief, one can no longer ask for a "guesstimated" lump sum that, on arrival, would be spent by the principal investigator as he or she sees fit during the course of a research project. Rather, each application needs to stipulate clearly the exact cost of every part of the project, including the use of stationery, basic
laboratory materials and patient travel costs incurred as part of a study. Each cost needs to be accurately calculated from a reputable source and linked to the research protocol to show why it is necessary. If, for example, a laptop is required, this needs to be clearly stipulated in the research protocol and then justified in the costings. As with many financial statements, the costings need to be mathematically clear, robust and clearly stated in spreadsheet format. Once the grant is written, it needs to be ratified by the applicant's university or institutional accounts department or research services department, to ensure that the costs are accurate and have incorporated the necessary costs accrued by each institution for taking part in the research stated. As a result of the complexity of listing these funds, specific software packages such as infoEd [9] and pFACT (Project Financial Analysis and Costing Tool) [10] can be adapted to help academics list their financial needs for university research; indeed, some universities do not allow the submission of grant applications unless the research costing has been listed in these database tools.
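To make the idea of an itemised, justified costing concrete, the following is a minimal illustrative sketch in Python that mimics a spreadsheet layout; the items, prices and 20% overhead rate are invented for illustration and do not represent the actual interfaces of infoEd or pFACT.

```python
# A minimal sketch of an itemised budget in which every cost is tied to a
# justification cross-referenced to the research protocol. The overhead rate
# is an assumption; the real rate comes from university research services.
OVERHEAD_RATE = 0.20

budget = [
    # (item, unit cost, quantity, justification in the protocol)
    ("Laptop for data collection",   900.00,   1, "Protocol 2.3: bedside data entry"),
    ("Laboratory consumables",        45.50, 120, "Protocol 3.1: one kit per assay"),
    ("Patient travel reimbursement",  25.00, 200, "Protocol 4.2: 2 visits x 100 patients"),
]

direct_costs = sum(unit * qty for _, unit, qty, _ in budget)
total_costs = direct_costs * (1 + OVERHEAD_RATE)

for item, unit, qty, justification in budget:
    print(f"{item:30s} {unit * qty:10.2f}  {justification}")
print(f"Direct costs: {direct_costs:.2f}; total with overheads: {total_costs:.2f}")
```

The point of the structure is that no line exists without a protocol reference, which is exactly what the accounts or research services department will look for when ratifying the application.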
43.7 An Insight into Application Processing

Once an application is submitted, it is checked internally by the funding body to ensure that it meets the minimum entry criteria. Funding bodies often receive very high numbers of applications, and those that have not been filled out correctly or have information missing will be returned to the applicant. Applicants who are not eligible to apply, or who have submitted proposals outside the remit of the funding stream, will also be rejected at this stage. The application will then be sent for peer review. In some cases, this will include a triage process in which the funding committee members select those applications they think are potentially fundable and reject applications that will clearly not make the grade. In addition, many funding bodies send applications for external peer review. Reviewers are selected with appropriate levels of expertise and seniority, and are asked to comment on aspects of the application such as the novelty of the research, the applicant's standing in the field, the importance, relevance and quality of the proposal, and the suitability of the methodology.
The final decision is usually made by the funding committee who meet to discuss the merits of the proposals together with any potential problems. When reaching a final decision, the committee will also consider the views of the external referees and take into account other important aspects, such as value for money and the strategic importance of the work. Many funding bodies provide feedback to applicants in the form of external referee comments or committee feedback. Whether your proposal is funded or not, it is always worth requesting feedback.
43.8 Common Reasons for Failure

There are numerous reasons why a grant application may not be funded, and these are not necessarily due to a funding body lacking adequate resources. As described above, each grant application undergoes a rigorous assessment before funding is secured, which necessitates that applications are perfected in order to address the many requirements of the funding committees. Reasons for funding failure can be broadly divided into four categories: science, applicant, application manuscript and financial (depicted in Fig. 43.5).

[Fig. 43.5 Reasons for funding failure:
• Science: hypothesis not clearly defined; pilot data do not exist, or exist but are unconvincing; work has already been done; unfocussed/unclear; "too ambitious"; experimental plan unclear; scientific basis for project unsound; "unworthy aim"; project not relevant to the funding body applied to; inefficient way of addressing the question; a better model exists to address the scientific question; study is poorly powered/inadequate numbers of subjects; power study has not been done
• Applicant(s): poor track history; poor CV; interviews poorly; poor references; lack of appropriate expertise; poor publication record (in the relevant area)
• Application manuscript: poorly written; not completed and submitted on time; poor continuity; does not comply with the requirements of the funding body; poor diagrams/graphics
• Financial: too much money requested; unsuitable breakdown of costs; expensive way of addressing the question]

43.9 After Success or Failure

Failure in securing funding for a project does not necessarily imply that the project cannot be undertaken, as it is possible to reapply for another grant. Some grant
funding bodies do have an appeal system to reconsider failed grants, though the vast majority remain unaccepted. Nevertheless, the majority of grant funding bodies do give some feedback as to the nature of the failed application, and this can be used by applicants to update and improve their work so as to increase the chances of any further applications they make. If an application for funding is successful, then the onus is on the successful applicants to ensure that they adhere strictly to the proposal that they presented to the grant funding body. Any publishable work to which grant-funded work has contributed needs to specify the role of the grant funding body in the manuscript. Furthermore, many funding bodies require successful applicants to present their work at set dates after the award of the funds, so as to ensure suitable use of the funds and to maintain continued familiarity with the applicant's work. Once applications are successful, celebrations are not unusual, but it must be noted that a modern thriving academic career relies on a consistent record of successful grant funding applications, and one must always be on the lookout for the next pertinent experiment to perform, and therefore for the next grant to apply for.
43.10 Conclusion

Applying for a research grant is not an easy process. However, by its very nature, it is an edifying task in which one is required to define rigorously one's research hypothesis and exactly how it will be answered. It requires many skills, such as vision, clarity of thought
and communication. Although the process is becoming increasingly competitive, a successful grant application will allow a research project to commence as successfully as possible. Whether or not the research results fit in with the original hypothesis is not in question; what matters is whether the project was carried out with the proposed scientific acumen and exactitude. In short, in applying for a grant there are many pitfalls but, as with many things, success makes these efforts ultimately worthwhile.
References

1. Veith I (1961) The surgical achievements of ancient India: Sushruta. Surgery 49:564–568
2. McCollum CH (2000) The Distinguished Service Award Medal for the Society of Vascular Surgery, 1999: Michael Ellis DeBakey, MD. J Vasc Surg 31:406–409
3. American Association for the Advancement of Science (AAAS) (2007) Programs: Science & Policy – R&D Budget and Policy Program. Available at: http://www.aaas.org/spp/rd/guide.htm
4. United States Department of Health and Human Services (2008) National Institutes of Health (NIH): Grants & Funding Opportunities. Available at: http://grants.nih.gov/grants/oer.htm
5. Wellcome Trust (2008) Funding. Available at: http://www.wellcome.ac.uk/funding/
6. Medical Research Council (MRC) (2008) Applying for a grant. Available at: http://www.mrc.ac.uk/ApplyingforaGrant/index.htm
7. Villar J, Valladares E, Wojdyla D et al (2006) Caesarean delivery rates and pregnancy outcomes: the 2005 WHO global survey on maternal and perinatal health in Latin America. Lancet 367:1819–1829
8. Bhandari M, Busse JW, Jackowski D et al (2004) Association between industry funding and statistically significant pro-industry findings in medical and surgical randomized trials. CMAJ 170:477–480
9. InfoEd International Inc (2008) InfoEd. Available at: http://www1.infoed.org/
10. Project Financial Analysis and Costing Tool (2008) pFACT. Available at: http://www.uclan.ac.uk/other/finan/pFACT/pfact_06.htm
44 Key Aspects of Grant Applications: The Surgical Viewpoint

Bari Murtuza and Thanos Athanasiou
Contents

44.1 Introduction ............................................................ 587
44.2 The Importance and Contribution of Surgical Research ............................................... 588
44.3 Unique Aspects of Surgical Grant Application ...... 589
44.4 Funding Availability and Bodies for Surgical Research ............................................. 590
44.5 Funding for Surgical Residents and Fellows ......... 590
44.6 The Grant Review Process ..................................... 592
44.7 Writing .................................................................... 592
44.8 Outline of Grant Structure and Format .................. 593
44.8.1 The Investigator, Research Environment and Budget ............................................................ 593
44.9 The Plan of Investigation ....................................... 594
44.10 Further Points for Basic Science Proposals ......... 594
44.11 Further Points for Clinical Proposals ................... 594
44.12 Programme Grants ............................................... 595
44.13 Conclusions .......................................................... 595
References ...................................................................... 595
Web Links ...................................................................... 596

Abstract In this chapter, we begin by examining the background of funding for surgical research, the relevance of surgical research and the unique aspects of the surgical research proposal. We shall consider different types of grant application and funding agency, and the grant review process, and shall use these considerations as the backdrop against which the principles and mechanics of writing a grant proposal shall be described. Both clinical and basic science research applications will be considered, and we shall conclude by discussing the programme grant.

B. Murtuza () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]

44.1 Introduction

When considering how to write a grant proposal for surgical research, it is useful first to take into account the environment of surgical research funding and opportunities. There has been a historical trend for a lesser proportion of national research funding to be available for surgical research projects. Generally, there has been a decline in successful surgical applicants for National Institutes of Health (NIH; United States) funding since 1975 when compared with non-surgical disciplines, and this has been partly attributed to the failure of the surgical profession to develop and sustain an adequate research workforce [1]. Between 1992 and 1999, funding for basic surgical research by the NIH was found to be essentially static, though with a slow increase in surgical research productivity. There was, however, still a significantly lower rate of funding support and publication output when compared with departments of medicine [2]. Similarly, Rangel et al. [3] examined surgical vs. non-surgical proposals submitted to the NIH between 1996 and 2001. Surgeons were involved in 35–65% less peer review activity
relative to non-surgeons when normalised to grant submission activity. Further, success rates for funding were far lower, and the awards of lesser financial value, for the fiscal years 1995–2001. There has also been variability in the proportion of surgical research articles in surgical journals when comparing basic science and clinical papers. In one study, while the number of basic science research articles from the United States (USA) was steady across five surgical journals, the proportion of clinical research papers declined from 82.4% in 1983 to 71.1% in 1993 [4]. In addition, fewer grant applications from surgical departments vs. medical ones have been received by the National Cancer Institute (NCI; USA) [5]. Indeed, this latter study found that the success rates for grant submissions to both the NCI and NIH were lower for surgical departments when compared with medical ones (NCI 25 vs. 34%; NIH 29 vs. 37%). These shortfalls need to be addressed in the context of the overall importance of surgical research, as this will facilitate understanding of the mission and purpose of surgical investigation and thus the mechanics of grantsmanship per se.
44.2 The Importance and Contribution of Surgical Research

Surgical research is a key aspect of quality improvement in surgery. It has been suggested that investigation "permits surgeons to optimise the care of patients, abandon misdirected therapies early, recognise fundamental pathophysiological principles and apply these observations to patients in a scientifically relevant and socially sensitive fashion" [6]. Research into new technologies and their incorporation into surgical care constitutes a major element of quality improvement [1]. Further, in the current climate of cost containment and efficiency of health care delivery [2], it has become apparent that securing funding for surgical research will be a key element of a surgical department's business strategy [7]. Coran et al. have elaborated upon the relationship between the financial security of an academic health centre, clinical activity, revenue and academic activity [8]. Using a model for networking and expansion of academic paediatric surgeons, these authors achieved a significant increase in external grant support, from $139,882 to $6,109,971 over a 4-year period, with a concomitant increase in the number of research
publications. Thus, clinical activity and research activity in surgery should be seen as complementary and not as mutually exclusive, and surgical research funding in turn may be considered complementary to funding for non-surgical medical research [9]. Surgeons are in a unique position to understand the intimate relationship between structure and function, and as such all surgeons are surgical investigators to the extent that each modulates the former expecting effects on the latter – this is at the core of formal experimentalism and the scientific method. Indeed, surgeons have been called "the effector arms of science" [6]. Yet conflicts between demands for revenue generation, practice incomes and creative effort mean that few surgeons or surgical trainees apply for research funding [1]. Critically, many surgical diseases, such as coronary heart disease, traumatic injuries and breast, lung and colorectal cancer, continue to rank amongst the top ten leading causes of mortality in the western world [3]. Despite the apparent lower proportion of NIH funding for surgical projects, there is a relationship between NIH funding and the burden of disease, with higher levels of funding available for these "surgical diseases" corresponding to the number of disability-adjusted life years – i.e., the years of healthy life lost due to disability or death (Fig. 44.1 [10]). One reason for less successful surgical funding is the intensity and demands of surgical training and practice. Avis et al. suggested that there should be more opportunities for academic surgeons, protected research time, peer recognition for research, and professional and economic rewards from the host institution for surgical investigators [5]. There is both a quantitative and a qualitative deficiency of surgical investigators and an urgent need to improve the rigor of surgical research [1]. Indeed, quality improvement in health care "requires application of the best scientific evidence available to the care of the public" [1]. Surgeons require formal training in scientific methods applicable to basic scientific, translational and clinical research. These include experimental design, biostatistics, scientific writing and grantsmanship. Applicants for funding who have received formal training during a PhD are generally more successful, with success rates for MD PhDs > PhDs > MDs only [1]. More recently, Dickler et al. similarly found higher success rates amongst physician-investigators with a PhD or MD PhD when compared with an MD alone for NIH grants between 1964 and 2004 (Fig. 44.2 [11]).
[Fig. 44.1 Relationship between NIH funding (millions of dollars) and burden of disease (disability-adjusted life-years, millions) for some surgically treatable conditions, including breast cancer, ischemic heart disease, injuries, lung cancer, prostate cancer, colorectal cancer and ovarian cancer. Modified from Gross et al. [10]]
[Fig. 44.2 First-time applicants for R-01 NIH grants, by first-time applicant cohort year (1964–2004). The left panel shows the number of applications and the right panel the percentage awarded, both according to the degree of the applicant (MD, PhD, or MD and PhD). From Dickler et al. [11]]
44.3 Unique Aspects of Surgical Grant Application

Surgical grants should address an important question concerning a surgical problem or using a surgical model [12]. Success in obtaining surgical research funding requires an understanding of the scientific method. In 1985, the Conjoint Council on Surgical Research (CCSR; USA) highlighted six domains of surgical disease within which to survey research activity, to elucidate promising areas of investigation and to consider methods for increasing the level of surgical research activity. These domains were: cancer, cardiovascular,
gastrointestinal, metabolism/nutrition, transplantation and trauma/burns [1]. In particular, surgeons have an important role as cancer research investigators and the NCI division of cancer treatment has established initiatives specifically for research in surgical oncology [5]. New emerging technologies also lend themselves as opportunities for surgical research funding applications and the growing field of robotics in surgical research and clinical practice is a prime exemplar of this [13]. A key criterion of any funding application is originality and innovation. The practice of surgery requires a degree of ingenuity and improvisation in day-to-day practice and the surgeon-investigator should capitalise
upon and emphasise these aspects when drafting a grant application. For clinical studies, heterogeneity in the groups of patients as well as surgeons and surgical centres involved should be addressed [14]. Grants may be for basic scientific, translational or clinical research projects; in all cases, the surgical relevance of the work proposed should be highlighted. One may further categorise the application by the type of funding and these include personal fellowships and training grants, project grants and programme grants. The principles of writing for all of these are the same and shall be discussed later in a separate section. Before consideration of this, we shall outline the sources of funding for surgical projects as well as the grant review process.
44.4 Funding Availability and Bodies for Surgical Research Individuals and groups may apply for internal funding within their institution, be it a research institute or university, or for external support. External bodies include national ones such as the NIH, Veterans Administration (VA), National Science Foundation (NSF) and Department of Defense in the USA or Medical Research Council (MRC) in the United Kingdom (UK). The NIH offers project grants such as R-01, training grants (K-08) and grants for junior investigators involved in clinical or translational research (K-23), as well as programme grants (P-01). T-32 grants are training grants given to institutions specifically to train residents and students. There are also numerous state agencies or surgical bodies such as the American College of Surgeons (ACS), NCI, Wellcome Trust (UK), Royal Society (UK), Royal College of Surgeons (RCS) of England and the Howard Hughes Medical Institute (HHMI; USA). Most of these consider submissions for both clinical and basic science projects, though the HHMI fund basic science investigators. In France, large funding organisations include Institut National de la Sante et de la Research Medicale (INSERM) and Centre National de Recherche Scientifique (CNRS). Non-profit charitable organisations include the British Heart Foundation, Arthitis and Rheumatism Council UK, American Heart Association and the American Cancer Society. In addition, one may consider the pharmaceutical or medical technology industries as sources of funding. International opportunities and exchange programmes are also offered by
[Fig. 44.3 Correlation between NIH funding to medical schools and to various departments (internal medicine, otolaryngology, general surgery, orthopedics), by fiscal year, 1996–2004. The plot shows that funding rates for medical departments were greater than for surgical ones; amongst the latter, general surgery fared best as a specialty. Taken from Ozomaro et al. [15]]
some of the national agencies listed earlier. In the United States, the NIH is the largest funding body and offers intra- and extramural opportunities according to whether the work is carried out within the NIH at Bethesda itself. Although the NIH budget increased from 11.2 to 21.9 billion US dollars from 1996 to 2004, surgical departments only account for 4.8% of medical schools’ funding, with general surgery receiving the largest proportion (Fig. 44.3 ref [15]). Interestingly, there seems to be an apparent lag between NIH extramural funding availability and the number of grant submissions and perhaps even an inverse relationship between the total NIH extramural funding available and the success rates for submissions; this would suggest that investigators should not be discouraged by the environment and perception of the “funding market” when considering grant submission (Fig. 44.4 ref [16]). Funding rates for the NIH and VA in the United States are around 15–22% [12].
44.5 Funding for Surgical Residents and Fellows

One must promote rigorous scientific training of surgical residents and fellows if the surgical workforce is to evolve and progress so as to continue to offer surgical care of
the highest standards to patients, to bring technological advances into the realm of clinical practice and to ensure the success of surgeons in obtaining competitive funding for research once they are independent physician-investigators. A recent single-centre study of all graduates from an academic surgery residency programme between 1990 and 2005 found that predictors of funding following a surgical residency included the number of publications during resident research and success in obtaining grant support during residency [17]. This study, however, found that only just over 50% of graduates who had spent 2–3 years in research during their
residency applied for funding after their training. Surgical trainees may elect to support themselves or may apply for institutional or external funding. In the USA, the ACS, Society of University Surgeons and Association for Academic Surgery all offer 2-year fellowships. The Society of Surgical Oncologists, American Association of Plastic Surgeons and Congress of Neurological Surgeons also offer research fellowships [18]. In the UK, the MRC and Wellcome Trust offer Research Training Fellowships for physician-investigators in training as well as intermediate or clinician-scientist schemes for post-doctoral physicians. The RCS England
offers 1-year research fellowships for surgical trainees. Junior applicants for funding may consider applying for less demanding grant opportunities with a view to generating preliminary data, which may then be used for more competitive applications for larger sums of money. There is, unfortunately, a lack of structure and organisation for surgical research training, as well as variability in mentorship; motivated surgical residents and fellows need to identify a promising host institution and mentor prior to writing an application for funding.
44.6 The Grant Review Process

NIH applications initially go to the Centre for Scientific Review (CSR), and training grant applications then go to the relevant institute, such as the NCI or the National Heart, Lung and Blood Institute. K-01 applications are assigned to a study section, which comprises around 10–20 scientists. Each study section has an associated executive secretary, who assigns the application to a primary and a secondary reviewer, and it is these reviewers who present the case for a proposal at the regular scheduled meetings of the study section. The review committee members vote on whether to approve the application. If successful at this stage, the grant is assigned a priority score. A summary of the committee's evaluation, "the pink sheet", is prepared by the executive secretary and sent to the applicant. The results of this primary review are then presented at an NIH advisory council, where the suitability of an application with respect to the NIH research mandate is determined. Finally, the NIH may conduct site visits to assess the researchers and the proposed research environment; this is particularly relevant for programme grants. The executive secretary is a good liaison in helping to plan for the site visit, and a practice visit using external consultants may be helpful. In the UK, grant applications to the MRC are assessed through a two-stage process. Stage one involves a triage system designed to eliminate applications which are unlikely to be successful. These preliminary decisions are made by the chairs and deputy chairs of research panels and are based on the opinions of independent peer-reviewers. Proposals worthy of further consideration are then short-listed for stage two. The MRC uses its so-called "core assessment criteria" in this first stage, which comprise: importance of the proposal; scientific potential; and resources requested. All the proposals are
refereed by members of a “College of Experts” and other specialist referees if required. The College of Experts includes panels for public health/health services-related research, infections/immunity, neurosciences, molecular/cellular, physiology and clinical sciences. All the short-listed applicants receive a decision and feedback from the board/panel review through an emailed assessment template. It should be noted that many grant applications are not funded on first submission and feedback comments from the NIH “pink sheet” or the equivalent should be carefully addressed when considering resubmission. Most grant agencies and funding bodies usually require resubmissions to be flagged as such on the application form. For all resubmissions, the cover letter should include responses to all previously raised comments. In all cases, the fundamental aspects of what the reviewers look for in an application are very similar and these shall be examined next.
44.7 Writing

In general, a grant application must be structured and coherent and must present a plan for investigation that follows a logical progression. Overall, it should be easy to read for the reviewers and should avoid frequent references to appendices. The strength of the hypothesis, the specific aims to address this, the background, methodology and preliminary data are all critical aspects. In addition to the research proposal itself, a review committee looks carefully at the investigators and the research environment. The NIH considers five strategic areas in grant submissions: significance, approach, innovation, investigator background and institutional environment [19]. Applicants should decide which criteria to emphasise for a given application and how competency in each of these areas can be demonstrated. It is also important to ascertain the priorities of a particular grant-awarding body, to highlight the clinical relevance of the question being addressed and to clearly state the surgical methodology or model that will be used to address this question. For surgical resident training grant applications, the choice of mentor is a key one and a mentor with an established track record in publishing and successful grants is invaluable not only for the funding application, but also throughout the period of proposed research if the applicant is successful.
Funding bodies carefully consider the qualifications of a trainee and their potential to become an independent investigator. The training plan may also need to be outlined in an application together with an indication of how progress will be monitored and appraised. Frequent reasons for failure of a grant application are a weak hypothesis, an inadequately detailed research plan or a proposal which lacks focus [20]. Although grant writing is one of the most useful skills, it is one of the areas for which many surgeons receive little formal training. Some centres have dedicated offices for grant writing or services for scientific writing and editing, and these have been shown to enhance scientific productivity in surgical departments [21]. Generally, one must understand the reviewers' perspective and write in clear, concise language, avoiding jargon. One author has commented that one should "be surgical in editing the text" of a grant application (Griffin; see Web Links). It is important to allow plenty of time for writing, pre-review and revision prior to submission, and this may take 6–12 months or longer. It is helpful in this regard to seek advice from senior investigators and those with experience sitting on grant review committees or NIH study sections. Overall, a grant should present a persuasive argument and be written with consistency from start to finish, with logical subsections and appropriate use of figures to enhance it and facilitate reading. The application should be thoroughly checked prior to submission and be completely free of grammatical and spelling errors.
44.8 Outline of Grant Structure and Format

The structure and format for most applications is similar, with sections comprising: abstract; introduction/background/significance; hypothesis and specific aims; preliminary data; research plan and methods; special issues – patients, ethics approval, animal models, gene manipulation committee permissions. The abstract is a distillate of the entire proposal and may often be the only part read, along with the introduction, for the triage phase of assessment. Most funding bodies also ask for a lay summary, which describes the project in non-technical language. The background should detail gaps in the current literature and should form the premise upon which the hypothesis will be based. It may be
prudent to include relevant work of potential reviewers in this section. One should state the importance of the clinical problem being addressed and how the project might advance the state of knowledge in this area. The hypothesis should be clearly stated, with feasible specific aims which follow in a logical sequence. Up to 45% of grants have a problem with the introduction and specific aims section, and many are overambitious or poorly focused [12]. The preliminary data should be relevant and of manuscript quality, and must include figures wherever appropriate for clarity. These data should help demonstrate that the research plan is feasible and that the investigators are competent and skilled in the relevant techniques and have access to some of the necessary facilities. The research plan section should parallel the specific aims and be logical. One must demonstrate feasibility and the availability of techniques and equipment, and give some methodological detail. Positive and negative controls should be stated and, for clinical studies, the specifics of patient accrual, diagnostic tests, treatments, monitoring and follow-up. Finally, one should state the potential implications of the proposed work, how these tie in with the research goals of the host institution and how the work could advance the state of knowledge in the field.
44.8.1 The Investigator, Research Environment and Budget

The principal investigator (PI) should be able to demonstrate that he/she is qualified to run the proposed project. He/she should have the appropriate research training and expertise, have some track record in the proposed field and should have generated relevant preliminary data which demonstrate their capability and, to some extent, the feasibility of the work. Any relevant publications should be listed, along with biographical sketches of the PI and other co-applicants. It is often helpful to obtain a previously funded proposal from a colleague; if possible, one that has been funded by the same agency being applied to. The application should also list other grants held by the PI and demonstrate that these will not overlap with the submitted proposal, yet form part of a cohesive programme of activity if appropriate [20]. Within the research budget section, there should be careful justification for all costs. An institutional
grants administration office can often provide valuable assistance with this. Collaborators who are established investigators can strengthen the application considerably and letters of support should be included.
44.9 The Plan of Investigation

This is the heart of the grant application. The plan should be logical and progressive in time and investigative sequence. It should be focused and should parallel the specific aims which have been outlined to address the hypothesis. The hypothesis in turn should be not only interesting and exciting, but "entirely plausible" [20]. The plan should be detailed but not overburdened with technical jargon; one should strive for balance. As with all sections, it should be easy to read. It should be explicit, not implicit. Often a timeline diagram or flowchart for the sequence of the project helps, with key goals marked. This section must be written in a convincing manner. In particular, the reviewer should believe that the investigators have thought carefully about the potential problems that may arise during the course of the work and that they have contingency plans for these, or at least give the impression of possessing the necessary experience, judgement and flexibility to meet these challenges. A key supporting element for the plan is the preliminary data. The data should be relevant and provide supportive evidence that the investigators have the appropriate knowledge, skills and expertise to perform the proposed work. The techniques for data acquisition and analysis should be detailed, as should power calculations and details of patient accrual and follow-up for clinical studies. In addition to these general points, there are further details pertinent to either basic science or clinical projects.
44.10 Further Points for Basic Science Proposals

A reviewer for a basic science application will expect absolute clarity, consistency and rigor in the research plan. The methodology should be described with an appropriate level of detail; that is, not with overwhelming technical minutiae, but at the same time not non-specific. The reader should be convinced that the
applicants know what they are talking about. In most cases, it is expected that the large proportion of methods cited will have been previously used in the PI's laboratory, and appropriate references may be given for these. For new methods, greater detail should be given and, if possible, evidence that the necessary support is available – such as through a collaborator. Availability of key reagents, including antibodies, prototypical drugs or molecular probes for nucleic acid hybridisation studies, should be assured. It is particularly important that there are no "lynchpins" upon which a large body of subsequent work depends; this is the failing point of many applications. Examples include the development of a novel antibody or a murine genetic knockout strain, which may present formidable challenges in themselves and take up an unanticipated amount of time. Alternative strategies for potential problems should be mentioned briefly. One may also wish to include a clause mentioning further experiments that could be performed if time permits. This is a useful device, as applications are often rejected for being overambitious and including an unfeasible number of experiments. Finally, a well-written basic science research plan for a successful grant application serves as a useful guide during the period of investigation itself.
44.11 Further Points for Clinical Proposals

The principles of writing the plan for a clinical project are similar to those for a basic science proposal. It is particularly relevant to emphasise the clinical problem being addressed and to consider the priorities of the funding agency. A review of 66 "pink sheets" from R-01 grant applications to the NIH identified several areas of concern, including underdeveloped methodology sections [22]. The plan should carefully detail the research design, study groups, sample sizes and primary and surrogate outcome measures. Preliminary work and pilot studies should be cited; these help to provide some assurance as to the availability of participants. The study design and inclusion and exclusion criteria should be described, as well as methods for blinding and randomisation wherever appropriate. Case patients and controls should be described. Sources of potential bias should be acknowledged and methods for dealing with these should be addressed. One must convince the reviewers
44
Key Aspects of Grant Applications: The Surgical Viewpoint
that participant availability will not be a major limiting factor. Methods for data collection and analysis should include power calculations and sensitivity and specificity of any diagnostic tests. Standardisation and quality assurance are key aspects of data collection for clinical work and trained staff at all involved study centres should be available. One should also address the intervention strategy itself and how outcomes will be monitored. As with basic scientific proposals, one should address potential limitations and problems that may be encountered and how these will be addressed. In particular, a realistic estimate of attrition rates during patient accrual and the follow-up period should be given. A full-time biostatistician is invaluable in planning clinical projects and their involvement will strengthen an application.
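To make the requirement for power calculations and attrition estimates concrete, the short sketch below estimates the per-arm sample size for a two-arm trial with a continuous primary outcome and then inflates it for anticipated loss to follow-up. It is a minimal illustration rather than part of the methodology described above: the effect size, power and attrition figures are hypothetical, and it assumes a Python environment with the statsmodels library available.

# Illustrative only: the effect size, power and attrition values are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Evaluable patients per arm to detect a standardised effect size
# (Cohen's d) of 0.5 with 80% power at a two-sided alpha of 0.05.
n_per_arm = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)

# Inflate recruitment for an assumed 20% attrition during follow-up,
# since a realistic attrition estimate should be built into the plan.
attrition = 0.20
n_recruit = n_per_arm / (1 - attrition)

print(f"Evaluable patients per arm: {n_per_arm:.0f}")   # approximately 64
print(f"Patients to recruit per arm: {n_recruit:.0f}")  # approximately 80

Presenting such a calculation explicitly in the plan, with the assumed effect size justified from pilot data, reassures reviewers that the sample size and accrual targets are defensible.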
44.12 Programme Grants

These are submitted by more seasoned senior investigators and are usually for substantial amounts of money over a period of perhaps 3–5 years. The application encompasses a number of sub-projects, each with a hypothesis and related specific aims, but inter-related under an overarching, focused theme and goal. "Synergy and collaboration are the themes of the programme grant" [20]. The programme grant must be just that – a proposal for an entire programme of research activity. It should be evident that the inter-related projects benefit significantly from being within the structure of the programme, as opposed to being carried out individually on separate grant applications; the total should be greater than the sum of the individual parts. As such, the application aims to be "developed around a unified and defined research goal" [20]. Most large grant-awarding agencies offer programme grants, including the NIH (P-01) and, in the UK, the MRC and the Wellcome Trust. Typically, a P-01 application is designed to make use of certain core facilities which may already be available or which have been included in the application bid. The core facilities requested are often expensive items of specialist equipment that may also require a trained, dedicated technician to run and maintain, e.g. magnetic resonance imaging, mass spectrometry and laser capture microdissection. Concerning the applicant, the lead investigator should have an established track record, as should the leaders of the inter-related projects if these involve different groups. The principal applicant should also have appropriate administrative and leadership qualities and should provide evidence that he or she will devote an appropriate amount of time to overseeing the proposed programme of activity. Finally, with respect to the host institution, the principal applicant must demonstrate that there is support for the programme locally and at each of the other institutions involved.
44.13 Conclusions

In the current era of quality and cost-effectiveness in healthcare delivery, surgical practice must be seen as an industry in which research and development are critical components, as in any corporate enterprise. To this end, appropriate support and funding should be made available for surgical research and, complementary to this, the importance of research within the surgical workforce must be emphasised by leaders in the field. This should instil an investigative philosophy into the surgical culture at an early stage, supported in turn by appropriate training in research methodology, both basic scientific and clinical. Guidance in grant-writing should be provided by dedicated teams in the host institution and by senior investigators with an established track record. One may anticipate that better-funded surgical research will help raise the profile of an institution, strengthen its clinical programme and offer new advances to patients in a more timely manner.
References
1. Jones R, Debas H (2004) Research: a vital component of optimal patient care in the United States. Ann Surg 240:573–577
2. Jackson HH, Jackson JD, Mulvihill SJ et al (2004) Trends in research support and productivity in the changing environment of academic surgery. J Surg Res 116:197–201
3. Rangel SJ, Efron B, Moss L (2002) Recent trends in National Institutes of Health funding of surgical research. Ann Surg 236:277–287
4. Nahrwold DL, Pereira SG, Dupuis J (1995) United States research published in major surgical journals is decreasing. Ann Surg 222:263–269
5. Avis FP, Ellenberg S, Friedman MA (1988) Surgical oncology research: a disappointing status report. Surg Oncol Res 207:262–266
6. Harken AD (2001) The role, focus and funding of research in a department of surgery. Arch Surg 136:154–157
7. Souba WW, Wilmore DW (2000) Judging surgical research: how should we evaluate performance and measure value? Ann Surg 232:32–41
8. Coran AG, Blackman PM, Sikina C et al (1999) Speciality networking in pediatric surgery: a paradigm for the future of academic surgery. Ann Surg 230:331–339
9. Roberts WC (1990) Limited research funds and cardiac medicine without cardiac surgery. Am J Cardiol 65:536–537
10. Gross CP, Anderson GF, Powe NR (1999) The relation between funding by the National Institutes of Health and the burden of disease. N Engl J Med 340:1881–1887
11. Dickler HB, Fang D, Heinig SJ et al (2007) New physician-investigators receiving National Institutes of Health research project grants. JAMA 297:2496–2501
12. Berger DH (2005) An introduction to obtaining extramural funding. J Surg Res 128:226–231
13. Aggarwal W, Hance J, Darzi A (2004) Robotics and surgery: a long-term relationship? Int J Surg 2:106–109
14. Weil RJ (2004) The future of surgical research. PLoS Med 1:17–19
15. Ozomaro U, Gitierrez JC, Byrne MM et al (2007) How important is the contribution of surgical specialties to a medical school's NIH funding? J Surg Res 141:16–21
16. Ascoli GA (2007) Biomedical research funding: when the game gets tough, winners start to play. BioEssays 29:933–936
17. Robertson CM, Klingensmith ME, Coopersmith CM (2007) Long-term outcomes of performing a postdoctoral research fellowship during general surgery residency. Ann Surg 245:516–523
18. Greene AK (2000) Research funding for surgical residents. Curr Surg 57:332–334
19. Toledo-Pereyra LH (2001) Surgical research funding. J Invest Surg 14:299–300
20. Niederhuber JE (1985) Writing a successful grant application. J Surg Res 39:277–284
21. Derish PA, Maa J, Ascher NL et al (2007) Enhancing the mission of academic surgery by promoting scientific writing skills. J Surg Res 140:177–183
22. Inouye SK, Fiellin DA (2005) An evidence-based guide to writing grant proposals for clinical research. Ann Intern Med 142:274–282
Web Links
NIH Office of Extramural Research. http://grants.nih.gov/grants/oer.htm
The American College of Surgeons. http://www.facs.org
Society of Surgical Oncology. http://www.surgonc.org
Howard Hughes Medical Institute. http://www.hhmi.org
Medical Research Council (UK). http://www.mrc.ac.uk/ApplyingforaGrant/index.htm
45
How to Organise an Educational Research Programme Within an Academic Surgical Unit
Kamran Ahmed, Hutan Ashrafian, and Paraskevas Paraskeva
Contents
45.1 Introduction
45.2 Fundamentals of Educational Research Methodologies
45.3 Key Points to Design a Surgical Educational Research Programme
45.3.1 Pre-Requisites for Planning Educational Research
45.3.2 Sequence of Steps in an Educational Research Project
45.3.3 Writing a Project Proposal
45.4 Factors Influencing Surgical Educational Research
45.4.1 Academic Support
45.4.2 Credibility
45.4.3 Collaboration
45.4.4 Time
45.5 Conclusions
References
K. Ahmed () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust at St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract Surgical education research programmes are necessary to ensure the provision of up-to-date surgical training and evaluation to maximise patient safety and care. This chapter aims to provide an overview of the components involved in setting up and running a surgical educational research programme within an academic surgical department. It describes the basic steps in setting up surgical research projects and addresses the factors influencing a productive surgical educational research programme.
45.1 Introduction

Medical education is the accredited clinical training of surgeons, physicians and allied healthcare personnel. Research in medical education began more than three decades ago, with a small group of clinicians and educational researchers at the medical school in Buffalo, New York [8]. Before this, educational research was the preserve of academic departments, whilst others focused on service provision. Although education was delivered by academics, it was not really considered a bona fide academic research entity until recently. Academic departments are now considering education more seriously, making way for a change of practice starting at the grass roots of medical culture. Academics and clinicians face growing pressure from their governing organisations to narrow the boundaries between research and service provision [1]. Advances in technology have further resulted in the use and development of educational adjuncts in craft specialities, particularly surgery. Surgical education is now recognised as a life-long process, which starts with a solid training period and is followed by high-quality continuing medical education [9]. It includes training and accreditation in both surgical competence (knowledge, common sense and technical ability) and professional competence (the ability to deal with patients and colleagues) (Fig. 45.1) [9].

Fig. 45.1 Components of surgical education (adapted from reference [9]): surgical competence comprises knowledge/judgement and technical ability; professional competence comprises relationships with patients and relationships with colleagues

Since its beginning, the field of educational research in surgery has been expanding continuously, with a significant cultural change. As a result, the following areas have seen real development: basic research in skills training, problem-based learning, assessment methods, and continuing medical education along with specialist recertification [8]. Consequently, researchers from diverse disciplines such as psychology, communication, education and technology are entering the field, bringing with them the research models and practices, as well as the parameters of "success", that are contemporary in their disciplines [2]. Evidence of this expansion includes the launch of journals on surgical education, an increasing trend towards higher research degrees and the development of dedicated educational research units at institutions such as Imperial College London (Department of Biosurgery and Surgical Technology), the University of Toronto (The Wilson Centre) and the University of Dundee (The Cuschieri Skills Centre). With this ever-increasing interest in surgical educational research, innovative voices and perspectives are being added to the domain. The expansion, and the resulting inclusion of a body of diverse educationalists in the field of surgical education, has made it more challenging in terms of the demand for high-quality research programmes. This is further reflected in an increasing demand for obtaining research grants and for training students for research degrees [2]. The hallmarks of a surgical educational programme include professional leadership, competent communication and effective evaluation. Educational programmes
provide trainers and trainees with a reliable and valid curriculum that facilitates the development of knowledge, communication and leadership skills. This chapter aims to review the stages involved in setting up an educational research programme within an academic surgical department. It also looks into the factors influencing the outcome of educational research in academic departments.
45.2 Fundamentals of Educational Research Methodologies

Educational research is difficult to deliver if it is treated like a clinical trial, in which an intervention is compared with a therapeutic standard. When comparing two educational methods of differing efficacy, it is unethical to deliberately expose some students to the weaker method under investigation. Hence, much of the methodology of educational research is humanistic. In this style of research, the enquiry is more reflective and critical, and the research occurs in real time during the training sessions. The research must take into account social factors in the target audience and practitioners; these factors may influence educational outcomes and may also change how the results are applied in surgical practice. The essence of educational research in surgery is to understand the surgical practice of individuals, or groups of individuals, with the aim of improving clinical services and outcomes. At a time when surgical knowledge is expanding rapidly, surgical educational research enables an improved translation of research findings into the clinical care of patients. Surgeons with expertise in education can identify the important issues in surgical training at each stage of the learning continuum. Surgical educational research should therefore include all the key stages of training at undergraduate, postgraduate and specialist levels (including continuing medical education). The majority of educational research in surgery, irrespective of the type of technique being studied, focuses on two areas: self-directed study and group study. Self-directed study can take place through personal reading and reflection or through participant observation. Group study, however, requires bringing a collection of individuals together, with subsequent reflection and analysis that depicts the collective experience. Judicious planning is important for ensuring good outcomes in educational research. During the planning
of educational research, issues such as research ethics, sampling, reliability and validity need to be considered at the very outset. Ethical issues may stem from the research design, the patients being studied, the procedures employed, the methods of data collection (observational or non-observational) and the nature of the participants (novices or experts) [4]. Researchers need to consider the rights of their participants when attempting to answer a research question (Table 45.1).

Table 45.1 Ethical codes for an educational research project [4, 10]
Regional/institutional ethical approval of the project needs to be organised, if required
The researcher needs to reveal his/her identity and background
Informed consent (verbal/written) should be sought from all participants
The purpose and procedures of the research need to be fully explained to all subjects
The research should be as objective as possible: this includes all steps from the design and conduct to the dissemination of the research
Feedback should be provided to the participants, if requested
The outcome of the research and its ethical consequences need to be seen from the participants' and institutions' points of view
The dignity, confidentiality and interests of the participants should be valued and protected
Possible controversial findings/outcomes need to be anticipated and handled as per institutional guidelines
If ethical dilemmas arise, the researcher may need to consult the research and ethics department of the institution and other researchers or teachers

Educational research can be qualitative or quantitative. In qualitative research, words are used to describe the outcomes, whereas in quantitative research, numbers are used to quantify results [3]. Qualitative research methods were developed in the social sciences to enable researchers to study social and cultural phenomena. Examples of qualitative methods include: (i) case study research – an empirical investigation of a real-life clinical case; (ii) action research – learning throughout an individual's case management through continual case discussion until competence (this can start pre-operatively in the clinic and continue into the post-operative period); and (iii) ethnography – the study of the attitudes of trainees in their behavioural, social and cultural context. Quantitative research methods were developed in the natural sciences to study natural phenomena. Examples of quantitative methods accepted in the social sciences and education include: (i) surveys, (ii) laboratory experiments, (iii) formal methods such as econometrics and (iv) numerical methods (mathematical modelling) [3].

The initiation and coordination of a successful educational research programme requires central leadership in directing the various projects. This includes the management of administrative and organisational issues related to studies being carried out in the department. It involves more than simply rationalising the system already in place; it requires the formulation and implementation of processes that enable the research organisation to progress successfully. The leadership must facilitate the organisation's future capacity to adapt by articulating and reinforcing the paths along which innovative activity can proceed.

45.3 Key Points to Design a Surgical Educational Research Programme

The design of an educational research programme is governed by the notion of fitness for purpose [4]. The rationale behind the research determines the methodology and design of the whole process. For example, if the purpose of a research programme is to find trainers' or trainees' views about a certain training module, then a quantitative or qualitative survey approach may be required. If, however, the effectiveness of a certain training or assessment method is to be established, an experimental or action research model may be appropriate (Fig. 45.2).

Fig. 45.2 Design of a study to measure the effectiveness of a training or assessment modality: a sample is drawn from the study population and allocated, on a randomised or non-randomised basis, to experimental and control groups; measurements are taken from each group and then analysed
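To make the design in Fig. 45.2 concrete, the sketch below randomly allocates a sample to experimental and control groups and then compares their assessment scores. It is a hypothetical illustration rather than part of the original text: the trainee identifiers, scores and group sizes are invented, and the analysis step uses an independent-samples t-test from the SciPy library, assuming it is installed.

# Hypothetical sketch of the Fig. 45.2 design: random allocation of a sample
# to experimental and control groups, followed by measurement and analysis.
import random
from scipy import stats

participants = [f"trainee_{i:02d}" for i in range(1, 21)]  # invented sample

random.seed(42)                     # reproducible allocation
random.shuffle(participants)
experimental = participants[:10]    # receive the training module under study
control = participants[10:]         # receive standard teaching

# Invented post-training assessment scores (measurement step).
scores = {p: random.gauss(75 if p in experimental else 70, 5)
          for p in participants}

# Analysis step: compare the two groups.
t_stat, p_value = stats.ttest_ind([scores[p] for p in experimental],
                                  [scores[p] for p in control])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")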
45.3.1 Pre-Requisites for Planning Educational Research

A research question is the primary step in a research project. A good educational research question should be clear, subject-specific, potentially answerable, interconnected and relevant to the field. Before commencing an educational research programme, issues related to planning, methodology, sampling, intervention and investigative sequence need careful consideration (Fig. 45.3). These factors are predictive of successful research [11].
45.3.2 Sequence of Steps in an Educational Research Project

45.3.2.1 Defining the Need for Research: Literature Reviews, Surveys and Audits of Practice

Educational research demands clarity of ideas and expression. The key step is to make sure that the researcher understands the topic or question. The following questions should be addressed before initiating a research project: What are the main issues? What is already known about the topic (from the existing literature and audits)? And what is the specific question that needs answering?
Fig. 45.3 Pre-requisites for planning an educational research project (adapted from reference [4])
Research requirements should be based on the areas of importance where the validity and reliability of training and assessment modalities are unknown. Needs should be based on surveys, available literature and audits of practice. In surgical education, most of the research is focused on the development and validation of training and assessment modalities. With patient safety as a pivot of all these activities, the ultimate aim of surgical educational research is to produce competent surgeons.
45.3.2.2 Designing the Research and Methods of Data Collection

Design is concerned with transforming a research question into a project [11]. The strategies and tactics selected for a research project, such as the methods and sampling, depend on the type of research question being answered. Each research question needs to be relevant to the aim of the research, and there should be a strong link to the existing literature. Methods and sampling strategies should be tailored to fulfil the purpose of the study. If any of these links is absent (Fig. 45.4), the strategy needs to be revised in order to answer the research question. The selection of a research method or methods is based on the type of information being sought. Studies can be either fixed-design or flexible-design
depending on the type of data collection (Table 45.2). Several validated instruments for data collection are available (Table 45.3); their selection and design are important and depend primarily on the type of study. The strengths and weaknesses of these instruments need to be fully understood by the research team before instigating data collection.

Fig. 45.4 Framework for research design of an educational project (adapted from reference [11]): theory/literature and the purpose(s) of the study shape the research question/aim, which in turn determines the methods and sampling strategy

Table 45.2 Explanation of types of research designs and data collection [11]
Fixed research design – A research strategy in which the design is fixed prior to data collection. Most fixed-design studies are quantitative
Flexible research design – A research strategy in which the design develops during the process of data collection. It always involves the collection of qualitative data
Qualitative data – Non-numerical data, e.g. interview responses (personal opinions)
Quantitative data – Numerical data, e.g. height, weight
Qualitative and quantitative data behave differently and therefore need to be studied differently
45.3.2.3 Data Analysis

Data collection is an indispensable part of educational research, and its form depends on the type of investigation and the methods used. For educational research, data are collected in many forms, such as audio/video recordings, test results, responses to questionnaires and minutes from meetings [11]. Data can take the form of visual scales, numerals or text. The step that follows data collection is analysis.

Table 45.3 Instruments for data collection [4, 11]
Surveys and questionnaires – A series of questions and prompts for the purpose of collecting information from respondents. They provide structured, numerical data, can be administered without a researcher present and are often straightforward to analyse
Interviews – A conversation between the interviewer and the interviewee, in which the interviewer asks questions to obtain information from the interviewee
Observation – Offers an investigator the opportunity to collect "live" or video-recorded data from naturally occurring situations. Data can be collected against objectively defined criteria
Tests – The test result can be qualitative (yes/no), categorical or quantitative (a measured value). It can be a personal observation (subjective) or the output of a precision measuring instrument (objective). Considerations include what is being tested (e.g. achievement, aptitude, attitude, intelligence), whether the tests are parametric or non-parametric, and whether they involve self-reporting or are administered
Role-playing – Participation in simulated environments intended to throw light upon the role/rule contexts governing "real-life" events

Qualitative data analysis entails organising and explaining the data: making sense of the data in terms of the research participants' explanations of the situation and noting patterns, themes, categories and regularities [4]. Qualitative data frequently focus on smaller groups than quantitative data; however, the data tend to be detailed and rich in information. Quantitative data analysis is often associated with large-scale research such as surveys and tests, but can also serve smaller-scale investigations such as case studies, action research and correlation research. Numerical analysis can be performed using software such as the Statistical Package for the Social Sciences (SPSS®), Minitab® and Microsoft Excel®.
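As a small worked example of the numerical analysis mentioned above, the sketch below summarises questionnaire responses on a five-point Likert scale and compares two groups with a non-parametric test, which is appropriate for ordinal data. The responses and group labels are invented for illustration, and Python with SciPy is used here only as a stand-in for packages such as SPSS® or Minitab®.

# Hypothetical questionnaire analysis: ordinal Likert responses summarised
# and compared with a non-parametric test (Mann-Whitney U).
from statistics import median
from scipy import stats

# Invented responses to "the training module improved my skills" (1-5).
juniors = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5]
seniors = [3, 2, 4, 3, 3, 2, 4, 3, 2, 3]

for name, group in [("juniors", juniors), ("seniors", seniors)]:
    print(f"{name}: median = {median(group)}, n = {len(group)}")

# Ordinal data are better compared by ranks than by means.
u_stat, p_value = stats.mannwhitneyu(juniors, seniors, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")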
45.3.3 Writing a Project Proposal

Good research demands clarity of ideas, and a research proposal aims to provide evidence of clarity of thought and expression. A good-quality proposal explains the whole process, from the aims and design to the intended outcomes, in a direct and unambiguous style. After assessing the requirement for research in an area of interest, an understandable report of the subject needs to be drafted. The contents of a research proposal need to be structured so as to get the message across in an understandable and simple way (Table 45.4).

Table 45.4 Contents of a research proposal [11]
Abstract/summary – Brief, clear and informative
Background and purpose – A concise review of relevant work by others; mention any gaps in the literature, any previous or pilot work (if applicable) undertaken in preparation for the study, and whether the current proposal concerns novel work; state clearly what the current work will add to the literature; give the primary and secondary (if applicable) aims of the project; maintain a good flow of text that leads the reader inexorably towards the conclusion
Plan of work – Details of the methods or procedures to be used; be clear about the research site and participants (students, doctors, patients, allied healthcare staff), the mode of data collection and how the data will be analysed; data handling and confidentiality issues need to be addressed clearly
Financial aspects/resources required – State clearly who is going to pay for the research, with details of any personnel (salary) or technical support
Time frames and sequence – The provisional time frame for the research work needs to be very clear
Ethical implications – The research needs to fall within an appropriate "code of practice"; when applicable, the proposal requires appropriate ethical vetting; ethical issues and ownership of the research need to be clearly documented (e.g. informed consent, anonymity, confidentiality, non-traceability, beneficence, right to refuse/withdraw, respondent validation, research subjects, social responsibility, honesty and deception)
Intended outcomes – The expected research outcomes need to be clearly documented, with a clear note about dissemination strategies for the completed research project
45.4 Factors Influencing Surgical Educational Research

The vast majority of surgeons are developing an interest in education and educational research; however, it is a minority who hold educational degrees and qualifications and who entered the speciality with the intention of performing educational research. This is mainly because surgery is a craft-based speciality and, as such, its participants are also concerned with being trained clinically and technically. In addition, increasing pressure on the working hours of surgeons at all levels means that there is less opportunity to perform research. As a result, strategies from major stakeholders in academia and service delivery are needed to create additional posts that allow training in a craft speciality alongside dedicated time for research. This will provide an "academic pull" for people who would otherwise be deterred from venturing into this field. Recently, in the UK, the "Walport Report" highlighted the need for improved opportunities to develop academic medicine; this liberated funding and opportunities to set up posts at different levels of training that allow for both clinical training and academic research and development. One of the areas of need highlighted was medical education [12]. In the programme suggested by Walport, surgeons can train from their third postgraduate year through to consultancy, as well as obtain a higher degree in basic science or medical education [12]. It remains to be seen whether this will yield success, but it does indicate the increased importance of medical education on the agenda of universities and government. Multiple factors, such as the availability of appropriately qualified academic support for trainees, collaboration among academic specialties and the availability of sufficient time for research, may have a bearing on the outcome of academic and educational research.
45.4.1 Academic Support

Support from the faculty and departmental heads is one of the most fundamentally important factors for the success of surgical educational research. In units where educational development and research have been high on the academic agenda, it has been recognised that an educational arm leads to an academically robust department. Progressive departments have directed energy and resources towards the enhancement of educational practice, leading to the emergence of a formal, funded surgeon-educator role and of practising surgeons with a scientific approach to teaching and learning. Formal academic training in education may aid this process and validates the practice of both individuals and departments.
45.4.2 Credibility

As mentioned previously, education in general, and surgical educational research in particular, are not directly linked to basic science research and clinical trials. It is important for the academic programme and the department that surgical educational research activities are cutting-edge and of the highest quality. One of the most notable negative factors in the delivery of successful surgical educational research is a perceived lack of academic credibility among peers, traditionally compounded by the scarcity of funding opportunities in educational research. More recently, however, medical education has been enjoying increased status, and there are more opportunities for funded research [9].
45.4.3 Collaboration

Multi-disciplinary collaborations are typically necessary for a surgical educational research programme to succeed. These may occur within a department, within a campus or faculty, across different institutions and also with other stakeholders. The nature of education is such that multiple factors can be used to influence and enrich experience; collaboration with different interested parties will therefore allow meaningful study design and interpretation of results. For example, if we are investigating the education of a surgeon performing a simple task in an operating theatre, the nature of the task dictates that many parameters will influence the outcome: personal skills, interactions with others and communication skills all contribute to competence [5, 7, 13]. Decision-making and ethics, which are affected by underlying psychological aspects, can also be examined [14]. If a simulator is being used, then collaboration with industry, basic scientists and engineers is required to analyse the problem and make meaningful comments regarding the results.
45.4.4 Time

Education and educational research cannot be delivered effectively as a sideline [6]. Time allocated within the working week of the department and of the individuals involved is vital for success. Individuals taking part in education and educational research programmes need to have this as their primary academic role, with funded sessions set aside for it. In this way, the objectives will be achieved and changes relating to clinical practice can be implemented and translated.
45.5 Conclusions

Educational research in surgery, if performed within a fertile department which embraces its principles and views it on equal terms with other academic activities, can have a major role in achieving tenure. Teaching alone is not sufficient; it must be accompanied by academic work that includes innovations in teaching, methodology, curriculum design and the development of assessment techniques. Active pursuit of funding and faculty support will lead to improved credibility and an environment conducive to high-quality research. Sound theory and study design are critical. The future development of an academic surgical department will rely on the presence of a critical group of surgical educators within the institution to aid the establishment of any surgical research programme. This will then allow academic cross-fertilisation and collaboration. The future of academic research in surgery depends on continuing efforts towards the development and validation of surgical training and assessment tools that keep pace with developing technology.
References
1. Ahmed K, Ashrafian H (2009) Cuts to research funding could hurt health care too. Nature 458:29
2. Albert M, Hodges B, Regehr G (2007) Research in medical education: balancing service and science. Adv Health Sci Educ Theory Pract 12:103–115
3. Berry J (2006) Quantitative methods in education research. Available from: http://www.edu.plymouth.ac.uk/resined/Quantitative/quanthme.htm
4. Cohen L, Manion L, Morrison KRB (2007) Research methods in education, 6th edn. Routledge, London
5. Darzi A, Mackay S (2001) Assessment of surgical competence. Qual Health Care 10(Suppl 2):ii64–ii69
6. Jackson DW (2001) The orthopaedic clinician-scientist. J Bone Joint Surg Am 83:131
7. Moorthy K, Munz Y, Sarker SK et al (2003) Objective assessment of technical skills in surgery. Qual Saf Health Care 327:1032
8. Norman G (2002) Research in medical education: three decades of progress. BMJ 324:1560–1562
9. Peracchia A (2001) Surgical education in the third millennium. Ann Surg 234:709
10. Reynolds PD (1979) Ethical dilemmas and social science research: an analysis of moral issues confronting investigators in research using human participants. Jossey-Bass, San Francisco
11. Robson C (2002) Real world research: a resource for social scientists and practitioner-researchers, 2nd edn. Blackwell, Oxford
12. UK Clinical Research Collaboration (2005) Medically- and dentally-qualified academic staff: recommendations for training the researchers and educators of the future. Report of the Academic Careers Sub-Committee of Modernising Medical Careers and the UK Clinical Research Collaboration, March 2005. Available from: http://www.nccrcd.nhs.uk/intetacatrain/index_html/copy_of_Medically_and_Dentally-qualified_Academic_Staff_Report.pdf
13. Undre S, Sevdalis N, Healey AN et al (2007) Observational teamwork assessment for surgery (OTAS): refinement and application in urological surgery. World J Surg 31:1373
14. Yule S, Flin R, Paterson-Brown S et al (2006) Non-technical skills for surgeons in the operating room: a review of the literature. Surgery 139:140
46
How to Structure an Academic Lecture
Bari Murtuza and Thanos Athanasiou
Contents
46.1 Definition of an Academic Lecture
46.2 Types of Academic Lecture
46.3 Principles of Lecture Format and Structure
46.4 Lecture Planning and Delivery
46.5 Practical Aspects of Lecture Delivery
46.6 Newer Aspects of Academic Lecturing
46.7 Conclusions
References
Web Links
B. Murtuza () The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract This chapter is concerned with the overall nature of an academic lecture, its importance and context and its place in surgical education and practice. First we define what an academic lecture is, and then consider the types of academic lecture as well as its overall structure and purpose. We also describe the essential ideas of the lecture format using principles from linguistics and use this as a platform for describing the planning and delivery of a lecture for a variety of forums.
46.1 Definition of an Academic Lecture

A lecture is a format that allows for the imparting and exchange of ideas and knowledge. The first lectures began some centuries ago as literal readings by a master, followed by their interpretation of a given text [1]. This evolved into an important means of information transfer and exchange for students, professionals and society. In the 1800s, medical lectures took the form of anatomical and surgical demonstrations in grand theatres with large audiences, such as those at Padua and Leyden, delivered by distinguished anatomist–physician investigators including William Harvey and Theodor Billroth. Surgery concerns the restoration of the anatomical structure and function of the body in a diseased state. The surgical art also extends to influencing form as an end in itself, as with aesthetics, although one could argue that this serves a psychological "function". As such, an academic surgical lecture should emphasise the context of the imparted material in this regard, even though the remit of the lecture may cover aspects ranging from pure anatomical description, operative surgical technique, surgical patient care and outcomes to basic science, new technological advances such as telesurgery and robotics, or the economic and logistic aspects of healthcare delivery.
The 1910 Flexner review of the medical school education system in the United States found that the system was “lacking in scholarship and intellectual rigour” and recommended that medical schools be linked with universities to promote scholarship and scientific research within the faculties [2]. In a purely educational context, the lecture has been defined as the “formal presentation of content by the educator for the subsequent learning and recall in examinations by students” [3]. Lectures used at universities have served as the basis for the introduction of academic rigour at medical schools and are now a key mechanism of information transfer in the modern era of globalisation of medical care.
46.2 Types of Academic Lecture

Academic surgical lectures may concern clinical practice and/or clinical and basic scientific research. Lectures may be delivered in the context of a university course, as a single lecture or as one of a series, or may be delivered at an institution as an invited talk or as part of a surgical meeting, whether institutional, national or international. Some surgical lectures are named lectures, and the lecture presented is therefore in the spirit of that name; such are the Hunterian Lectures of the Royal College of Surgeons of England, named after John Hunter, one of the founders of the scientific approach to surgery. In the current era of globalisation, the World Wide Web and the free and rapid electronic exchange of information, lectures may also be delivered through live links. Indeed, this has brought back the concept of the operating room as a surgical lecture theatre, as in the bygone days of Billroth and Harvey [4]. Academic lectures may also be classed according to the type of lecture being delivered: expository, interactive or case study, the latter being used to illustrate a principle or as a problem-solving strategy [5].
46.3 Principles of Lecture Format and Structure

A lecture may be considered a type of argument [6], and this requires a logical structure tailored to its purpose, context and target audience. Particular attention should be given to the key points to be made. There should be a "balance between breadth and depth of the material and between monotony and self-indulgence", and broad overall concepts should precede the more specific ideas and examples that "build on or explicate those concepts" (University of Minnesota – see Web Links). An academic lecture may present facts, analysis, opinion, a synthesis of ideas, or controversy. These may be covered systematically during the lecture, and must be covered with structure: indeed, the structuring of a lecture has been described as an essential aspect of its comprehensibility [7]. One must state the surgical context of the lecture, and a clear background is important in this regard, particularly for presentations covering basic scientific research. Consideration should always be given to the implications of the work presented for surgical practice, healthcare delivery and patient outcomes. The categorisation of sequential segments of the whole in turn determines how the presented concepts and facts are processed, learnt and integrated by the audience [8]. The logical sequence of the lecture can be organised in a variety of ways: topical, causal, sequential, symbolic/graphic, structural and problem/solution [9]. Each segment of a lecture is an "exposition", and these expositions determine the so-called "macro-structure" of the presentation. The "micro-structure", in turn, relates to focal episodes within each exposition, each with a prescribed function, as in the concluding remarks section [10] (Fig. 46.1).
46.4 Lecture Planning and Delivery

Understanding the formal structure of a lecture facilitates better understanding, planning and delivery to the target audience. In "Advice to a lecturer", the great scientist and communicator of science to the public, Michael Faraday, described the following: on the lecturer, "his thoughts … and his mind clear from the contemplation and description of his subject"; on diction, "a lecturer should endeavor … to obtain … the power of clothing his thoughts and ideas in language smooth and harmonious and at the same time simple and easy. His periods should be complete and expressive, conveying clearly the whole of the ideas" (cited by Murray [11]). The presentation is fashioned according to the subject, audience and forum. If the lecture concerns a specific clinical or scientific study, the structure may follow that of a research paper: introduction, methods, results and discussion (IMRaD). There is therefore a logical structure and progression. Usually, one should establish up to five key points to be taken away during a lecture. As attention span is approximately 10–20 min, for a lecture of longer duration each of these intervals should be marked by a change of pace or theme [12]. One technique for achieving this is to use "signpost definitions". Definitions in academic lectures have been classified into at least four Flowerdew types: formal, semi-formal, substitution and ostensive. Formal and semi-formal definitions may be further classed into behaviour/process/function, composition/structure, location/occurrence and attribute/property [13] (see Fig. 46.1).

Fig. 46.1 An overview of the formal structure of an academic lecture. The overall themes of a lecture are its type (expository, interactive or case study), its style (reading, conversational or rhetorical) and its structure. The structure, in turn, consists of elements of macro-structure (segments or expositions) linked together in one of several possible types of logical sequence: topical/causal, problem–solution, structural, sequential or symbolic/graphic. Transitions between segments may be marked by Flowerdew-type "signpost" definitions (ostensive, substitution, or formal/semi-formal, the latter subdivided into behaviour/process/function, composition/structure, location/occurrence and attribute/property). Linguistic constructs within the micro-structure episodes include complex nominal groups (CNGs), ellipsis and embedded subordinate clauses (ESCs); the lexical density of these constructs influences the comprehensibility of the episodes and of the overall lecture. As an example, a keynote lecture on robotic surgery for a surgical conference might be expository in type, with a reading style and a topical or sequential logical sequence indicated by formal attribute/property signpost definitions such as: definition of surgical robotics, types of robotic system, and indications for robotic assistance in surgery

One may consider a lecturer to have an implicit contract with the audience. The presenter should be clear and intelligible and "present ideas clearly and persuasively, with self-assurance and skill … come across as … person who has respect for the audience and a clear, insightful mind" [14]. At least three styles of lecture delivery may be described: reading, conversational and rhetorical [15]. In practice, style may vary within a single lecture according to the macro-structure or micro-structure at that point. Further, delivery style may be influenced by culture and environment when lecturing abroad, as well as by academic discipline. Indeed, professional medical or scientific speakers appear to use a high density of words with independent meaning (lexical density). Lexical density may be regarded as an indicator of propositional content and complexity [16], as it expresses meaning more succinctly through linguistic structures such as complex nominal groups, ellipsis and embedded subordinate clauses [15]. The lexical density of an academic lecture is related to the speed of delivery: in general, speed decreases as density increases, with an average speed for lectures of 125–160 words per minute compared with 190–230 for normal conversation [17]. This is illustrated in Fig. 46.2.

Fig. 46.2 Lexical density and word speed for academic lectures in different disciplines (science, social science and humanities) [15]: lexical density falls as the number of words per minute rises
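The notion of lexical density lends itself to a simple computational illustration. The sketch below computes a crude lexical density for a transcript, approximating content words by excluding a small stop-word list, together with the delivery speed in words per minute. The stop-word list and the sample sentence are illustrative simplifications rather than a linguistically rigorous implementation.

# Crude lexical density: content words as a share of all words, plus
# delivery speed in words per minute. The stop-word list is illustrative.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "is",
    "are", "was", "were", "it", "this", "that", "with", "for", "as", "by",
}

def lexical_density(transcript: str) -> float:
    words = [w.strip(".,;:!?\"'()").lower() for w in transcript.split()]
    words = [w for w in words if w]
    content = [w for w in words if w not in STOP_WORDS]
    return len(content) / len(words)

def words_per_minute(transcript: str, duration_min: float) -> float:
    return len(transcript.split()) / duration_min

sample = ("The lexical density of an academic lecture is related to "
          "the speed of delivery and decreases as the speed increases.")
print(f"Lexical density: {lexical_density(sample):.2f}")   # 0.50
print(f"Speed: {words_per_minute(sample, 0.15):.0f} words per minute")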
46.5 Practical Aspects of Lecture Delivery

One should follow the general principle of "say what you are going to say, say it and then say what you have said" [18]. Lectures may be delivered from notes or index cards with prompts and indicators. Lecture note handouts may be prepared if required and can follow a number of formats, including an outline, a list of key points, a summary diagram or a mind map. One should use memorable examples and short, concise sentences, with appropriate "signposts" to mark transitions and structure as described above. Slides are often in PowerPoint™ (Microsoft Corp., Redmond, WA) format, although one author has remarked that "PowerPoint is the triumph of the quick fact over the art of argumentation" [6]. It is a fact, however, that electronic preparation and presentation of lectures is a standard, essential tool used internationally and allows great flexibility in modifying slides up until the last minute or during travel to a meeting. Slides should be clear and self-explanatory, without overuse of abbreviations and acronyms, and should contain no more than three points or five lines of text. A clear and elegant technique is to have a running head in the same quadrant of every slide for the title of the talk. At a very practical level, a text font such as Arial should be used, with characters of at least 16 points. Bold, colour and slide effects should be used only where they enhance the presentation rather than distract and detract from it. Importantly, the logistics of the lecture will influence delivery in terms of the media used and the potential for audience participation and interaction. Key take-away points should be emphasised more than once and should be summarised at the end. In this way, with an attention-grabbing introduction and conclusion which relate to each other, the talk is "book-ended" (University of Minnesota – see Web Links). It is of course critical to rehearse the talk with peers and colleagues: this facilitates fluent delivery and allows one to identify difficult transitions, hard-to-vocalise ideas and convoluted lines of reasoning [14]. One should allow time for questions if appropriate. It is often useful to restate and paraphrase the question for the audience. Finally, hostile questions should be deflected in a polite, non-confrontational manner.

Fig. 46.4 Live-link operating room as a lecture theatre
46.6 Newer Aspects of Academic Lecturing

There is a three-way relationship between academic research, education, and surgical practice and healthcare delivery (Fig. 46.3). Thus, academic research can strengthen medical and surgical education through scholarship and rigour [2], and changes in healthcare delivery can in turn affect the structure of surgical education in academic departments of surgery [19]. Although new communication technologies such as videoconferencing and live-link surgery have been rapidly implemented in clinical practice, they have not yet fully infiltrated the sphere of information transfer through lectures [4]. Surgical educators must therefore adapt this technology for web availability of content, live-streaming of lectures or similar formats such as podcasts. Initial studies have indicated that tele-delivery of lectures (illustrated in Fig. 46.4) is readily accepted by surgical students [4, 20].

Fig. 46.3 Interrelationships in surgical care and academia: research, education, and surgical practice and healthcare delivery

Further adjuncts to conventional lectures include the use of multimedia computer programs [21]. Certain lecture theatre venues may be equipped with electronic handsets at each audience seat, enabling interaction and electronic voting; these systems are useful for academic and university courses and help maintain attention and provide immediate feedback and reinforcement for audience participants. Computer presentations also enable the use of embedded video clips in formats with compression ratios that allow files to be carried on flash memory sticks. The adoption of computer-enhanced technologies into academic lecturing particularly lends itself to the dissemination of new surgical techniques to units far afield, allows for the exchange of ideas and enables distance learning and reinforcement for surgical residents and trainees. Such methodologies may be a vital component in the globalisation of medicine and the spread of expertise to developing countries in particular.

46.7 Conclusions

The academic lecture in surgery is a key channel for imparting and disseminating ideas, both in a university setting, enhancing the scholarship and rigour of medical education, and at surgical meetings. Understanding the structure of a lecture in a formal sense enables one to realise the purpose of the lecture and serves as a platform for the effective planning and delivery of an exposition. Newer, computer-based audiovisual technologies have greatly facilitated lecturing and serve to open up the sphere of novel surgical techniques and knowledge to departments across the globe.
References
1. Sullivan RL (1996) Delivering effective lectures. JHPIEGO Strategy Paper #5, US Agency for International Development. Available at: http://www.reproline.jhu.edu/english/6read/6training/lecture/sp605web.pdf
2. Debas H (2000) Medical education and practice: end of century reflections. Arch Surg 135:1096–1100
3. Vella F (1992) Medical education: capitalizing on the lecture method. FASEB J 6:811–812
4. Gul YA, Wan ACT, Darzi A (1999) Undergraduate surgical teaching utilizing telemedicine. Med Ed 33:596–599
5. Frederick PJ (1986) The lively lecture – 8 variations. Coll Teach 34:43–50
6. Germano W (2003) The scholarly lecture: how to stand and deliver. Chronicle High Educ 50:B15
7. Chaudron C, Richards JC (1986) The effect of discourse markers on the comprehension of lectures. Appl Linguist 7:113–127
8. Middendorf J, Kalish A (1996) The change-up in lectures. Natl Teach Learn Forum 5:1–4
9. Gross Davis B (1993) Tools for teaching. Jossey-Bass, San Francisco
10. Young L (1994) University lectures – macro-structure and micro-features. In: Flowerdew J (ed) Academic listening: research perspectives. Cambridge University Press, Cambridge, pp 159–176
11. Murray R (1992) Faraday's advice to the lecturer. Anal Chem 64:131
12. Johnstone AH, Percival F (1976) Attention breaks in lectures. Educ Chem 13:49
13. Lessard-Clouston M (2006) Definitions in academic lectures: a preliminary report. In: Kline M, Anderson G (eds) Proceedings of the CATESOL state conference. Orinda, CA
14. Garland JC (1991) Advice to beginning physics speakers. Phys Today 44:42–45
15. Nesi H (2001) A corpus-based analysis of academic lectures across disciplines. In: Coterill K, Ife A (eds) Language across boundaries. BAAL-Continuum, London, pp 201–218
16. Ellis R (1994) Factors in the incidental acquisition of second language vocabulary from oral input: a review essay. Appl Lang Learn 5:1–32
17. Tauroza S, Allison D (1990) Speech rates in British English. Appl Linguist 11:90–105
18. Lowe D (1989) How to do it: lecture overseas. BMJ 298:174–175
19. Dunnington GL, DaRosa DA (1994) Changing surgical education strategies in an environment of changing health care delivery systems. World J Surg 18:734–737
20. Stain SC, Mitchell M, Belue R et al (2005) Objective assessment of video conferenced lectures in a surgical clerkship. Am J Surg 189:81–84
21. Seabra D, Srougi M, Baptista R et al (2004) Computer aided learning versus standard lecture for undergraduate education in urology. J Urol 171:1220–1222
Web Links
1. University of Minnesota Center for Teaching and Learning. Available at http://www1.umn.edu/ohr/teachlearn/tutorials/lectures/planning.html
2. Delivering effective lectures: US Agency for International Development. Available at http://www.reproline.jhu.edu/English/6read/6training/lecture/sp605web.pdf
3. Barbara Gross Davis. Tools for teaching. University of California, Berkeley. Available at http://teaching.berkeley.edu/bgd/largelecture.html
47
How to Write a Book Proposal
Christopher Rao and Thanos Athanasiou
Contents
47.1 Introduction
47.2 The Book Proposal
47.2.1 The Synopsis or Overview
47.2.2 The Author(s)
47.2.3 Market Analysis
47.2.4 Competitors
47.2.5 Annotated Table of Contents
47.2.6 Sample Material
47.3 The Cover Letter
47.4 Finding a Publisher
47.5 After the Proposal Is Accepted
47.6 Sources of Additional Information
References
C. Rao () Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract The book proposal is the document that enables a publisher to decide whether or not to publish a book. A proposal can also be a useful tool when planning the writing and structure of a book. Consequently, the importance of a well-written, professional book proposal cannot be overstated. In this chapter, we describe the essential elements of a book proposal. We also discuss how to choose a publisher and what to do when the book proposal is accepted.
47.1 Introduction
If you feel that you have an original, marketable idea for a medical book, the first stage is to write a book proposal [1]. The book proposal is the document that enables a publisher to determine whether the idea justifies investment of the time and resources required to publish it [2]. Furthermore, it can be a useful tool when planning the writing and structure of a book. Consequently, the importance of a well-written, professional book proposal cannot be overstated. In this chapter, we describe the essential elements of a book proposal. We also discuss how to decide which publishers to send the book proposal to and what to do when it is accepted. The process of actually writing the book lies outside the scope of this chapter; however, numerous texts have previously addressed this subject [1, 3–5].
47.2 The Book Proposal
The book publishing industry is ultimately driven by profit [6]. Consequently, it is useful to think of the book proposal as a business plan [2]. Publishing a book requires investment of significant resources, and the role of the book proposal is to demonstrate to a publisher that book sales would justify that investment [2].
Different publishing houses prefer the book proposal to be formatted slightly differently; some even require the use of a rigid pro forma. Guidance is usually available on the publisher's website. Generally, however, the proposal should be double-spaced. The length can be highly variable, but 15 to 30 pages excluding the sample material should be sufficient [2], as long as the proposal includes the following elements [6]:
• The synopsis or overview, which should briefly detail the aims of the book and what makes it different from and superior to competing books. It should include the approximate length of the book and the approximate number and type of illustrations. It should also include a schedule for completion, especially when there are multiple authors or editors [4].
• Biographical details of the author(s) or editor(s), mentioning any previously published work.
• A market analysis, describing the appeal and relevance of the book to the target audience.
• A description of competing books, their strengths and weaknesses.
• An annotated table of contents, describing the material covered by each chapter.
• Sample material, usually a sample book chapter.
47.2.1 The Synopsis or Overview
The synopsis, overview or executive summary is an introduction to the book, summarising its subject, content and unique features. The synopsis enables the publisher to absorb the key aspects of the book proposal quickly, before they are expanded in later sections [6], and it is consequently very important. It has been suggested that the tone of the synopsis should resemble that of a book review or the contents of a book jacket [2]. It should be approximately two pages long and include the following elements [4, 6]:
• A statement of the purpose, key features and themes of the proposed book, much like the preface of a book.
• A paragraph or bulleted list of what makes this book unique from a publisher's perspective.
• Estimates of the number and type of figures, the number of tables and the number of pages.
• An estimate of the time needed to complete the book.
• A maximum of two sentences about the author(s) or editor(s).
• A short paragraph (4 to 5 sentences) describing the book's target market and summarising its appeal to this target market.
• A further short paragraph (4 to 5 sentences) comparing the book to key competitors.
• Finally, a single paragraph summarising the synopsis, emphasising the marketability and reiterating the key competitive advantages of the book.
47.2.2 The Author(s)
This section of the book proposal should not be the curriculum vitae of the author(s) or editor(s), but should focus on demonstrating to the publisher that:
1. The author(s) or editor(s) are qualified to write or edit the book.
2. They will complete the book in the specified time.
It should briefly detail the education and experience of each editor and author, focusing particularly on previously published work. Reviews of previously published books can also be included in this section [2, 6]. Where there are multiple authors or editors, this section should also address the publisher's concern as to whether each will complete their allocated sections [4]. While the primary responsibility for promoting and marketing the book rests with the publisher, a publisher will be keen to know what role (if any) the author(s) or editor(s) can play in marketing the book [2].
47.2.3 Market Analysis
The proposal should describe the primary target market for the book and explain why the book would appeal to readers in this market. The size of the market should be estimated in a detailed and quantitative fashion [2]. A minimum print run of 1,000 copies is needed if the book is to be affordable, and the potential market needs to be about ten times larger than the target sales [6]. Depending on whether the book is a primary text, supplementary text or reference text, the potential market may need to be even larger. Consequently, most postgraduate textbooks will need to have international appeal. If the book will be a recommended text for a course, this should be included in the proposal along with estimates of the number of participants.
This section of the proposal should also include details of how the primary target market can be reached. For example, it should include magazines and journals aimed at the same market, conferences and meetings on the subject, and research groups and university departments with an interest in the subject. The proposal should also include details and the size of any realistic secondary or ancillary markets [6]. For example, some sections of a textbook on surgical research methodology may also be useful to those engaged in other areas of medical research.
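This rule of thumb is easy to turn into a quick viability check. The following is a minimal Python sketch using only the figures quoted above (a 1,000-copy minimum print run and a potential market roughly ten times target sales); the function name and sample numbers are illustrative assumptions, not publishing-industry standards.

# Back-of-envelope market-size check; print-run and multiplier figures
# follow the text above, the sample numbers are hypothetical.
MIN_PRINT_RUN = 1_000   # smallest print run for an affordable book
MARKET_MULTIPLIER = 10  # potential market ~10x target sales; larger
                        # still for primary or reference texts

def required_market(target_sales, multiplier=MARKET_MULTIPLIER):
    """Readership the proposal should plausibly demonstrate."""
    return max(target_sales, MIN_PRINT_RUN) * multiplier

print(required_market(1_500))  # -> 15000 potential readers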
47.2.4 Competitors
The book proposal should detail all books on the same subject or closely related subjects. It should list the title, author, publisher, year of publication, price, number of pages and synopsis of each one. It should describe the strengths and weaknesses of each book and explain why the proposed book is better.
47.2.5 Annotated Table of Contents
The proposal should contain a paragraph or page describing each chapter, including details of other materials, such as tables and figures, which will be included in each chapter. If the book will be edited, the names, positions and affiliations of potential contributing authors can be included in this section. This part of the book proposal shows the publisher that there is sufficient material for each chapter. It also shows that the author(s) or editor(s) have a firm grasp of the subject material and have planned exactly how they will present it [2].
47.2.6 Sample Material
The final element of the book proposal is the sample material, which usually takes the form of a sample chapter. It should be fully worked and representative of the book. The aim of this section is to give the publisher confidence in the writing ability of the author(s). It should not contain the completed manuscript, as this will damage rather than enhance the chances of the book being accepted [6].
47.3 The Cover Letter
A cover letter (or e-mail), although not usually considered part of the proposal, should accompany every proposal. Some publishers may prefer to receive a letter before they are sent a book proposal; this information is usually available on their website. As it is the first thing that a publisher will read, it is of paramount importance. The letter should be approximately one page in length, addressed to the commissioning editor in the relevant subject area and cover the following elements [2]:
• It should describe the subject of the book and detail its key features.
• It should show that the author(s) or editor(s) are qualified to write about it.
• With reference to the important points from the market and competitor analysis, it should explain why the book would be successful.
• Finally, it should state what is wanted, whether this is an invitation to send a book proposal (if the letter does not accompany a book proposal) or a publishing contract.
47.4 Finding a Publisher
Finding an appropriate publisher is difficult; even medical publishers often have different specialist interests. Before approaching a publisher, assess whether they publish books [4]:
1. On the same subject as the proposed book.
2. Of the same type as the proposed book, for example, reference texts, revision texts or clinical textbooks.
3. In the same market as the proposed book (geographical region, undergraduate or postgraduate).
The International Association of Scientific, Technical and Medical Publishers has a comprehensive list of medical publishers on its website (www.stm-assoc.org); there are also several directories of publishers that contain comprehensive lists [7, 8].
If a proposal is submitted to several publishers simultaneously, it is considered courteous to inform them. Simultaneous submission shortens the decision time and places publishers in a bidding situation. Publishers, however, do not like multiple submissions and may be less likely to consider a book proposal that has been sent to several other publishers. It is also difficult to submit revised proposals elsewhere, and the idea is more vulnerable to being copied because more people are aware of it [6].
47.5 After the Proposal Is Accepted
When a book proposal is accepted, the author(s) or editor(s) will enter into a contractual agreement with the publisher. The contract will address issues such as the royalty or fee (generally the author takes 10% of the cover price [9]); copyright; delivery date; length of manuscript; number of figures and how they are presented; liability for the accuracy of the manuscript; libel; and reversion rights [6]. Many academic organisations provide legal support for publishing contracts. If this is not available, it is important to consult a lawyer or the Society of Authors (www.writers.org.uk/society), which gives practical advice on authorship to its members, as all aspects of the contract must be understood and agreed.
Once the contract has been agreed, the hard work of writing or editing the book begins. The description of these procedures lies outside the scope of this chapter; however, several texts give an overview of some of the pitfalls of actually writing or editing a book [2, 4]. There are also more comprehensive guides to medical publishing [3], writing style [5] and the practical aspects of publishing [7, 8].
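The royalty arithmetic is trivial but worth seeing once in concrete form. A minimal sketch, assuming only the 10% of cover price quoted above; the copies-sold and price figures are hypothetical.

# Gross author royalty at ~10% of cover price (before tax and before
# any agent's commission); all input figures are hypothetical.
def gross_royalty(copies_sold, cover_price, rate=0.10):
    return copies_sold * cover_price * rate

print(gross_royalty(1_000, 80.0))  # -> 8000.0 for 1,000 copies at 80 each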
47.6 Sources of Additional Information
The International Association of Scientific, Technical and Medical Publishers is an international organisation of approximately 100 scientific, technical and medical publishers. Its website contains a comprehensive list of medical publishers. www.stm-assoc.org
The Society of Authors provides practical support and advice for authors. www.writers.org.uk/society
References
1. Albert T (2000) How to become a book author. BMJ Career Focus 320:S2-7237
2. Alder B (2000) The literary agent's guide to getting published and making money from your writing. Claren Books, Washington, DC
3. Albert T (2000) An A-Z of medical writing. BMJ Books, London
4. Eccles S (2002) How to write or edit a book. BMJ Career Focus 324:S81a
5. Strunk W, White EB (2000) The elements of style. Allyn and Bacon, Needham Heights, MA
6. Banks M (1998) How to do it: get your book published. BMJ 317:1715–1718
7. Turner B (ed) (2006) The writer's handbook 2007. Macmillan, London
8. Rankin I (2008) Writers' & artists' yearbook 2009. A & C Black, London
9. Coales U (2005) Publishing a medical book. BMJ Career Focus 330:231
48 How to Organise a Surgical Meeting: National and International
Bari Murtuza and Thanos Athanasiou
Contents
48.1 Purpose
48.2 Principles of Congress Organisation
48.3 When and Where
48.4 Constructing the Budget
48.5 The Meeting Programme
48.6 Planning the Meeting Sessions
48.7 The Panel Discussion
48.8 Seminars and Workshops
48.9 The Business Meeting
48.10 The Conference Secretariat
48.11 Trade and Exhibition
48.12 Computing, I-T and A-V
48.13 Telecast Meetings
48.14 Evaluation and CME
48.15 Conclusions
References
Web Links
B. Murtuza
The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK
e-mail: [email protected]
Abstract In this chapter, we consider the surgical meeting, its place in surgical education and training and its purpose. We outline important aspects of meeting organisation such as selecting a time and place for the meeting, how to plan the meeting budget and how to select the programme and run sessions. We conclude by discussing the importance of surgical meetings in the context of continuing medical education (CME).
48.1 Purpose
In planning a congress, the first stage is to consider the overall theme for the meeting and its purpose. This will help establish the equivalent of a corporate identity for the meeting at the outset, which can be used for all conference materials, correspondence, delegate packs, etc. A surgical meeting, in particular, may be orientated around live demonstrations of surgical techniques, may be directed toward new technologies or may be a more conventional scientific sessions-type meeting with basic scientific and clinical research components as well as aspects concerning surgical training, education and practice. The theme and purpose in turn help determine the time and place for the meeting in relationship to other meetings. The meeting should fulfil a need that is not met by other congresses, though there may, of course, be partial overlap.
A conference offers the opportunity to bring together people from the surgical field who probably do not often get the chance to meet and interact. A surgical meeting therefore promotes networking and exchange of ideas, as well as helping attendees keep up to date with developments in the field. Organising a meeting requires an understanding of these goals, together with a clear prescription for what is to be gained by delegates and others involved in planning or attending. Indeed, the congress has been described as a 'balance between stakeholders' comprising the organisers, presenters, exhibitors and the individual (Zylstra; see Web Links). A conference is an important forum for peer review and is an opportunity to gain attention for one's work. It is also a continued channel for maintaining the human side of medicine at a time when an ever-increasing proportion of one's interactions with colleagues is conducted electronically [1]. The international meeting, in particular, can help strengthen the 'intellectual relationship' between colleagues from different nations [2].
One must emphasise that a congress is a forum for two-way interaction: between the membership and the leadership, and between the presenter and the audience. It is an opportunity to meet experts in the field and to initiate new collaborations. Scientific quality is a key goal: one should invite top speakers and encourage high-quality exchange of ideas [3]. That said, one should aim to strike a balance between known and unknown people, so-called 'optimal unfamiliarity' (Zylstra; Web Links). This applies to all speakers and presenters, though keynote speakers should be leaders of their respective fields. Developing new collaborations is an important aspect of conference attendance, and this is partly dependent on the size of the meeting. Although several encounters are required to establish and foster a new working relationship between parties, the congress encounter is often a vital first one. Smaller meetings tend to offer more possibilities for interaction and may allow a certain topic to be examined in depth, though scientific creativity may be dulled by a lack of unscheduled time for socialising and reflection. The smaller meeting may also be 'inbred to the point of intellectual incest' [1]. In comparison, a larger meeting, while covering areas in perhaps less detail, offers delegates the chance to broaden their knowledge base by sampling from fields not directly their own.
48.2 Principles of Congress Organisation
Organising a meeting requires a great deal of commitment and planning. Werner and Kenefick have described the natural 'life cycle' of a conference, with stages of proposal, securing support, implementation and evaluation of outcomes [4]. From conception, there should be an overall chairperson of the organising committee (OC) responsible for coordinating all aspects of the meeting (Fig. 48.1). The OC should determine the theme, purpose and proposed key messages of the meeting, as well as the plan for the meeting. Determining the overall objectives requires that the OC have a complete vision of the final product [4]. Planning requires attention to detail, commitment, preparation and teamwork. It is useful to seek advice from those who have previously organised the same or similar meetings. At the outset, the Accreditation Council for Continuing Medical Education (ACCME) in the United States requires details of the proposed programme, faculty, length and pre- and post-session evaluation formats. For large meetings, one may use the services of a professional congress-organising company.
The ideal meeting length may be from 2.5 to 5 days, although single-day symposia can be equally valuable if the theme is focused enough. It is vital to begin planning as early as possible, and a large international meeting running over several days will often require 2 years of planning (Fig. 48.2). The conference secretary and administrative staff, as well as the OC and subcommittees, should be appointed as early as possible [5]. The chairperson should delegate tasks appropriately according to the time and resources available to each member of the planning committee and make full use of the strengths of particular individuals. The subcommittees should meet with the overall OC on a regular basis to provide progress reports. Subcommittees for the scientific programme, social events, finance, audiovisual (A-V) and information technology (I-T), trade and exhibits (T&E) and lodging are usual requirements [6]. The meeting secretary will liaise closely with the chairperson, the local OCs and the chairpersons of the various subcommittees (Fig. 48.1). Further, for international meetings, local chapters of a given society may also be involved, as well as contacts within the city council. International meetings warrant certain additional considerations, including time zone, the official language for the meeting, travel and national holidays. The World Wide Web (WWW) and electronic mail (e-mail) have helped considerably in the organisation of international meetings in this regard, and the 'virtual office' may be separate in location from the site of the meeting itself. For all services employed, it is important to see references and contact previous clients before signing any contracts [7]. Further, it is often helpful to meet key members of these teams in person. Finally, the OC should write policy statements for registration deadlines and refunds, and draw up contingency plans in the event that the meeting is cancelled.
Fig. 48.1 Interrelationships between members of the conference organisational structure. The dashed red outline indicates the core of the organising committee (OC), comprising the chairperson and the overall and local OCs. Colour codes indicate related strata in the organisational structure.
48.3 When and Where
The location and social programme are important for promoting creativity and exchange of ideas, as well as for fostering a convivial atmosphere at the meeting and attracting as many delegates as possible. One needs to consider the conference city and the venue itself in the context of the meeting objectives and scale. With respect to timing, one should check that the meeting does not conflict with local holidays or other key meetings. One should also consider travel and congestion. An estimate of the number of delegates will help in planning the required facilities in terms of the number and size of meeting rooms. Larger meeting areas may be needed for plenary sessions as well as for exhibits. Specific space will also be needed for poster display, cybercafés and rest and dining areas.
It is helpful to draw up a shortlist of three potential venues, and the merits of each should be examined systematically by the chairperson and OC. In confirming a venue, one should obtain multiple references and conduct a site visit in person to examine the meeting areas. Types of venue range from a meeting centre on a university campus to a large, purpose-built international congress centre or international hotel. A college campus may offer the advantage of low costs and included staff, though larger venues, if required, may have the advantage of past experience with large meetings and perhaps on-site housing. Computer access at cybercafés for internet and e-mail use is now essential, and a wireless (WiFi) network within the conference area is also helpful.
Once the timing of the meeting and the venue have been decided, a contract may be drawn up between the OC and the conference centre management. This should detail the rights and responsibilities of each party, the estimated number of delegates, the number and type of meeting rooms required, as well as computer, other I-T and A-V services, any staffing included for the event and, importantly, deadlines and insurance requirements [4].
Fig. 48.2 Timeline for organising an international meeting. For large international conferences, planning begins up to 2 years ahead of the event. Feedback to the chairperson of the OC and the deadline for receipt of final manuscripts may be within a month of the end of the meeting. This timeline may be adapted and compressed into a shorter time frame for a smaller meeting or symposium.
48.4 Constructing the Budget
Constructing the meeting budget is one of the first tasks of the OC, as all else is contingent upon it. One must determine whether one intends to show a profit, accept a loss or break even. The budget may thus be considered a 'plan of action expressed in money' [6]. There are two sides to the budget: income and expenditure. The expenditure is controllable, whereas the income must be anticipated more carefully. Several overall margins are important to consider at the outset. One may need 10–20% of the budget for a professional congress-organising company. One should also allow perhaps 10% for inflation over the 2-year period from first planning the meeting to its start. It is recommended that 60% or so of the income should come from registrations [6]. Other sources of revenue include internal institutional donations, external donations from corporate sponsors and monies from trade exhibitors. Local industries may also have a history of willingness to support local events of importance. One should stagger attendance fees for residents and fellows, guests and society members, and devise a sliding scale of early registration rates and refunds according to a timeline. The OC may wish to include money for prizes and scholarships, which will help to attract trainee investigators and presenters. In securing sponsorship from companies, it is essential to avoid conflicts of interest and to ensure that attending physicians will not be inappropriately influenced. It may thus be beneficial to obtain an unrestricted corporate grant for a meeting. Sponsorship from companies driving new technologies is particularly relevant to the surgical meeting, and one should avoid letting the meeting become an 'infomercial' [8].
In terms of costs, these may be seen as fixed and variable (Fig. 48.3). Fixed costs include those for hire of the venue itself and the meeting rooms, A-V and I-T support staff and other staff in the conference offices. Variable costs include meals and entertainment. The greatest expenses, aside from hiring the venue, are usually for publicity and advertising and for faculty and invited speakers. These costs can reach $50,000–$75,000 for even a national meeting [9]. Catering for staff, as well as staff lodging, must be accounted for. Finally, for hotel block-bookings, an attrition rate should be estimated and a contract with the hotels negotiated. At regular meetings of the OC and subcommittees, the budget should be constantly revised and updated. One may also set aside 1% of the expenditure budget for insurance costs in the event that the meeting is cancelled.
Fig. 48.3 The meeting budget. Constructing the budget involves balancing sources of income with expenditure. The latter may be fixed or variable.
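These margins are easy to lose track of over a 2-year planning horizon, so it can help to keep them in a single model that is revisited at each OC update meeting. The following is a minimal Python sketch: the percentage margins follow the figures quoted above (10–20% for a professional organiser, about 10% for inflation, 1% for insurance, roughly 60% of income from registrations), while the function itself and all monetary figures are hypothetical illustrations, not taken from any real meeting.

# Minimal meeting-budget model; percentage margins follow the text
# above, all monetary figures are hypothetical.
def meeting_budget(delegates, fee, exhibitors, sponsorship,
                   fixed_costs, variable_per_delegate,
                   organiser_rate=0.15, inflation=0.10, insurance=0.01):
    registrations = delegates * fee
    income = registrations + exhibitors + sponsorship
    # Fixed costs (venue hire, A-V/I-T and office staff) plus variable
    # costs (catering, entertainment) that scale with attendance.
    base = fixed_costs + variable_per_delegate * delegates
    expenditure = base * (1 + inflation) * (1 + organiser_rate + insurance)
    return {"income": income,
            "registration_share": registrations / income,
            "balance": income - expenditure}

b = meeting_budget(delegates=400, fee=350.0, exhibitors=30_000,
                   sponsorship=25_000, fixed_costs=80_000,
                   variable_per_delegate=120.0)
print(f"Registrations supply {b['registration_share']:.0%} of income; "
      f"balance {b['balance']:+,.0f}")

A model of this kind makes it immediately obvious when, for example, registration income falls below the recommended share and sponsorship is carrying too much of the budget.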
48.5 The Meeting Programme
Once the theme and goals for a meeting have been set, the OC can draw up the provisional programme. A stimulating and useful programme, which fulfils a need, will attract delegates. At the same time, the programme should appeal to a broad audience. One may plan the sessions according to the overall length of the meeting and the types of session one wishes to include. These may comprise plenary sessions, oral presentations of accepted abstracts, poster presentations, moderated poster sessions, meet-the-experts sessions, workshops, a business meeting, teleconference sessions and panel discussions. Concurrent sessions are often run at large international meetings, though too much choice in this regard may result in indifference and neglected sessions [10]. Further, planning concurrent sessions becomes exponentially more complex as the length of the meeting increases beyond a single day [4].
Abstracts are screened by a review panel, and accepted abstracts may then be grouped into themes for the meeting by the scientific programme subcommittee. Alternatively, abstract categories may be pre-designated by the programme committee. Poster sessions are a good way for delegates to recruit new people [1]. The programme OC should consider the length of talks for the various planned sessions. A plenary session with a keynote speaker may run for 45 min to 1 h, and these sessions should probably not be run concurrently with others; most oral presentations may be between 7 and 15 min, with some time scheduled for questions. Small group and workshop sessions between plenary sessions are useful to enhance interaction between participants. One must also schedule free time for resting and reflection, meeting colleagues and checking e-mails, as well as for social events. Overall, one may wish to run morning and afternoon sessions with free time in the evenings, or perhaps morning and evening sessions with a free afternoon [1]. Reserving some important sessions for the last day will help to ensure good continued attendance. For poster and moderated poster sessions in particular, one should factor in time for setting up and taking down [11].
In the preliminary programme, one needs to state the official language for the congress sessions, the scientific programme, the keynote speakers, the deadline for abstracts, early registration deadlines and dates for housing. There should also be details of the sliding scale of dates for refunds [6]. An outline of social events, such as the opening reception, should also be included, together with details of the host city and local attractions.
For selecting abstracts, one may define beforehand either the proportion of abstracts that will be accepted (e.g., 15%) or a specific number. Selection is helped if abstracts are structured, and often a marking scale such as 1 to 6 can be used: 1 – reject; 6 – definitely accept. Abstract acceptance should be on scientific merit, which may be defined using criteria such as originality, adequate data, legitimate conclusions and interest to the meeting or society [12]. Cluster analysis often reveals grouping of reviewer marks for abstracts, and in one study a high efficiency of abstract selection was achieved using random allocation of abstracts to groups of three assessors from a total panel of twelve [12].
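The allocation scheme from the study cited above [12] is straightforward to reproduce in software. The following Python sketch is a hypothetical illustration of it: each abstract is marked by a random group of three assessors drawn from a panel of twelve, on the 1 (reject) to 6 (definitely accept) scale, and a pre-defined proportion is accepted on mean mark. All names and toy data are assumptions.

# Toy illustration of abstract selection: random groups of three
# assessors from a panel of twelve, marks 1-6, acceptance on mean mark.
import random

PANEL = [f"assessor_{i}" for i in range(1, 13)]  # twelve assessors
ACCEPT_PROPORTION = 0.15                         # e.g. accept top 15%

def allocate(abstract_ids, group_size=3):
    """Assign each abstract to a random group of assessors."""
    return {a: random.sample(PANEL, group_size) for a in abstract_ids}

def select(marks):
    """marks: {abstract_id: [marks 1-6]} -> ids accepted on mean mark."""
    ranked = sorted(marks, key=lambda a: sum(marks[a]) / len(marks[a]),
                    reverse=True)
    return ranked[:max(1, round(ACCEPT_PROPORTION * len(ranked)))]

abstracts = [f"A{n:03d}" for n in range(1, 41)]
groups = allocate(abstracts)
marks = {a: [random.randint(1, 6) for _ in groups[a]] for a in abstracts}
print(select(marks))  # six of the forty toy abstracts accepted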
48.6 Planning the Meeting Sessions
Meeting planning should take into account the overall length of the meeting, the types of sessions planned and time for social events and breaks during the day. It is recommended that the organisers arrive 2 h before the start of each day to run through the programme. Guidelines for presentation length, such as 10 min for speaking and 5 min for discussion, should be issued to speakers well before the meeting. The congress theme will help in selecting appropriate keynote speakers and plenary sessions, and having these sessions in place will facilitate planning of other sessions around this backbone. One should strive for a balance between time for invited speakers and time for oral and poster presentations by delegates. Session types include the major symposium, clinical update sessions, presentation of original research and private/leisure time [13]. One may also hold early-morning breakfast sessions and allow smaller societies present at the main meeting to convene satellite symposia. It is helpful to schedule more intensive sessions for the morning, when concentration is better, and plenary sessions may be planned for this time; the afternoon can be reserved for workshops with greater audience participation. Where concurrent sessions are run, these should be limited to an acceptable number based on the size of the meeting, and one should aim to minimise the number of evening sessions. Good timekeeping is essential: session chairs should be briefed about this, and the necessary timers and indicators made available. Chairpersons must also direct the questions following presentations and keep them brief and to time. One type of session that requires particularly careful orchestration is the moderated panel discussion.
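Working out how many presentations fit into each slot is simple arithmetic, but doing it explicitly helps avoid an overcrowded programme. A minimal sketch using the guideline timings above (10 min for speaking, 5 min for discussion); the one-minute changeover allowance is an added assumption.

# Session-capacity check; talk and discussion lengths follow the text,
# the changeover allowance is an assumption.
def talks_per_session(session_minutes, talk=10, discussion=5, changeover=1):
    """Number of presentations that fit into one session slot."""
    return session_minutes // (talk + discussion + changeover)

print(talks_per_session(90))   # -> 5 talks in a 90-minute session
print(talks_per_session(120))  # -> 7 talks in a 2-hour session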
48.7 The Panel Discussion
For the panel discussion, an important, perhaps controversial, topic is chosen by the programme committee. The moderator and panel for the session are then appointed, and the moderator is informed well ahead of time of their remit, the topic to be discussed, the audience and the list of panelists, together with a brief resume of each. The moderator should then contact each panelist with details of the planned conduct of the session and the topic, and should obtain from them an outline of what they plan to say. This will help structure the session and talks in an optimal fashion.
The moderator should be a good speaker with excellent knowledge and credibility within the field. They should be impartial and capable of conducting the discussion while maintaining a cordial mood. Ideally, the moderator will be someone experienced in such a role; a good moderator is able to deftly bring out the strong points of each panelist. The ideal panelist, in turn, is similarly someone who is an expert and who can engage in lively and stimulating debate. In terms of conducting the session, the moderator should hold a preliminary meeting with the panelists on the day of the session to finalise the order in which they will speak. The moderator must engage both the panel and the audience, and should have some questions of their own as well as inviting questions from the floor; these may be written down beforehand on cards. Microphones should be strategically placed around the floor for sections of the audience, and A-V staff should be on hand to manage these.
48.8 Seminars and Workshops
Small workshops are an interesting and useful way of breaking up a meeting and form a good interlude between plenary sessions and demanding oral presentation sessions. They allow for interaction between those leading the sessions and small audiences. There should be a theme and purpose for each of these sessions, and they should be run in a friendly and informal manner. Types of session may include master classes on surgical techniques aided by video demonstrations or 'wet labs', meet-the-experts sessions with a small panel of experts to discuss novel surgical technologies and techniques, and mini-symposia focusing on key areas such as the academic surgeon, research in surgery, setting up a randomised clinical trial or other 'how-to' sessions.
A variant on the workshop theme is the mini-consensus conference, although this may be the nature of the overall meeting itself. These consensus meetings aim to set policy recommendations and, again, are focused. A small panel of national or international experts is involved, and the outcome of these meetings may be society guidelines and position papers. These meetings are thus an important means of distilling, and making accessible, evidence from clinical and translational research for surgical practice. One paradigm conceived for this type of medical meeting is the Cambridge Conference [14]. The concept is to bring together a 'think-tank' of international experts to discuss topics of high importance. Delegates are set specific goals, such as reviewing the literature in key areas and the state of the art, and identifying gaps in the knowledge base. The outcome of the conference when first convened was a monograph, which helped disseminate the ideas from the meeting. Such workshops thus bring together international leaders in a field, as well as high-level policymakers, and make their recommendations available to the whole community.
48.9 The Business Meeting
The business meeting is an important forum for communication between a society's membership and its leadership and is an essential part of a society-organised national or international meeting. It is where new members are admitted, where annual reports of the society, such as the business and financial activity reports, are presented and where new guidelines are issued. The society president and chairperson preside over the meeting, with an appointed meeting secretary to take minutes. The agenda should be set well beforehand and include minutes from the previous meeting; new items; new members; reports and activity; other matters arising; a summary of decisions and an action plan; and closing comments.
48.10 The Conference Secretariat
Conference organisation is an 'exercise in production management' [6], and the organising secretary and office are at the heart of this, working closely with the chairperson and the overall and local OCs (Fig. 48.1). The conference secretary must also be in close contact with the chairpersons of all subcommittees and with intended delegates. The secretariat should keep computer database records of delegates and their requirements, including registration details, payments, housing requests, social event requests, abstracts and their acceptance status. All delegates should have a unique identification number cited in all correspondence. For a large international meeting, the secretariat should ensure, in conjunction with the trade and exhibits committee, that sponsors for delegate bags are in place 24 months ahead. The contents of these bags should be checked and include, for example, a name badge, the abstract and programme book, local maps and details of sights, bus and transport schedules, a list of participants and sponsor pamphlets and pens. Final confirmations of attendance and abstract acceptance status are managed and sent out by the meeting secretariat.
At the meeting itself, the conference office should be readily accessible and, as a physical space, should be able to accommodate small business meetings and have a place for conference materials such as delegate packs. General information about the secretariat at the meeting, such as opening hours and the services available, including a business centre, should be in the final programme brochure. On site, the secretariat is also the central point coordinating communications concerning daily housekeeping between the OC and delegates.
At both the meeting itself and during the planning phase, the conference secretariat maintains close links with the trade and exhibits committee, the publicity/advertising committee and the press office. The press office in turn often has 4–5 members for international meetings and coordinates publicity from 4 months or so before the meeting, communicating with local press and radio. There may be a press conference a week before the meeting, as well as for important updates during the meeting itself. The press office may usefully be situated adjacent to the secretariat office during the meeting and should make available newspapers, notices of meeting press releases, copies of programmes and keynote speeches, as well as literature on participating societies.
48.11 Trade and Exhibition
The T&E committee calculates space requirements for exhibitors, and the charges to be made for this, in conjunction with the secretariat and the finance and budget subcommittee. As exhibitors are an important source of revenue for the meeting, their support should be secured early; many corporations have their conference budget determined at the start of the fiscal year. The T&E office also considers requirements for cybercafés, the computing facilities offered to exhibitors and storage space for their use. Volunteers are often essential members of the meeting workforce and help greatly with staffing exhibit areas. The chair of this subcommittee needs to specify how many representatives from each company will be permitted at the meeting and should design a floor plan for vendor booths, which is included in the final programme booklet [4]. The T&E office also liaises with the security officer to ensure that this plan meets the security, safety and fire safety requirements. The subcommittee chair should send an invoice to all confirmed exhibitors at least 2 weeks before the final programme goes to print, and this should include details of the meeting location and dates, the designated space number, the floor plan, directions for setting up and taking down exhibits, as well as storage facilities and facilities for electrical power and computing set-up. T&E also liaises with the cleaning services manager concerning the exhibition areas.
48.12 Computing, I-T and A-V
Computers are an essential requirement for organising a meeting, and excellent I-T support is a necessity. The conference secretariat and office need details of all delegates, registration, payments and housing requests, and these should all be maintained and continually updated in compiled databases. Virtual administration frees the organisers from physical and geographical constraints, as members of the OC may be in different countries for international meetings [15]. E-mail allows for great speed and efficiency in communications between all members of the organising structure, as well as between potential attendees and the conference office. Online registration and abstract submission should be set up by dedicated I-T professionals who maintain the conference website. This method is rapid, flexible and secure, and allows automated e-mail confirmations and receipts to be generated. Details of housing may be kept on the website, with links to individual hotels and booking forms. There should be contact details (telephone, e-mail, fax and mail) for the conference office in case of any queries. The congress website should also have details of the preliminary programme, the list of invited keynote speakers and highlights of both the scientific and social programmes. There should be a link for important dates, such as deadlines for early and final online meeting registration, abstract submission and housing registration.
At the time of the meeting itself, computers and I-T support for the conference office are maintained, as is support for internet and e-mail facilities for delegates. Many meetings now offer free WiFi access for delegates. Large plasma or liquid-crystal display screens may be strategically placed through the conference buildings with details of programme activities, changes and updates and important housekeeping notices. Interestingly, some meetings now offer programmes and updates online through WiFi for hand-held personal computing devices. Both I-T and A-V staff are needed to support these facilities, although the major role of the A-V staff will be to organise the facilities for presenters, including microphones and the public address (PA) system, computers for presentations, projectors, video and DVD players, timers and pointers. There should also be speaker preparation rooms, if possible staffed by A-V personnel. Clear instructions should be made available to all delegates before the meeting concerning the accepted formats for presentations and video, such as the PowerPoint™ (Microsoft®) files and versions supported, and whether files may be brought on compact disc or portable flash discs.
48.13 Telecast Meetings
An interesting development in surgical and medical education is the increasing use of high-speed data links via fibre-optic and satellite technology to hold teleconferences between remote centres. These may be used to broadcast live demonstrations of surgical procedures and allow two-way interaction between the audience/chairperson and the operating team. Such telemedicine conferences, however, often require considerable planning time, financial resources and expertise. Costs for the technology required vary with the speed of data transmission and the bandwidth used. The videoconferencing units used must also conform to international standards and provide for at least 1 Mbit/s data transmission if possible. One needs a carefully defined 'connection topology' for the links between involved parties, supervised and maintained by dedicated technical staff [16]. Further, one should define audience heterogeneity in terms of languages and time zones, and each participating site requires trained I-T staff coordinated by a central unit to ensure correct and precise integration [16]. The OC in these cases needs to schedule 'dry-run' rehearsals of the set-up to test all communication links.
An extension of the teleconference concept is the virtual meeting (VM), whereby an entire meeting is conducted on the WWW. Virtual sessions may be complementary to a conventional meeting. Participants in these VMs can converse and exchange ideas in public and private meeting 'rooms' online and have their own characters that can move between VM rooms and take notes in a so-called multi-user dimension, object-oriented (MOO) interactive environment [17]. Such VMs can also accommodate trade exhibitors and incorporate electronic poster sessions.
48.14 Evaluation and CME
One should consider needs assessment as the basis for the programme and for continuing medical education purposes. The ACCME has specific requirements in the United States, and these criteria should be met. The CME committee sets the educational agenda for the meeting in accordance with an educational mission statement [9]. The CME officer liaises with the scientific programme committee to oversee the accredited CME activities. The CME process may thus be viewed as a 'continuum that starts with needs assessment' [9]. One seeks Category I credits as set out by the ACCME, and all such congresses require structured meeting evaluation by participants. Overall objectives for meeting evaluation include: to ascertain educational value; to obtain delegates' opinions about the programme; to give speakers feedback from their audience; to improve the quality of future meetings; and for CME purposes and certification [13]. Evaluation should consider faculty, keynote speakers, other lecture presentations, workshops, panel and group discussions, as well as catering and other facilities. Delegates should be asked on feedback forms whether the meeting met both the stated educational objectives and each individual's expectations. Relevance to surgical practice should also be ascertained. CME evaluation forms should be in a simple format, such as a tick-box matrix, which facilitates rapid completion and easy review. Delegates should be offered constant reminders throughout the congress to encourage completion of forms for each session and to help ensure submission of all completed forms by the end of the meeting.
The collection mechanism for forms should be stated, such as collection boxes and e-mail. One may estimate a 50% completion rate for forms, often with a further reduction in completion rates on the final day of the meeting. Tying form completion and submission to CME accreditation and the issuing of attendance certificates is an excellent way to ensure a high rate of returns. Finally, a report should be compiled by the CME officer once all returns are in and presented to the congress chairperson, the overall OC and individual invited speakers after the meeting ends.
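When planning how many forms to print and how much collation work to expect, the completion rates above can be turned into a rough projection. The sketch below assumes the ~50% completion rate quoted in the text; the final-day fall-off factor and the attendance figures are hypothetical.

# Rough projection of evaluation-form returns; the 50% completion rate
# follows the text, the final-day factor and attendances are assumptions.
def projected_returns(daily_attendance, completion=0.50, final_day_factor=0.5):
    returns = [int(n * completion) for n in daily_attendance]
    returns[-1] = int(returns[-1] * final_day_factor)  # final-day fall-off
    return sum(returns)

print(projected_returns([400, 380, 350]))  # -> 477 forms expected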
48.15 Conclusions
Organising a surgical meeting requires a great deal of planning and time commitment. The chairperson relies upon a good team who work well together, and excellent communication is essential. Frequent update meetings during the planning phase help the chairperson and OC to steer the course of development of the programme and arrangements and help ensure that planning is kept to schedule. Update meetings also maintain flexibility, meaning the OC can adapt their strategy in the face of unforeseen circumstances. Above all, despite being an arduous task, organising a meeting should be an enjoyable and rewarding one.
References
1. Petsko GA (2006) The highs and lows of scientific conferences. Nat Rev Mol Cell Biol 7:231–234
2. Clegg HA (1969) International relations in medicine. Proc R Soc Med 62:1147–1150
3. Waddell P (1994) The role of research conferences in developing European collaboration in science and technology. SEPSU Policy Study No. 9. Royal Society, London
4. Werner SE, Kenefick C (2005) A primer for effective organization of professional conferences. Med Ref Serv Q 24:39–54
5. Scrimgeor E (2001) A rough guide to organizing a medical congress. BMJ 358:1918
6. Capperauld I, Macpherson AIS (1978) How to do it: organize an international medical meeting. Br Med J 2:541–544 [(I) and following in series II–V]
7. Wall M (2004) How to organize a conference. The Sunday Times, 15 February
8. Muskett AD (1998) The surgical meeting as infomercial. Ann Thorac Surg 65:297–298
9. Muroff L (2005) The anatomy of an outstanding CME meeting. J Am Coll Radiol 2:534–540
10. Wood PHN (1971) Organizing a medical congress. Br Med J 4:290–291
11. McIntyre E, Millar S, Thomas F (2006) Conference works. BMJ Careers 23:116–117
12. Appleton DR, Kerr DNS (1978) Choosing the program for an international congress. Br Med J 1:421–423
13. Richmond DE (1983) Improving medical meetings I–IV. Br Med J 287:1201–1202; 1286–1287; 1363–1364; 1450–1451
14. Hays R, Jolly B, Newble D et al (2000) The Cambridge conference: background. Med Educ 34:782–784
15. Kent A (2000) Computers and conference organization. Med Educ 34:963–964
16. Hjelm M, Lee JCK, Li AKK et al (1998) Planning criteria for multicentre, multilingual telemedicine conferences. J Telemed Telecare 4:47–55
17. Hardy B, Sweet D (1996) Virtual conferences. Trends Cell Biol 6:363–365
Web Links
1. The Accreditation Council for CME. Available at: http://www.accme.org/
2. Surgery conferences worldwide. Available at: http://www.conferencealerts.com/surgery.htm
3. Global Alliance for Medical Education. Available at: http://www.game-cme.org/resources/index.html
4. International Special Events. Available at: http://www.isesuk.org
5. Meeting Professionals International. Available at: http://www.mpiweb.org
6. Meetings Industry Association. Available at: http://www.meetings.org
7. Business Link. Available at: http://www.businesslink.gov.uk
8. Zylstra T. How to organize valuable congresses/conventions. Available at: http://www.zylstra.org/blog/archives/001165.html
49 Presentation Skills in Surgery
Sanjay Purkayastha
Contents
49.1 Introduction
49.2 Content of Presentation
49.3 Verbal Presentation and Poise
49.4 Slides
49.5 Audio–Visual Tools for Presentations
49.6 Other Materials That May Be Used as Adjuncts for Presentations
49.7 Feedback and Learning New Techniques
49.8 Summary
Further Reading
Useful Web Sites

S. Purkayastha
Department of Biosurgery & Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St. Mary's Hospital, Praed Street, London W2 1NY, UK
e-mail: [email protected]

Abstract In surgery, good presentation skills are important for many different professional reasons: clinical presentation of patients and cases, audit, research, teaching and lecturing, morbidity and mortality meetings, interviews for training and research posts, multidisciplinary team meetings, and presentation of grant proposals and pitches to raise funding for research. All of these situations require careful thought and preparation to make the presentation eloquent, efficient and effective. This chapter provides some guidance on how surgeons can acquire the skills and background knowledge for a successful presentation.
49.1 Introduction
There are many definitions of "presentation", including:
• The activity of formally presenting something (e.g. a prize or award): "she gave the trophy, but he made the presentation".
• The act of making something publicly available, presenting news or other information by broadcasting or printing it: "he prepared his presentation carefully in advance".
• A show or display, the act of presenting something to sight or view: "the presentation of new data"; "he gave the customer a demonstration".
• The act of presenting a proposal.
• Display: a visual representation of something.
• Formally making a person known to another or to the public.
• And in medicine (obstetrics): the position of the foetus in the uterus relative to the birth canal: "Caesarean sections are sometimes the result of abnormal presentations".
In surgery, good presentation skills are important for many different professional reasons: clinical presentation of patients and cases, audit, research, teaching and lecturing, morbidity and mortality meetings, interviews for training and research posts, multidisciplinary team meetings, and presentation of grant proposals and pitches to raise capital and funding for research. All of these situations require careful thought and preparation to make the presentation eloquent, efficient and effective. It is important to be clear about what you are trying to get across to the audience, what the audience's background is (so as to pitch the presentation at the correct level) and why you are giving the presentation (to teach, to compete, to win a prize, as an invitation, to entertain or to aid revision, for example). There are different factors in preparation, which will be covered in this chapter. These include content, verbal presentation and poise, slides and audio–visual usage, and other materials that can be used as adjuncts (e.g. handouts and electronic resources to be given to the audience).
49.2 Content of Presentation
It is imperative that the point of the presentation and its take-home message be clear throughout, especially if the presentation is invited or for a competitive purpose. A carefully thought-out and snappy title is very useful, especially if the presentation will be advertised in advance. A slide with the aims and objectives of the presentation is very helpful at the beginning of the talk. The structure of the presentation is the key to a smooth flow. Usually, a clear introduction to the topic or issue in question should be provided first. If the presentation is of research, then a shortened abstract in bullet form is helpful. The main part of the presentation should include the salient events or points in an order that makes clear sense (e.g. chronological, or as per a study: introduction, methods, results, discussion and conclusions). The content should be completed with a firm reminder of the reason for the presentation, the conclusions and the take-home messages, repeated if necessary. Always allow time for a few questions.
49.3 Verbal Presentation and Poise
Presenting to audiences that usually include senior clinicians, peers, experts in their fields, students, the media and the public requires the speaker to work on their verbal projection, confidence and poise. Practise, practise, practise! Rehearsal is imperative. A well-rehearsed presentation has good flow, projects well and is always remembered. Ultimately, though, to present well the speaker must know his or her subject well.
Speak slowly, so that the audience can understand you, without sounding patronising. Vary the pace of the talk so that you hold the interest of the audience, and know your slides so that you do not need to look at them. Do not just read out your slides! This is the worst kind of presentation. Make careful eye contact with the audience; even with very large audiences this is possible with different segments of the audience. Give attention to the different sections of the audience, from the front rows to the rear tiers; this really holds the attention of the whole audience. Posture is important too: stand up straight, don't slouch, keep the shoulders up, but don't be too stiff. Try not to gesture with arms and hands too much. If there is a lectern, it is sometimes helpful to hold on to it. If the stage is large and the audience sizeable, then walking to different positions on the stage can be an effective tool for keeping the presentation fluid, but to do this you must know your topic, slides and material extremely well. Using a wireless presentation device to change slides and a laser pointer frees the presenter from being fixed behind a laptop or console, gives greater mobility and also enables the presenter to ask and answer questions closer to the audience.
Finally, it is imperative to keep to time. Do not be rushed, and try not to linger on points that will unnecessarily slow you down and make you run over. It is not a good idea to overrun when other presentations may follow yours in a tight schedule. Again, rehearsal is the key to avoiding this. Rehearse using the timer function of the presentation programme you use and, ideally, rehearse in front of a friendly audience prior to the real event.
49.4 Slides

Carefully prepared, stylish yet uncomplicated slides are excellent aids to presenting. Take the time to get to know a particular software package to make your slides. Whether it is Mac or Windows based, understanding the potential problems of the package and exactly what can be created is fundamental to its best use. The title slide should be clear and give the details of the talk in one short sentence if possible. The name and title of the presenter and the department/institution should be on the title slide. A simple, appropriate image relating to the presentation is also useful on the first slide. Take care not to make the slides too busy. Usually, no more than 4 to 5 lines of text should be used in bullet points. Text can be broken up with appropriate images. Avoid using abbreviations, and do not use over-enthusiastic fonts! Stick to Arial, Times New Roman or other well-established fonts. Font size is also important: too small is problematic and too large seems simplistic. Using the slide master function allows you to set up the layout, font and format, colours, and headers and footers of the presentation so that they are applied to every slide. This is an invaluable tool if a series of presentations is to be given, so that a theme can be constructed and followed throughout the series. The colour scheme is best kept simple, with one or two colours and simple text. If a more elaborate design is used, this can distract the viewers from the content of the slide or presentation. Colour schemes can be used that are relevant to the department or institution in question, and images of that institution may be used on the slides shown while questions are taken. If a complex diagram or image has to be used, try to break it up or annotate it so that it is easy to explain. Complicated images are like busy text slides in that they do not hold the attention of the audience and make it difficult to get the point across. Sometimes, animations may be useful in helping to explain a flow chart, process of care or more complex image. These should be kept simple if possible and can be accessed through the custom animation tool in most software packages. Allowing images or text to appear on a click through different effects can be useful for talking through processes, but do not make these animations too elaborate, as this may detract from the professionalism of a surgical presentation. In a similar way, slide transitions can be animated and can often be used to enhance the flow of a presentation, but again do not create intricate transitions, as they may detract from the content of the presentation itself. Headers and footers are useful to create a professional feel to any presentation, but these should be carefully selected so that they are not too busy and colourful. They can also be incorporated into the slide master, and the footer can be used as a slide numbering tool. Avoid using large text and elaborate fonts here, as they take the reader's eye away from the main content of the slide. If slides are to be given out in an electronic format, then the header and footer can be used to denote the origin and ownership of the slides.
If this is the case, then it is useful to save the presentation using the read-only function so that the slides cannot be copied or changed by the audience they are given to. Occasionally, amusing or funny slides can be used to make a point or to break up a lengthy presentation. These can be taken from the World Wide Web or other sources, but it is important to cite the source and/or author(s) they are taken from. Similarly, references must not be forgotten; they can be included on each individual slide as the information is presented, or at the end in a list of references. If giving a presentation for teaching purposes, it is often helpful to include a "further reading" list.
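Slide-master discipline of this kind can even be scripted. The sketch below is a minimal illustration using the open-source python-pptx library, which is an assumption on our part and not a tool discussed in this chapter; it builds a slide from a departmental template file so that a whole series of talks inherits the same fonts, colours and footers. The template filename and slide text are hypothetical.

from pptx import Presentation

# Open a departmental template whose slide master already defines the
# theme: fonts, colours, headers and footers (hypothetical filename).
prs = Presentation("department_template.pptx")

# Layout 1 in the default template is "Title and Content"; every slide
# added from it inherits the master's formatting automatically.
slide = prs.slides.add_slide(prs.slide_layouts[1])
slide.shapes.title.text = "Results"

body = slide.placeholders[1].text_frame
body.text = "Primary endpoint met"          # first bullet
p = body.add_paragraph()
p.text = "No difference in complications"   # second bullet

prs.save("journal_club_talk.pptx")

Because every talk generated this way draws on the same master, the theme stays consistent across a series of presentations without manual reformatting.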
49.5 Audio–Visual Tools for Presentations

Video and audio materials are very useful in surgical presentations, especially to demonstrate surgery. A picture may be worth a thousand words, but a carefully edited video is worth a million. Demonstrating surgical technique and new procedures is much easier with high-quality video. Clips should be carefully edited and checked for quality, sound and resolution, as well as for compatibility with the system that will be used on the day of the presentation. If in doubt, always take your own laptop with you so that you know that the audio–visual inserts will play without problems. If clips are taken from the internet, they must be cited appropriately. If footage is taken from a commercial source, for example a commercially available DVD, then permission should be sought prior to showing the clip. Audio-only clips are useful for quotes and music if felt necessary, but again the sound quality and the capability of the console on the day of the presentation should be checked prior to their inclusion. If audio–visual clips are included, they should be inserted such that they play at the start of the slide or when the presenter clicks on the insert. If this is done, then it is useful to keep the presentation in its own folder with the clips saved alongside it, so that the presentation can be transferred without breaking the links to the clips. Video footage can be downloaded from many sites on the internet, including surgical teaching sites, hospital sites, journal sites and media sites (such as YouTube). If commentary is already included on a video clip, the presenter should decide whether the included commentary or the presenter's own explanation better explains the clip.
49.6 Other Materials That May Be Used as Adjuncts for Presentations

Handouts of the slides or of the supporting materials may be put together so that the audience has some information to take away. A handout containing all the information you are going to present is not useful, as this usually reduces the attention span of your audience. However, if teaching students, it is useful to give handouts with some of the slide information on them, so that notes can be made against each slide. The number of slides on each page of the handout is an important consideration. If slides are intricate and have diagrams, then four slides to a page is useful. If the slides are simple and mainly text, then nine slides to a page is more appropriate. Utilising electronic sources to give "handouts" is a newer way to leave information with the audience. This can be via a web site where the presentation can be read again (but not downloaded), or CDs, DVDs or even flash disks can be used to give out large presentations or those loaded with large amounts of video. For competitive presentations, a copy of the abstract or even an A4-sized poster is a useful way to give the audience and judges an aide-mémoire for your individual presentation.
49.7 Feedback and Learning New Techniques

Feedback is crucial to successful presenting. Ask your audience or critics what you did well and what you could do better. If presenting to students, giving out feedback questionnaires is often useful, but be prepared for some harsh criticism and do not take it too personally. If professional presentations are to be given regularly and you feel that you are not getting adequate critiques of your performances, have your presentation videotaped and watch it yourself or with someone experienced in presenting, for more help with technique. Learning new techniques is important so that your style does not become stale or predictable if you present to similar audiences regularly. Go and watch experienced speakers and see what they do in terms of verbal projection, body language and materials used. This does not necessarily have to be in surgery, as many of the best speakers may be found in other professions or in the media.
49.8 Summary

Presentation skills in surgery are an important asset for professional progress. They are useful not only in everyday practice but also for formal presentations. This chapter highlights the salient points of preparing, constructing and delivering formal presentations for many different purposes, from research to multidisciplinary team meetings. A sound grasp of a presentation software package and confident verbal presentation techniques are necessary for successfully delivering formal surgical talks. Careful preparation is vital, as are constructive feedback and watching experienced, distinguished speakers. Ultimately, the best presenters are those who are interested in their subject and can explain it clearly and concisely. If you can explain complicated surgical advances to a lay person or even a child, then your grasp of the subject is complete. Keep presentation styles fresh by learning new techniques, regularly updating your slides and keeping things simple.
Further Reading

1. Bradbury A (2006) Successful presentation skills, 2nd edn. Kogan Page, London
2. DiResta D (1998) Knockout presentations. Chandler House, Worcester, MA
3. Templeton M, Sparks Fitzgerald S (1998) Schaum's quick guide to great presentations (quick guides). McGraw-Hill Professional, New York
4. Wempen F (2004) PowerPoint® advanced presentation techniques. Wiley, Hoboken, NJ
Useful Web Sites

http://lorien.ncl.ac.uk/ming/dept/Tips/present/present.htm
http://www.presentationhelper.co.uk/Essential_Presentation_skills.htm
http://www.palgrave.com/skills4study/studyskills/personal/presentation.asp
http://www.totalsuccess.co.uk/powerpointpresentationskills.htm
50 Internet Research Resources for Surgeons

Santhini Jeyarajah and Sanjay Purkayastha

S. Jeyarajah () Department of General Surgery, Royal London Hospital, Whitechapel, London E1 1BB, UK, e-mail: [email protected]
Contents

50.1 Introduction
50.2 Bibliographic Databases
50.3 Journal Collections
50.4 Web Search Engines
50.5 Medical Statistics
50.6 Images and Audio-visual Resources
50.7 Online Anatomy Resources
50.8 Training
50.9 Interview and Exam Preparation
50.10 Administration in Surgery
50.11 Social Networks
50.12 Summary
Further Reading
Abstract The internet is an important resource for academic and clinical surgery of the twenty-first century, encompassing research, publication production, presentations, as well as practice of evidence-based medicine. There are also resources available for clinical practice, exams, career progression, and social networking. This chapter provides a comprehensive overview of these resources separated into clinically, academically, and socially relevant sections with a guide on utilisation of each resource, methods of access and costs, where applicable, in the hope of aiding the twenty-first century surgeon to effectively use this wide-ranging, rapidly evolving asset.
50.1 Introduction

The internet has evolved into a large, varied resource used in academic and clinical surgery of the twenty-first century. There are numerous resources regularly used in academic surgery, encompassing research, publication production and presentations, as well as the practice of evidence-based medicine. There are also resources available for clinical practice, exams, career progression and social networking. In this chapter, we provide a review of the different types of resources available online, along with their uses and accessibility.
50.2 Bibliographic Databases

There are several bibliographic databases available online. Below is a summary of those with the largest collections and the widest use.
• PubMed http://www.ncbi.nlm.nih.gov
PubMed is a widely used free service provided by the U.S. National Library of Medicine (NLM), which provides over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. It links to abstracts, full-text articles and other related resources through searches performed using the Entrez retrieval system, a text-based search and retrieval system developed by the National Center for Biotechnology Information (NCBI) for the NLM. Publishers participating in PubMed electronically submit their citations to NCBI prior to or at the time of publication, allowing online-first access. If the publisher has a website that offers the full text of its journals, PubMed provides links to that site, as well as to biological resources, consumer health information and research tools; there may be a charge to access the text or information. PubMed also provides access and links to the other Entrez molecular biology resources, and it offers the ability to store collections of citations, save and automatically update searches, filter and group search results, and spell-check queries. Citation management programmes can also search PubMed directly, giving easier access to citations and full-text articles, and personal registered accounts are available to organise and manage citations. MEDLINE is the NLM's main bibliographic database and the largest component of PubMed; it contains references to journal articles in the life sciences, with a concentration on biomedicine, from 1950 to the present, along with some older material.
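For readers who want to script such searches, the short sketch below shows one way to query PubMed through the NCBI Entrez E-utilities using the Biopython wrapper; the choice of library, the search term and the e-mail address are illustrative assumptions rather than anything prescribed in this chapter.

from Bio import Entrez

Entrez.email = "researcher@example.org"  # NCBI asks users to identify themselves

# Search PubMed for randomized controlled trials on a sample topic
handle = Entrez.esearch(
    db="pubmed",
    term="laparoscopic colectomy AND randomized controlled trial[pt]",
    retmax=20,
)
result = Entrez.read(handle)
handle.close()

pmids = result["IdList"]  # PubMed identifiers for the matching citations

# Retrieve the matching records in MEDLINE format, e.g. for a reference manager
handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                       rettype="medline", retmode="text")
print(handle.read())
handle.close()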
• Ovid Technologies http://www.ovid.com/
Ovid Technologies is part of the Wolters Kluwer group of companies, providing access to online bibliographic databases, journals and other products, chiefly in the area of health sciences. The MEDLINE database was once its chief product, but now that this is freely available through PubMed, Ovid has diversified into a wide range of other databases and products, including abstracts and full-text citations, usually at a cost or through institutional access, although use of the search facility and abstract retrieval is free.

• EMBASE http://www.elsevier.com/
EMBASE, or the Excerpta Medica Database, is a biomedical and pharmacological database produced by Elsevier and contains over 11 million records from 1974 to the present. Each record is fully indexed, and it covers over 5,000 biomedical journals from 70 countries. Similar to Ovid, search and abstract retrieval is free, but full-text access is at cost or through institutional subscription.

50.3 Journal Collections

Access to online journals is available through the bibliographic databases referred to above, as well as through other similar websites, with full-text articles available free depending on the publisher and journal type, or through individual purchase or personal or institutional subscription. ScienceDirect http://www.sciencedirect.com/ is one of the largest online collections of published scientific research in the world. Produced by Elsevier, it contains over 8.5 million articles from over 2,000 journals, including titles such as The Lancet, Cell and Tetrahedron, as well as 40 reference works and numerous book series and handbooks. There are also specific collections formulated with particular purposes, which are reviewed below.

• Cochrane Collaboration and Library http://www.cochrane.org/
The Cochrane Collaboration is an international nonprofit organization consisting of over 11,500 volunteers in more than 90 countries who apply a rigorous, systematic process to review the effects of interventions tested in biomedical randomised controlled trials, with more recent reviews also including nonrandomised observational studies. The results of these systematic reviews are published in the Cochrane Library, which is available online as part of the Wiley Interscience system. It also includes systematic reviews and clinical trials performed by groups outside the Cochrane Collaboration. Abstracts of reviews can be searched and browsed for free. Full-text reviews are also easily accessible in many countries: the United Kingdom, Australia, the United States, parts of Europe, India and Africa have free access through governmental funding provision. Institutional licences are also available at variable costs.

• Bandolier http://www.bandolier.com/
Bandolier was an independent journal about evidence-based healthcare, written by Oxford scientists and first printed in February 1994, which evolved into an electronic resource with the aim of providing easily accessible, summarised versions of systematic reviews, meta-analyses, randomised trials and high-quality observational studies obtained through searches of PubMed and the Cochrane Library. Where necessary, systematic reviews are performed by the authors themselves; these appear first in the paper version and, after six months, on the website. The online versions are available free, albeit 6 months after initial publication.
50.4 Web Search Engines

Generic information about all subjects, not least surgery, is increasingly obtained through internet searches of the world wide web using search engines not limited to bibliographic databases. The information obtained may consist of web pages, images and other types of files. The most commonly used of these is Google Search http://google.com. Other search engines in common use include Yahoo search http://yahoo.com and MSN search http://msn.co.uk, among several others. http://search.vivisimo.com is a search engine that not only searches for the terms entered into the search field but also produces results related to the terms looked for, providing a wider overview of the subject. It also allows searching, through the same engine, of PubMed, news and eBay. The difficulties that arise from using these resources are the variable quality of the information obtained and the absence of any real control exerted by medical bodies. It is therefore important to treat information obtained this way with careful criticism and as an adjunct to more robust information resources.
50.5 Medical Statistics

Statistical resources, a large element of surgical research, can be accessed online for free for numerous purposes. Medical Statistics Using SPSS: an Introductory Course http://www.shef.ac.uk/scharr/spss/ is an online course on medical statistics using the widely used SPSS statistical package, with a guide to this software as well as to basic medical statistics. Statsoft http://www.statsoft.com/textbook/stathome.html is a free online statistical textbook with definitions and explanations of statistical methodology, as well as online software and calculators. Statpages.net http://statpages.org/ is a compilation of statistical software packages from numerous servers, providing free guidance on analysis selection and the software required to perform these calculations. Review Manager (RevMan) is the Cochrane Collaboration's programme for preparing and maintaining Cochrane reviews. It is freely downloadable from http://www.cc-ims.net/RevMan, is used for protocol and review entry, and is able to perform meta-analysis on the data entered, presenting the results graphically. Another widely used meta-analysis package is Metadisc, freely downloadable from http://www.hrc.es/investigacion/metadisc_en.htm
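To give a sense of the calculation that such packages automate, the sketch below pools odds ratios from three trials with a standard fixed-effect (inverse-variance) model. It is a generic illustration of the arithmetic, not the algorithm of RevMan or Metadisc, and the trial figures are invented for demonstration only.

import math

# Each tuple: (events_treatment, n_treatment, events_control, n_control)
trials = [(12, 100, 20, 100), (8, 80, 15, 82), (30, 150, 45, 148)]

weights, weighted_effects = [], []
for a, n1, c, n2 in trials:
    b, d = n1 - a, n2 - c                  # non-events in each arm
    log_or = math.log((a * d) / (b * c))   # log odds ratio for this trial
    var = 1/a + 1/b + 1/c + 1/d            # approximate variance of the log OR
    w = 1 / var                            # inverse-variance weight
    weights.append(w)
    weighted_effects.append(w * log_or)

pooled = sum(weighted_effects) / sum(weights)   # weighted mean log OR
se = math.sqrt(1 / sum(weights))                # standard error of the pooled estimate
print(f"Pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96*se):.2f}"
      f"-{math.exp(pooled + 1.96*se):.2f})")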
50.6 Images and Audio-visual Resources

Audio-visual resources are particularly important in producing high-quality, well-illustrated and well-explained presentations, both academically and clinically. More and more of these resources are available online for public consumption, with high-quality images, videos and audio recordings easily accessible and often free. Videos are of particular use in surgery, as laparoscopic and open procedures are taught and broadcast online. There are several websites through which these can be obtained (Table 50.1).
Table 50.1 Surgical and medical images and audio-visual resources

Images:
• Bristol biomedical images archive – http://www.brisbio.ac.uk/ – A categorized archive of 20,000 images maintained by the University of Bristol. Access: free.
• Images from the history of medicine – http://www.ihm.nlm.nih.gov/ – This system provides access to the nearly 60,000 images in the prints and photograph collection of the History of Medicine Division (HMD) of the U.S. National Library of Medicine (NLM). The collection includes portraits, pictures of institutions, caricatures, genre scenes, and graphic art in a variety of media, illustrating the social and historical aspects of medicine. Access: the archive can be browsed for free, but downloading full-size images requires completion of a free registration process.
• Hardin meta directory (MD) – http://www.lib.uiowa.edu/hardin/md/pictures.html – First launched in 1996 as a source of lists of information in health and medicine. It is designed to provide links to high-quality directory pages, direct links to primary information in circumscribed subjects and links to medical pictures. Access: free.
• Health on the net media gallery – http://wolfgang.hcuge.ch/Media/media.html – This media gallery is put together by the Health on the Net Foundation, an international body that seeks to encourage ethical provision of online health information. The gallery is indexed by body part and includes images of medical conditions and procedures. Access: it is possible to browse and use the images for free, but copyright is acquired by payment.
• MedPix – http://rad.usuhs.mil/medpix/medpix.html – A medical image database provided by the Departments of Radiology and Biomedical Informatics, Uniformed Services University, Bethesda, MD. All contributed content may be copyrighted by the original author/contributor. Radiology images predominate, but pathology, ophthalmology, dermatology, and endoscopy images are also included. Access: free.
• Google images search – http://images.google.com/ – Google's search engine has a specialised image search facility, used in a similar fashion to Google Search. Access: free.

Audio-visual resources:
• Health education assets library – http://www.healcentral.org/ – A digital library of teaching resources for health sciences education, designed to facilitate the sharing of a wide variety of high-quality multimedia resources, including images, audio and video clips, on several areas in medicine, located on the HEAL server and across many remote servers. Access: free registration is required.
• Laparoscopy hospital – http://laparoscopyhospital.com – This online course is accepted for training in some international centres but is not yet commonly accepted in the UK; it does, however, provide a valuable resource of videos of general surgical and gynaecological laparoscopic procedures. Access: free.
• Medlineplus – http://www.nlm.nih.gov/medlineplus – This resource, provided by the US National Library of Medicine, contains a wide selection of medical information, including a medical dictionary, encyclopaedia, interactive health tutorials and drug information, as well as recorded webcasts of surgical procedures. Access: free.
• YouTube – http://www.youtube.com – A well-known audio-visual resource widely used online; it also provides webcasts of surgical and medical procedures. Access: free.

50.7 Online Anatomy Resources

There are several online resources for anatomy. Images are obtainable from the resources cited above, but full anatomical descriptions and images are available from Anatomy of the Human Body by Henry Gray http://www.bartleby.com/ (the Bartleby.com edition of Gray's Anatomy of the Human Body) and from Instant Anatomy http://instantanatomy.net, an online anatomy revision course complete with MCQs, podcasts and audio-visual broadcasts of lectures. These resources are free.

• The Visible Human Project http://www.nlm.nih.gov/research/visible/visible_human.html is run by the NLM and provides access to
cross-sectional images of the human body. Sample images are available free of charge; access to the complete sets of images requires a fee payment and completion of a license agreement. Most new editions of surgical textbooks are now available online upon purchase; that list would be too extensive, and not within the remit of this book, to provide.
50.8 Training

Surgical training is evolving as we progress through the twenty-first century. More training is assessed formally, at multiple levels, starting immediately after graduation and continuing through to continual assessment once specialist training is complete. Trainees are assessed by several trainers, using assessment tools and packages that are easily made available online and can be accessed remotely, allowing dynamic and up-to-date evaluation. Logbooks, traditionally kept on paper, are also now available online, allowing monitoring and comparison on a more global level. In the United Kingdom, this has been epitomised by the introduction of the Intercollegiate Surgical Curriculum Programme, http://www.iscp.ac.uk. This programme has structured postgraduate surgical training, with its curriculum, evaluation systems, recording and communication centralised into one website, with input and continual development involving the Royal College of Surgeons and the Postgraduate Medical Education and Training Board (PMETB). Similar programmes are being used in the United States, with www.new-innov.com used as an evaluation system by 47.4% of all surgical residency programmes. BeST, or Basic electronic Surgical Training http://www.bestonline.com, is an independent basic surgical training programme with core content, visual and interactive learning, realistic case studies, challenging simulations, tests and personalized feedback tailored for the UK and US residency training programmes. This programme is available at cost, depending on the length of subscription. Other aspects of surgical training, such as access to professional international and national organisations like the Association of Surgeons of Great Britain and Ireland, the associations of other subspecialties and their American counterparts, are mainly online.
Access to educational and professional resources and guidelines is commonly available through these websites, and submission of research for presentation at these organisations is usually online as well. At a specialist training level, there is also access to training groups such as the Dukes' Club for colorectal trainees and the Mammary Fold for breast surgery trainees. These serve as important avenues for education through meetings, forums and revision resources, as well as providing a social network between trainees.
50.9 Interview and Exam Preparation

Professional interviews in surgery at all levels of training are a daunting prospect. Preparation involves revision of clinical management, surgical technique, clinical governance, and health service policy and provision. All these elements can be researched online using the resources suggested above, as well as numerous others that can be found using simple searching techniques. Interviews also require practice, for which questions are available online, as are course options. Surgical exam preparation material (MRCS and FRCS) is also available online, mainly in the form of MCQ questions; however, most of these resources are available at cost. Table 50.2 summarises the resources available online for interview and exam preparation. There are also many websites providing access to applications for revision courses and interview courses run by independent companies, hospitals and medical schools.
50.10 Administration in Surgery

Staying up to date with national administrative changes is key to successful clinical and academic surgery. As this is a continually changing area, the internet is an ideal way to keep on top of continual transformations in the system. It is important to be informed about administrative processes, governance and managerial skills for successful research hypothesis generation, application and execution. It is also integral to successful career progression and interview performance. Table 50.3 lists several websites useful in this area that are free to access.
Table 50.2 Websites for interview and exam preparation

Interview questions:
• Surgeons.org – http://www.surgeons.org.uk/general-surgery-tutorials/interview-questions.html?Itemid=44 – A list of interview questions, with a forum of questions asked at interview of other candidates. Access: free.
• Royal College of Surgeons, England – http://www.rcseng.ac.uk/career/careersadvice/interviewtips.html#Q – A robust collection of questions and a suggested list of areas that need to be covered prior to interview, with tips about pre-, during and post-interview practice. Access: free.
• Association of surgeons in training – http://www.asit.org – Examples of questions for several levels of training, from consultant down. Access: free registration.

Exam revision:
• Surgical tutor – http://surgical-tutor.org.uk – Provides a spectrum of resources, including practice MCQs, surgical tutorials, revision notes, images and slides. Access: most resources are available free; the revision courses cost £22–£30.
• MRCS.org – http://www.mrcs.org.uk – A website with online revision courses, lecture courses and revision material, as well as exam technique hints and tips. Access: courses and tools have variable costs, but generic tips are free.
• One examination – http://www.onexamination.com/ – A choice of questions for the MCQ element of the MRCS, with feedback and analysis. Access: price varies by length of subscription.
• Pastest – http://www.pastestonline.co.uk – A website provided by one of the most comprehensive commercial postgraduate education organisations. There are choices of online MCQ courses, lecture courses for all parts of the MRCS, SpR management courses and interview training. Access: price varies by type of resource.
50.11 Social Networks

The internet has become one of the largest social resources utilised by surgeons with the advent of email, which has become almost indispensable to daily practice. DoctorsNet http://doctors.org.uk is a specialised website set up only for subscription by doctors; it provides personal email, education, news, discussion forums and job applications. Registration and service provision are free, but only available to doctors registered for practice in the United Kingdom. Several other social utilities are also commonly used, including http://www.facebook.com and http://www.friendster.com, which provide an outlet for interacting through messages, forums, event invitations, sharing of photographs and setting up of common interest groups.
50.12 Summary

The internet is an enormous resource, which can be used very effectively as long as selection and criticism are exercised when choosing the type of information used. It will continue to expand and change, and staying abreast of these transformations through continuous interaction will ensure that the academic surgeon remains academically and professionally progressive.
Table 50.3 Free websites for administration in surgery

• Department of Health – http://www.dh.gov.uk – Informative website on health policy, structure of the health service, finance and management, with numerous links and pdf documents covering new guidelines and policy changes.
• Clinical Governance – http://www.cgsupport.nhs.uk – Clear information about the pillars of clinical governance and its practical implications.
• Medical Research Council – http://www.mrc.ac.uk – A website providing guidance to researchers to allow implementation of good practice meeting ethical and legal requirements.
• Research Governance – http://www3.imperial.ac.uk/clinicalresearchgovernanceoffice/researchgovernance/whatisresearchgovernance – A good definition and explanation of the application of this principle.
• National Institute of Clinical Excellence – http://nice.org.uk – Provides the guidelines issued independently by this agency on public health, health technology and clinical practice.
• National Patient Safety Agency – http://www.npsa.nhs.uk/ – An overview of the National Patient Safety Agency.
• Health Matters – http://www.healthmatters.org.uk/ – An independent, quarterly online magazine dealing with issues pertaining to the NHS, public health and health politics, including human resources and national policy.
• BBC – http://www.bbc.co.uk/health – A dedicated section on the BBC website encompassing news on health policy and politics, research, and public information and advice.
Further Reading

1. Allen JW (2002) The internet for surgeons, 1st edn. Springer, New York
2. Davis JB (2002) Health & medicine on the internet: a comprehensive guide to medical information on the world wide web, 4th edn. Practice Management Information Corporation, Los Angeles, CA
3. McKenzie BC (2002) Medicine and the internet: introducing online resources and terminology, 3rd edn. Oxford University Press, Oxford
51 Clinical Practice Guidelines in Surgery

Shawn Forbes, Cagla Eskicioglu, and Robin McLeod

R. McLeod () Department of Surgery, University of Toronto, Toronto, ON, Canada, e-mail: [email protected]
Contents

51.1 Introduction
51.2 What Are Guidelines?
51.3 Developing Guidelines
51.3.1 Defining the Scope of a Clinical Practice Guideline
51.3.2 Reviewing the Literature
51.3.3 Assessing Study Quality and Level of Evidence in Surgery
51.3.4 Formulating Recommendations
51.4 External Review of a New Guideline
51.5 Updating Guidelines
51.6 Implementation of Guidelines
51.7 User Issues
51.7.1 Rating Guidelines
51.7.2 Practice Guidelines – Can They Effect Change? The Cancer Care Ontario Experience
51.8 Conclusion
References
Abstract Clinical practice guidelines (CPGs) are defined as "systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances." A systematic approach for the development of CPGs ensures that the best available evidence is combined with clinical experience and patient preferences. CPGs are useful to surgeons because the published literature is synthesized into a useful recommendation. In developing CPGs, a systematic review of the literature should be performed, the quality of the evidence assessed, and the recommendations made based on the evidence as well as by consensus. All of this should be done in a reproducible and transparent fashion. Furthermore, the quality of the evidence directly influences the strength of the recommendation. Finally, experience suggests that active strategies are required to implement CPGs in order to change physician behavior. This may, in fact, be the most challenging part of CPG development.
51.1 Introduction

Clinical practice guidelines (CPGs) have emerged in an era of constantly expanding literature and a clinical setting focused on providing the best quality of care to patients. CPGs can assist physicians and surgeons to synthesize and better utilize published evidence from the large pool of literature. Although CPGs may be designed to amalgamate and appraise large bodies of evidence in an evidence-based manner, sometimes CPGs can be fraught with bias and conflicts of interest. In this chapter, we detail what guidelines are, describe the process by which guidelines are written, and highlight some of the challenges faced
with guideline implementation. We also present critical appraisal tools that can be used by the practicing physician to assess the quality of published guidelines. Finally, we describe an example of guideline development aimed at improving the surgical care of patients.
51.2 What Are Guidelines?

CPGs are defined as "systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances" [1]. A systematic approach for the development of CPGs ensures that the best available evidence is combined with clinical experience and patient preferences [2]. CPGs, like systematic reviews and meta-analyses, are another method to consolidate evidence from a variety of sources as well as to address conflicting or inconsistent evidence. Meta-analyses combine the results of studies in a quantitative approach, whereas systematic reviews and CPGs employ more qualitative methods. One advantage of these qualitative approaches is that such publications can include evidence from a heterogeneous sample of studies and are not limited to clinical trials. Going one step further, authors of CPGs make recommendations based on the best evidence, considering both the expected benefits and harms of a given diagnostic test or treatment.

CPGs fall under two broad categories based on their development. Firstly, CPGs can be consensus statements based largely on clinical expertise and expert opinion in areas where there is not a large body of evidence. Alternatively, CPGs can be evidence-based guidelines, which are developed by drawing on evidence from well-designed, randomized controlled trials and meta-analyses. If these two different types of CPGs represent two ends of a spectrum, then most CPGs fall at some point along this continuum. In controversial areas where evidence is lacking, a CPG derived by expert consensus may be useful. In the healthcare professions, the two most commonly used consensus-based methods are the nominal group technique and the Delphi survey [3]. In the nominal group technique, also called an expert panel, members of the group suggest their solutions for a given problem. After duplicate suggestions are eliminated and similar solutions are grouped together, the members rank the remaining solutions. The results are then tallied to identify the solution with the highest rank. This technique is designed to offer quick solutions and consensus while still taking each member's opinion and clinical expertise into account [3]. In the Delphi method, either content experts or the team organizing the conference present opinions regarding a particular topic. After a period of questions and discussion, these opinions are grouped and summarized in statements as part of a questionnaire. Next, participants are asked to rank their agreement with each statement. A summary of these ranks, along with the initial statements, is then sent back to the participants for re-ranking, which allows participants to take into account the rankings of the rest of the group. This process repeats until an appropriate degree of consensus is reached [3].

Evidence-based CPGs, in contrast to consensus guidelines, rely on evidence from research studies in order to make recommendations. Evidence-based CPGs can include evidence from only randomized controlled trials if this level of evidence is available for the particular topic. However, if evidence from only cohort studies, case–control studies or even case series is available, then the CPG is based on this evidence. For example, in the area of mechanical bowel preparation for elective colorectal surgery, there are many randomized controlled trials and meta-analyses, so a CPG addressing this topic may include only this high level of evidence. On the other hand, CPGs on the optimal technique for rectal cancer surgery must be based on lower-level evidence. One of the strengths of CPGs is that they take into account the quality and level of the evidence when making and grading the recommendation. Furthermore, evidence-based guidelines are preferred to consensus guidelines because, when they are well written, they do not have the inherent biases that are present with consensus guidelines. The members, or even the conferences held to develop consensus guidelines, may be supported by various funding bodies and pharmaceutical industries, and thus are not always free of bias and conflicts of interest. Although these issues may also affect evidence-based guidelines, this is the exception rather than the rule. In evidence-based guidelines, the literature is appraised against a priori specified quality criteria and the recommendations are formulated in a transparent fashion. Furthermore, in evidence-based guidelines, the quality of the evidence directly influences the strength of the recommendation. For example, evidence
from many well-designed, randomized controlled trials would provide a stronger recommendation than results from case–control studies. This direct link between the quality of evidence and the proposed recommendations allows the reader to understand the transparent process involved in the formulation of the recommendation.

Finally, CPGs should also be distinguished from clinical pathways or clinical care pathways. A CPG is a synthesis of the available evidence for a given topic, which includes a quality assessment of the primary literature and a specific recommendation derived from this evidence. In contrast, clinical care pathways describe a sequence of events, usually with designations as to who should carry out each step, used to take a patient or patient population toward a particular outcome in a defined period of time [4]. CPGs should also be distinguished from standards, which state rigid rules of care for patients with specific clinical conditions and are usually used as measures of quality, or as medico-legal or pre-certification standards for approval of care [5].

Modern CPGs are designed to offer the reader a concise, systematic, and evidence-based review of the literature to help guide patient care with respect to complex or controversial management issues. Several bodies, such as the Canadian Task Force on Preventive Health Care (CTFPHC) and the US Preventive Services Task Force (USPSTF), have published their methods for guideline development [6–8]. While CPGs were initially geared toward screening and preventive medicine, the methods are easily translated for use in developing guidelines on intervention strategies as well. CPGs are particularly useful in areas of clinical debate and disagreement. When there is a large body of evidence, which may or may not be conflicting, CPGs are a useful technique to evaluate the quality and results of this evidence. CPGs are less helpful when there is already a consensus or very little variation around a given topic. The following steps are adapted from the CTFPHC and USPSTF and are designed to provide an overview for guideline development in surgery.

51.3 Developing Guidelines

51.3.1 Defining the Scope of a Clinical Practice Guideline

The first step in preparing a practice guideline in surgery is formulating an appropriate question. Typically, a guideline should focus on a topic where controversy exists as to best practice, where a balance between benefit and harm must be weighed, or where there is a need to develop a practical strategy for implementing a standard of care. CPGs also offer additional support for changing otherwise engrained, though archaic, practices. To facilitate a structured approach to formulating relevant questions, Battista and Fletcher described a practical tool called the "Causal Pathway," since renamed the "Analytic Framework" by the USPSTF [7-9]. The Analytic Framework defines the scope of the guideline (Fig. 51.1).

Fig. 51.1 The analytic framework, tracing the pathway from the patient population at risk through screening, early detection of disease and treatment (each with its adverse effects) to intermediate outcomes and reduced morbidity or mortality. Each arrow represents a clinical question to be answered by the guideline. The "overarching" arrows represent the principal question the guideline sets out to answer: "Does a screening test/treatment lead to a reduction in morbidity or mortality from disease?" Adapted from Harris et al. [7]

Each arrow represents a "linkage"
in the chain of evidence used to answer a question set by the CPG. The first or "overarching" question states the premise of the CPG: "Does screening test A reduce the mortality of disease B?" The remaining questions in an Analytic Framework examine the prevalence or burden of disease, the diagnostic accuracy of screening tests, the effects of various treatments on intermediate outcomes, and the associated harms of an intervention: an indirect approach to answering the overarching question. Where no evidence exists to directly demonstrate the efficacy of a maneuver, indirect evidence is sought. For example, in a guideline published by the USPSTF on screening patients for carotid artery stenosis, there was no direct evidence demonstrating the efficacy of screening asymptomatic adults for carotid artery disease on fatal or nonfatal strokes. Indirect evidence, including the diagnostic accuracy of duplex ultrasonography, the efficacy of endarterectomy in asymptomatic subjects compared to medical management, and studies on potential harm caused by screening, led the task force to recommend against the use of screening ultrasonography [10,11]. While some guidelines are designed to present the evidence pertaining to a single aspect of care, for example, the role of screening asymptomatic patients for carotid artery stenosis, other guidelines present evidence-based practice parameters for managing complex conditions. Recommendations are presented in a style similar to the USPSTF's, reporting the level of the evidence and the strength of a recommendation; an example is the guidelines set forth by The American Society of Colon and Rectal Surgeons for the management of rectal cancer [12]. Establishing the questions that define the scope of the CPG facilitates the conduct of a systematic literature review.
51.3.2 Reviewing the Literature

A predefined and systematic approach is required when performing a literature search. Each question to be answered by a CPG represents a separate search strategy. Before embarking on a literature review, criteria pertaining to patient type, the screening tests or intervention of interest, and study type (randomized controlled trials of interventions, cohort studies of diagnostic accuracy) must be decided upon. The expertise of a research librarian or a specialist in research methodology is essential in executing a comprehensive search of the literature. Databases routinely searched include MEDLINE, a collection of over 16 million citations in the life sciences and biomedicine; EMBASE, a biomedical and pharmacological database containing nearly 11 million citations; CINAHL, specializing in nursing and allied health literature; and the Cochrane Database of Systematic Reviews, a source of reviews of the medical literature conducted with the most rigorous of methodologies [13-16]. A comprehensive review typically includes a hand search of the reference lists of important publications and discussions with content experts to ensure that no seminal works have been missed. Once a body of literature has been collected, the next step is to screen the abstracts of the works for their relevance to the topic of interest. This includes searching for other systematic reviews and meta-analyses, randomized controlled trials, prospective cohort studies, and retrospective case–control studies that answer the questions a guideline sets out to answer. At least two reviewers should be involved with the screen to ensure that no relevant studies are overlooked; where large bodies of evidence are to be reviewed, the participation of several reviewers is often necessary. Conflicting opinions on the inclusion of studies are resolved by consensus. Once a list of studies has been compiled, each study is critically reviewed and appraised for quality.
51.3.3 Assessing Study Quality and Level of Evidence in Surgery

The quality of a study is a measure of its internal validity or degree of bias [17]. Organizations such as the USPSTF have established a number of criteria for measuring quality for various study types [7]. These criteria are generic; however, they may be tailored to the specific topic as required. Surgical studies by their very nature may differ in methodology compared to medical trials, and therefore the quality criteria by which studies are appraised may not be appropriate or fair. Blinding, for example, can be difficult to incorporate into trials comparing surgical and medical interventions, or different types of surgery (laparoscopy vs. open surgery being a good example), although developing objective outcomes may partly compensate for this. There are several published Users' Guides that provide a framework by which the reader can appraise both the internal and external validity (generalizability) of the surgical literature [18-20]. Both the CTFPHC and USPSTF use a three-level (Good, Fair, or Poor) system for grading quality. Studies of the highest quality, meeting all of the predetermined criteria (appropriate sample sizes, validated measurement tools, reproducible outcomes, etc.), are graded "good." Studies with fatal flaws (significant loss to follow-up, invalid measurement devices, etc.) are graded "poor," while studies in between are graded "fair." The quality of the studies cited in a guideline should be disclosed to show how each study influenced the grade of the final recommendation.

Level of evidence is a reflection of study design. Meta-analyses and large, randomized controlled trials represent the strongest level of evidence, whereas small, uncontrolled case series, case reports and expert opinion represent the weakest. Level of evidence does not reflect the internal validity of a study, and therefore must be interpreted with caution: controversy exists as to whether a well-designed case–control study may provide better evidence than a poorly conducted randomized trial. There are a variety of scales for grading the level of evidence published by various organizations [7, 21]. While each reports level of evidence in a different way, the relative strength of one level of evidence vs. another is preserved between scales.
51.3.4 Formulating Recommendations

As with measuring the quality of a study or the level of evidence, there are a variety of scales that report the strength of a final recommendation, yet this is perhaps the most difficult aspect of guideline development. The USPSTF grades its recommendation based on both the certainty and the magnitude of the net benefit of an intervention [22]. The certainty of the net benefit of an intervention is based on several criteria: the type and quality of individual studies; the generalizability of a study; the number and size of individual studies; the consistency of results between studies, and the general fit of the results within a biologic model. Certainty is then assigned a high, moderate, or low grade. The magnitude of net benefit is a quantification of the outcome, for example, relative risk reduction, number needed to treat (NNT), or even number needed to harm (NNH). The balance struck between certainty and magnitude is then used to reach a final letter grade of recommendation.
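As a worked illustration of these magnitude measures, the short sketch below converts hypothetical trial event rates (invented for the example, not figures from any guideline cited here) into the absolute risk reduction, relative risk reduction, NNT and NNH.

# Hypothetical trial: 20% of control patients and 15% of treated
# patients reach the outcome of interest.
control_risk = 0.20
treatment_risk = 0.15

arr = control_risk - treatment_risk   # absolute risk reduction = 0.05
rrr = arr / control_risk              # relative risk reduction = 25%
nnt = 1 / arr                         # number needed to treat = 20

# If the treatment also causes a complication in 4% vs. 2% without it:
nnh = 1 / (0.04 - 0.02)               # number needed to harm = 50

print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}, NNH = {nnh:.0f}")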
Another method of grading recommendations, which incorporates the study design, quality, consistency, and directness (external validity) of study results, has been described by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group [23-25]. The GRADE system reports the strength of a recommendation as either strong or weak based on the net benefit (the difference of benefits and harms) and the quality of the evidence (high, moderate, low, or very low), according to the methodology, consistency and directness of the evidence. This system requires some insight on the reader's part: recommendations based on studies of poor quality may still be "strong," even though the quality of the evidence may be "very low." The GRADE system has been adopted by a number of internationally recognized bodies, including the World Health Organization and the Cochrane Collaboration.
51.4 External Review of a New Guideline

The development of a CPG requires the input of all relevant stakeholders whose practice may be influenced by its implementation. Guidelines affecting perioperative care, for example, should include the input of surgeons, anesthesiologists, perioperative nursing specialists and administrators, as well as patients. The input of such experts will ensure that guideline recommendations are not formed in a "vacuum": that the recommendations are both in the best interest of the patient and feasible within the scope of practice of caregivers. An external review, an evaluation of the contents of a CPG by stakeholders not involved with its development, offers unbiased feedback with which to modify the guideline before publication.
51.5 Updating Guidelines

CPGs should be reviewed and updated every 1–2 years, depending on the available resources. In order for CPGs to provide good-quality recommendations, it is imperative for authors to stay up to date regarding the guideline topic and add new evidence to the CPG as it becomes available. Updating CPGs is particularly important because newly published evidence may change the net balance and, thus, the proposed recommendations.
51.6 Implementation of Guidelines

CPGs are designed to provide busy physicians with a concise review of the current evidence and recommendations based on the quality of this evidence. CPGs should be displayed in a reader-friendly format and provide direction on providing evidence-based quality care in order to improve patient outcomes. However, simply publishing CPGs rarely leads to either changes in physician behavior or improvement in patient outcome. Furthermore, academic rounds, conferences and CME, perhaps the most common strategies for implementation, have been reported to have at best mixed results. Active strategies are required to implement guidelines and initiate change. These challenges fall under the larger field of implementation research and knowledge translation. Knowledge translation has been defined as "the exchange, synthesis and ethically sound application of knowledge … to accelerate the capture of the benefits of research" [26]. Knowledge translation can apply to the dissemination and application of results from any research, such as randomized controlled trials or CPGs. Unfortunately, this process can be slow, incomplete and faced with many barriers [27]. In a comprehensive review of different guideline implementation strategies in 2004, Grimshaw et al. reported that most interventions, including reminders, educational interventions, opinion leaders, and audit and feedback, produce only minimal or moderate changes [28]. Other models of implementation, including audit and feedback, multidisciplinary collaboration, media campaigns, financial incentives, and combinations of the above, among others, have had varying effects. Furthermore, this review revealed that strategies that are successful in implementing one guideline may not be equally successful in another setting [28]. Therefore, it is especially important to first identify the relevant stakeholders or end users of the CPG and the setting for which the CPG is targeted before embarking on an implementation strategy [29]. Barriers to change exist at the individual and organizational levels, each needing to be addressed in order to facilitate a change in practice. A strategy tailored to the barriers specific to a particular guideline offers the best chance that its recommendations will be adopted into practice. It may also be useful to engage the assistance of experts in the knowledge translation research field [29].
51.7 User Issues

51.7.1 Rating Guidelines

With the abundance of literature being published every year, surgeons must critically appraise articles and interpret the results in order to potentially change their practice patterns. They must take into account their patient population and local practices as compared with those evaluated in the study. CPGs should be critically appraised in a similar fashion. Just as there are criteria such as the Consolidated Standards of Reporting Trials (CONSORT) for assessing the quality of randomized controlled trials, there is the Appraisal of Guidelines, Research and Evaluation (AGREE) instrument to assess the quality of CPGs [30, 31]. This instrument is composed of 23 questions divided into six broader headings: (1) scope and purpose, (2) stakeholder involvement, (3) rigor of development, (4) clarity and presentation, (5) applicability, and (6) editorial independence. The AGREE is a user-friendly tool, presented as a questionnaire in combination with a user's guide and directions for calculating domain scores [6]. The AGREE instrument is one of many quality assessment tools that can be used to critically appraise CPGs [32–34]. Each allows the reader to appraise the methodology and applicability of the recommendations of the CPG. Additionally, a well-developed CPG should include full disclosure of its methodology, including the composition of its development team, any conflicts of interest on the part of guideline developers, and the participation or sponsorship of industry partners.
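For readers curious about the arithmetic behind AGREE domain scores, the following minimal sketch (in Python) standardizes summed item ratings within a domain to a 0–100% scale. The min–max standardization and the 4-point agreement scale reflect the published AGREE users' guide as we understand it; the item ratings themselves are invented for illustration.

    # Illustrative only: standardizing an AGREE domain score.
    # Each appraiser rates each item from 1 (strongly disagree) to 4 (strongly agree).
    def agree_domain_score(ratings_per_appraiser):
        """ratings_per_appraiser: one list of item ratings per appraiser."""
        n_appraisers = len(ratings_per_appraiser)
        n_items = len(ratings_per_appraiser[0])
        obtained = sum(sum(r) for r in ratings_per_appraiser)
        max_possible = 4 * n_items * n_appraisers  # every item rated 4 by everyone
        min_possible = 1 * n_items * n_appraisers  # every item rated 1 by everyone
        return 100 * (obtained - min_possible) / (max_possible - min_possible)

    # Hypothetical ratings from two appraisers for a seven-item domain
    # (rigor of development has seven items in the original instrument):
    print(agree_domain_score([[4, 3, 4, 2, 3, 4, 3],
                              [3, 3, 4, 2, 2, 4, 3]]))  # ~71.4

Higher domain scores indicate that more of that domain's quality criteria were judged to be satisfied; the six domain scores are reported separately rather than aggregated into a single overall score.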
51.7.2 Practice Guidelines: Can They Effect Change? The Cancer Care Ontario Experience

Cancer Care Ontario (CCO) is a government agency whose mandate is to oversee cancer services in the
province of Ontario. One of its responsibilities is to monitor performance and to report both to the government and publicly in order to ensure accountability. The Surgical Oncology Program is one of the clinical programs under the CCO umbrella, which also includes systemic therapy and radiation. Unlike radiation therapy, which is provided only in designated Cancer Centers, most cancer surgery is performed in acute care general hospitals throughout the province. Furthermore, most cancer surgery is performed by non-cancer specialists. The goal of the Surgical Oncology Program is to provide timely access to patient-centered, high-quality cancer surgery for all patients in the province. This is achieved by developing regional networks of multidisciplinary care. Complex surgery, where there is evidence of a volume–outcome relationship, is regionalized to designated centers. More commonly, though, the prevalence of most cancers precludes this, and instead other measures must be used to ensure that high-quality care is delivered throughout the province. Developing and implementing evidence-based guidelines is a central initiative of the program.

[Fig. 51.2 Quality improvement cycle: Identify the Problem → Develop Guideline or Standard → Initiate Knowledge Transfer Strategies → Evaluate Results]

As shown in Fig. 51.2, the quality improvement cycle begins by identifying the problem. Various sources of data, including administrative and pathological data, are used to assess the current status of cancer surgery. For instance, using administrative data, Simunovic et al. reported that the postoperative mortality following pancreatic surgery in the province of Ontario between 1988 and 1994 was 10.2%, ranging from 14.4% in low-volume hospitals to 3.4% in high-volume hospitals [35]. Surgeons are also surveyed, and workshops have been held to determine surgeons' priorities in quality improvement initiatives. Once the
question is established, the Surgical Oncology Program collaborates with the CCO Program in Evidence-based Care (PEBC), which coordinates the development of evidence-based CPGs. An expert panel is created to develop the guideline on the specific topic. Members of the expert panel include representatives of all stakeholders, including surgeons, clinicians from other relevant disciplines (e.g., medical or surgical oncology, diagnostic imaging, pathology), and administrators. This increases the validity of the guideline and also increases the likelihood that the recommendations will be accepted and incorporated into practice. The inclusion of administrators is felt to be important when resources must be allocated to implement the guideline. In developing the CPG, an evidence-based approach is used, combining a systematic review with consensus. The evidence-based approach ensures that the best available evidence is used in an unbiased, explicit manner to make recommendations for patient care. The search strategy is explicit and the process is transparent. However, much of the surgical literature is of rather poor quality, so although members of the expert panel evaluate the evidence, expert opinion is often required. The recommendations of the CPGs are not graded. Once the initial report is completed, it is sent out to stakeholders to receive practitioner feedback. The opinions of the clinical community are incorporated in a systematic fashion. This is an important part of the process. Browman and colleagues reported that 19 of 43 CPGs were changed as a result of input from practicing oncologists; of the 40 changes, 28 (70%) were considered substantive [36]. The guidelines are disseminated in multiple ways, including posting on the CCO website (www.cancercare.on.ca). Other strategies have been employed depending on the guideline. Following completion of a guideline on Laparoscopic Surgery for Colon Cancer, a mentoring program was initiated in partnership with industry and the Ontario Association of General Surgeons. Other strategies have included the use of expert opinion, regional champions, audit and feedback, listserv discussions, and workshops. Generally, a multifaceted approach is used, with involvement of the community of practice. In addition, while the guidelines are developed centrally, implementation occurs locally. Thus, local initiatives may vary. In the case of Standards, which were developed for both
Thoracic Surgery and Hepato-Pancreato-Biliary (HPB) Surgery, system changes were required, so financial incentives were provided. Has this process been successful? First of all, the guidelines have great credibility because of the rigor with which they are developed. In addition, relevant stakeholders are involved in all aspects of their development. Lastly, the PEBC is supported by funding from the provincial government, so there is no potential bias from industry or competing interests of specialty groups. With respect to impact on patient care, following the reporting of postoperative mortality rates by Simunovic et al. [35], an HPB Practice Guideline was developed. This guideline made recommendations about volume of surgery as well as hospital and surgeon requirements. It was recommended that HPB surgery should be performed only in hospitals that perform more than 20 pancreatic procedures per year and a total of 50 HPB procedures each year. In a subsequent review of 2005 data, the proportion of patients having surgery in high-volume hospitals had increased from 17.8 to 62.3%, and overall postoperative mortality had decreased from 10.2 to 6.2%. These changes were achieved mainly by auditing individual hospital results and providing feedback. On the other hand, the mentoring program introduced following the Laparoscopic Surgery for Colon Cancer guideline was in some respects less successful: only 35 procedures were completed in the program. While the absolute number of procedures performed was small, it did stimulate ad hoc mentoring among surgeons throughout the province.
51.8 Conclusion

CPGs are designed to offer the reader a concise, systematic, and evidence-based review of the literature to help guide patient care with respect to complex or controversial management issues. The abundance of literature available on any one topic may prove overwhelming to the individual clinician, so an evidence-based guideline developed with rigorous methodology provides caregivers with a succinct consolidation of the evidence in a format that can readily be applied in day-to-day patient care. However, for CPGs to have impact, they must be evidence based and seen to be developed without bias, and active implementation strategies must be employed. Further research is required to understand the barriers and enablers that affect the adoption of guidelines so that optimal strategies can be implemented in order to change physician behavior and improve patient care.
References

1. Field MJ, Lohr KN (eds); Committee to Advise the Public Health Service on Clinical Practice Guidelines, Institute of Medicine (1990) Clinical practice guidelines: directions for a new program. National Academy Press, Washington, DC
2. Wollersheim H, Burgers J, Grol R (2005) Clinical guidelines to improve patient care. Neth J Med 63:188–192
3. Jones J, Hunter D (1995) Qualitative research: consensus methods for medical and health services research. BMJ 311:376–380
4. Vlayen J, Aertgeerts B, Hannes K et al (2005) A systematic review of appraisal tools for clinical practice guidelines: multiple similarities and one common deficit. Int J Qual Health Care 17:235–242
5. Hayward RSA, Laupacis A (1993) Guidelines workshop: initiating, conducting and maintaining guidelines development programs. Can Med Assoc J 148:507–511
6. Woolf SH, Battista RN, Anderson GM et al and the Canadian Task Force on the Periodic Health Examination (1990) Assessing the clinical effectiveness of preventive maneuvers: analytic principles and systematic methods in reviewing evidence and developing clinical practice recommendations. J Clin Epidemiol 43:891–905
7. Harris RP, Helfand M, Woolf SH et al for the Methods Work Group, Third US Preventive Services Task Force (2001) Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med 20(3S):21–35
8. Barton MB, Miller T, Wolff T et al for the US Preventive Services Task Force (2007) How to read the new recommendation statement: methods update from the US Preventive Services Task Force. Ann Intern Med 147:123–127
9. Battista RN, Fletcher SW (1988) Making recommendations on preventive practices: methodological issues. In: Battista RN, Lawrence RS (eds) Implementing preventive services. Oxford University Press, New York
10. Wolff T, Guirguis-Blake J, Miller T et al (2007) Screening for carotid artery stenosis: an update of the evidence for the US Preventive Services Task Force. Ann Intern Med 147:860–870
11. US Preventive Services Task Force (2007) Screening for carotid artery stenosis: US Preventive Services Task Force recommendation statement. Ann Intern Med 147:854–859
12. Tjandra JJ, Kilkenny JW, Buie DW et al (2005) Practice parameters for the management of rectal cancer. Dis Colon Rectum 48:411–423
13. United States National Library of Medicine, National Institutes of Health (2008) Fact sheet: MEDLINE. Available at: http://www.nlm.nih.gov/pubs/factsheets/medline.html
14. Elsevier (2009) EMBASE. Available at: http://www.elsevier.com/wps/find/bibliographicdatabasedescription.cws_home/523328/description
15. EBSCO Publishing (2009) CINAHL databases. Available at: http://www.ebscohost.com/cinahl/
16. The Cochrane Collaboration (2009) An introduction to the Cochrane reviews and The Cochrane Library. Available at: http://www.cochrane.org/reviews/clibintro.htm
17. Grimes DA, Schulz KF (2002) Bias and causal associations in observational research. Lancet 359:248–252
18. Thoma A, Farrokhyar F, Bhandari M et al for the Evidence-Based Surgery Working Group (2004) Users' guide to the surgical literature: how to assess a randomized controlled trial in surgery. Can J Surg 47:200–208
19. Bhandari M, Devereaux PJ, Montori V et al for the Evidence-Based Surgery Working Group (2004) Users' guide to the surgical literature: how to use a systematic literature review and meta-analysis. Can J Surg 47:60–67
20. Bhandari M, Montori VM, Swiontkowski MF et al (2003) Users' guide to the surgical literature: how to use an article about a diagnostic test. J Bone Joint Surg Am 85-A:1133–1140
21. Cook DJ, Guyatt GH, Laupacis A et al (1992) Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 102:305S–311S
22. Sawaya GF, Guirguis-Blake J, Lefevre M et al for the US Preventive Services Task Force (2007) Update on the methods of the US Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med 147:871–875
23. GRADE Working Group (2009) GRADE working group. Available at: http://www.gradeworkinggroup.org
24. Atkins D, Best D, Briss PA et al (2004) Grading quality of evidence and strength of recommendations. BMJ 328:1490
25. Schunemann HJ, Jaeschke R, Cook DJ et al on behalf of the ATS Documents Development and Implementation Committee (2006) An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am J Respir Crit Care Med 174:605–614
26. Canadian Institutes of Health Research (2009) More about knowledge translation at CIHR. Available at: http://www.cihr-irsc.gc.ca/e/39033.html
27. Davis D, Evans M, Jadad A et al (2003) The case for knowledge translation: shortening the journey from evidence to effect. BMJ 327:33–35
28. Grimshaw JM, Thomas RE, MacLennan G et al (2004) Effectiveness and efficiency of guideline dissemination and implementation strategies. Health Technol Assess 8:1–352
29. Davis D, Goldman J, Palda VA (2007) Canadian Medical Association handbook on clinical practice guidelines. Canadian Medical Association, Ottawa
30. Altman DG, Schulz KF, Moher D et al (2001) The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 134:663–669
31. The AGREE Collaboration (2004) Appraisal of guidelines research and evaluation. Available at: http://www.agreecollaboration.org
32. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J (1999) Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA 281:1900–1905
33. Hayward RSA, Wilson MC, Tunis SR et al for the Evidence-Based Medicine Working Group (1995) Users' guide to the medical literature. VIII. How to use clinical practice guidelines. A. Are the recommendations valid? JAMA 274:570–574
34. Wilson MC, Hayward RS, Tunis SR et al (1995) Users' guides to the medical literature. VIII. How to use clinical practice guidelines. B. What are the recommendations and will they help you in caring for your patients? The Evidence-Based Medicine Working Group. JAMA 274:1630–1632
35. Simunovic M, To T, Theriault M et al (1999) Relation between hospital surgical volume and outcome for pancreatic resection for neoplasm in a publicly funded health care system. CMAJ 160:643–648
36. Browman GP, Makarski J, Robinson P et al (2005) Practitioners as experts: the influence of practicing oncologists "in the field" on evidence-based guideline development. J Clin Oncol 23:113–119
52 From Idea to Bedside: The Process of Surgical Invention and Innovation

James Wall, Geoffrey C. Gurtner, and Michael T. Longaker
Contents

52.1 Introduction
52.2 Determining the Value of an Innovation
52.3 Intellectual Property
52.4 Prototyping
52.5 Institutional Technology Transfer
52.5.1 FDA Regulation of Medical Devices
52.6 Cost Effectiveness
52.7 Teaching and Mentoring Surgical Innovation
52.8 Conflict of Interest
References
Abstract Contemporary surgery relies on extensive equipment and devices to enable its performance. As specialties develop and new frontiers are crossed, technology needs to advance in parallel. A modern surgical device is the end point of a sophisticated, complicated, and potentially treacherous route that incorporates new skills and knowledge acquisition. Processes including technology transfer, commercialization, corporate and product development, intellectual property, and regulatory routes play pivotal roles in this voyage. Many good ideas fall by the wayside for a multitude of reasons, including lack of intellectual property protection, poor marketing, or poor adoption by clinicians. In this chapter, we attempt to illuminate the components of the process of surgical innovation, which we believe must remain within the remit of the modern-day surgeon.
52.1 Introduction
J. Wall () The Department of Surgery, Stanford University School of Medicine, Stanford University, 257 Campus Drive, Stanford, CA 94305-5148, USA
In the book They Made America, author Harold Evans tells the story of one hundred of the most influential innovators in the history of the United States, including the likes of Thomas Edison and Henry Ford [1]. Evans observes that "a scientist seeks understanding and an inventor a solution … an innovator seeks the universal application of the solution by whatever means possible." Advances in surgical technology have enabled increasingly complex therapies while limiting invasiveness, leading ultimately to improvements in patient care. Surgeons have been the cornerstone of scientific understanding and often the creators of surgical inventions. Very few, however, have taken the leap to surgical innovation by driving their solutions from idea to bedside [2]. This leap requires an element of individual
risk beyond the comfort of many physicians and an element of entrepreneurship that has not generally been embraced by the surgical community [3]. The observant surgeon is in the ideal position to identify problems with current therapy and to propose both incremental and revolutionary solutions. In fact, the majority of surgical devices are rooted in surgeons' ideas, yet the process of developing these ideas into products is almost uniformly left to the medical device industry. Without sound clinical guidance throughout the process of product development, a surgical invention cannot realize its full potential benefit [4]. Surgical innovation can be broadly divided into (1) the development of novel methods and (2) the development of novel devices. Admittedly, there is often overlap of method and device innovation, as one may enable the other. Novel surgical methods are briefly discussed below, followed by a focus on the challenges of developing novel surgical devices. New surgical methods are often conceived by the surgeon to improve patient care by using existing devices in new ways. There is no question that pioneers in surgical methods such as Starzl, Shumway, and DeBakey, to name a few, are true surgical innovators. Significant changes to a procedure that present new safety or efficacy issues are generally studied under the auspices of an Institutional Review Board (IRB) and funded through academic sources. Once evaluated, new surgical methods are generally disclosed through the academic outlets of publication and presentation. Adoption is driven by peer acceptance and the regulation of any applicable surgical specialty board. While the intellectual property laws of the United States allow novel surgical methods to be patented [5, 6], the surgical community has made a practice of sharing new methods for the benefit of patients and of not recovering royalties for the practice of such operations. While both surgical techniques and devices require resources to be developed, devices face the additional challenge of continued costs of production, distribution, and improvement. In the United States, a device must generally have commercial viability in order to affect patient care. Quite simply, a surgical device must generate enough revenue to support its development and production in order to reach patients. With no sign of the government taking over the device industry in the near future, commercial viability will continue to be required of surgical devices. New surgical devices face
a series of significant challenges in their development from idea to bedside, requiring careful attention to intellectual property, product development, FDA regulations, and market acceptance. The average time from idea to bedside for a revolutionary surgical device is 5–8 years (Thomas Fogarty and Rodney Perkins 2007, personal communications). This chapter describes the process of taking a surgical device from idea to bedside, and in doing so will hopefully inspire more surgeons to take an active role in innovation. The classic story of surgeon-driven device innovation is that of Dr. Thomas Fogarty. As a scrub technician in high school, Fogarty witnessed patient suffering and loss of limbs from cumbersome surgery to remove blood clots. Later, as a medical student at the University of Cincinnati, he devised the solution of using a balloon on the tip of a catheter that could be inserted through a small access incision and passed through the artery beyond the blockage. Once past the blockage, the balloon could be inflated and the clot dragged out of the artery. Despite criticism from his mentors and contemporaries, he patented the balloon catheter, built the device in his garage, and found surgeons willing to try the new technology. Dr. Fogarty defined the process of surgical innovation by observing a clinical need, devising a feasible concept, and working tirelessly to develop his solution for the benefit of his patients. The Fogarty Catheter revolutionized vascular surgery and opened the door for many modern minimally invasive techniques [7].
52.2 Determining the Value of an Innovation

Determining the value of a surgical innovation can be challenging. Ultimately, the value of any medical technology lies in the benefit it provides to patients' lives. However, in order to reach the patient in the United States, a device's commercial viability must be fully understood. The resources required for product development necessitate an understanding of the development risk and potential commercial return in order to plan a realistic financial pathway for getting a product to the bedside. Consider the vastly different development pathways for a more ergonomic scalpel vs. the first abdominal aortic stent graft. The former has a quick development timeline at a low cost with the potential
for limited revenues, while the latter requires a large investment and years of work with the potential for significant revenues. Both devices hold merit, and in order to pursue their respective development, one must understand their relative risks and costs (Table 52.1).

Table 52.1 Development factors for incremental and revolutionary surgical device innovations
Development factor | Incremental innovation | Revolutionary innovation
Patient impact / market size | A smaller impact and market is acceptable if fewer resources are required to develop the device | A large patient impact in a large market justifies the resources necessary to develop the device
Technological feasibility | Generally relies on the merits of proven technology | Often high risk in developing an unproven technology
Intellectual property protection | A patent portfolio that allows practice of an invention (freedom to operate) and excludes others from the practice of the invention is critical to any commercially viable technology
FDA regulatory pathway | Generally follows a 510(k) pathway and receives quick approval based on predicate devices | Often faces Premarket Approval, which can take many years at a high cost
Reimbursement | Generally falls into existing reimbursement codes | Often requires new CPT codes
Development team | Frequently can be executed by a small group of focused individuals | The significant challenges of each aspect of revolutionary technology development require a large, experienced team

Small investments in product development, often termed "seed" financing, can come from a wide variety of sources, including friends and family, angel investors, and device company grants, as well as the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grants provided by all major federal grant agencies, including the National Institutes of Health, the National Science Foundation, and the Department of Defense. These investments are generally in the range of $50,000–$500,000. They are aimed at proving the concept behind a device over a short time period, generally 6–18 months. In the case of an incremental device, this funding may be all that is needed to get to market, while in the case of revolutionary concepts, this funding may provide the opportunity to validate a concept and seek further investment. Private investors generally expect an equity stake in the company for early investments, while federal grants generally do not require such a concession. Device company grants often come with an agreement giving that company the first right of refusal to license and commercialize any technology developed under the grant. Large investments in product development generally come from either established device companies or venture capital. Established device companies generally
invest in ideas for internal development to strengthen a product offering or to expand into new markets. Venture capitalists generally fund the creation of new companies by investing money on behalf of limited partners who have committed money to a venture fund. The role of the venture capitalist is to minimize risk and maximize return on behalf of these limited partners. In the venture capital world, sequential investments are made to grow companies from the ground up. Venture investments usually begin with a "Series A" investment, ranging from $500,000 to $10 million, that funds companies to reach early development milestones such as proof of concept, FDA approval, and first-in-man use. "Series B" investments can range from $2 million to $50 million and focus on advanced company development such as sales force creation, large-scale manufacturing, and marketing campaigns. In return for such an investment, the venture capital group receives a preferred equity share in the company, meaning that it has the first right to dividends in the case of success and to liquidated assets in the case of failure. The share is determined by the amount of the investment and the agreed-upon value of the company prior to investment, the "pre-money valuation." For example, a $1M investment into a company valued at $3M would result in the venture capital group owning 25% of a company valued at $4M after the investment, the "post-money valuation." Venture capitalists require a detailed analysis of the development pathway with a focus on reducing risk in critical areas, including technology, intellectual property, FDA regulatory pathway,
reimbursement, and the management team (Table 52.2).

Table 52.2 Key investor questions in determining the risk for a medical device opportunity
Technology | Is the idea feasible? Has a working prototype been constructed? What level of proof of concept has been achieved, i.e., benchtop testing, animal testing, first in human?
Intellectual property | Has the core technology been protected by one or more patents? What level of patent protection exists, i.e., provisional patent, submitted utility patent, granted utility patent? How broad is the protection? Is there any risk of infringement on existing patents if the device is commercialized? Can any significant infringement be avoided by licensing?
FDA regulatory pathway | What level of evidence will be required to determine safety and efficacy? Do predicate devices exist, allowing a 510(k) approval pathway, or will this technology require Premarket Approval?
Reimbursement | How much will this device cost? Will it qualify for current reimbursement codes? Will existing reimbursement levels be adequate to make the device commercially successful? Will use of this device affect hospital revenues or physician revenues, and if so, how can these economic stakeholders be properly incentivized?
Management | Does the management team have the necessary skills to develop this device? Have they been successful in past endeavors?

The successful entrepreneur works to reduce these risks as quickly as possible for the least amount of money. Investors at all levels of funding require a clear and realistic development plan with measurable milestones, such as securing intellectual property, building a working prototype, successful animal testing, FDA approval, and first-in-man use of a device. Each milestone creates value for the device and can generate further rounds of investment. The process of developing an idea from conception to investment can take months and even years of work with no guarantee of funding. However, the right idea in the right hands can both create commercial value and immensely impact patients' lives.
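The pre- and post-money arithmetic described above is mechanical and worth making explicit. The following minimal sketch (in Python; the function name is ours and the figures simply mirror the $1M-into-$3M example in the text) computes the post-money valuation and the investor's resulting ownership fraction.

    # Illustrative venture financing arithmetic (figures from the example above).
    def post_money_ownership(investment, pre_money_valuation):
        post_money = pre_money_valuation + investment  # "post-money valuation"
        investor_share = investment / post_money       # investor's equity fraction
        return post_money, investor_share

    post, share = post_money_ownership(1_000_000, 3_000_000)
    print(post, share)  # 4000000 0.25 -> a $4M post-money valuation, 25% ownership

The same relationship explains why founders negotiate the pre-money valuation so carefully: for a fixed investment, every dollar added to the agreed pre-money value reduces the equity fraction surrendered.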
Suggested Resources

1. Gladstone D, Gladstone L (2002) Venture capital handbook: an entrepreneur's guide to raising venture capital, revised edn. Prentice Hall, New Jersey
52.3 Intellectual Property

Medical devices are most often protected in the United States by utility patents granted by the U.S. Patent and Trademark Office (USPTO). The utility patent is the most common type of patent issued by the office and covers machines, articles of manufacture, compositions of matter, and methods of making things. The patent grants the inventor the right to exclude others from making, using, offering for sale, selling, or importing
the invention in the United States. Once issued, a utility patent protects a device for a period of 20 years. The critical elements of a patent are:

1. Drawings: graphical representations that show all aspects of the invention.
2. Written specification: a description of the invention such that someone skilled in the art can understand and replicate it.
3. Claims: statements that specifically define the improvement of the invention. This is the key legal section of the document and the basis for prosecuting infringement.

Utility patents are reviewed by the USPTO to determine whether the invention meets three criteria:

1. Novel: the invention cannot have been known or used by others or described in a printed publication. Inventors in the academic environment must be very careful to protect intellectual property before making public disclosures of their work.
2. Non-obvious: the invention cannot be obvious to someone skilled in the relevant art.
3. Useful: the invention must fulfill a purpose and work.

Patents in the United States have historically been issued to the "First-to-Invent," whereas most of the rest of the world issues patents to the "First-to-File." It has therefore been of utmost importance in the United States for inventors to document the invention process in proper laboratory notebook format with regular
dating, signing, and witnessing. Proof of the date of an invention through a well-maintained laboratory notebook can make the difference in who is awarded a patent. There is growing support within Congress to convert the patent system to "First-to-File" to align the U.S. patent system with the rest of the world [8]. Proponents argue this will simplify the system, reduce litigation costs, and improve fairness. Most experts believe this reform will occur in the near future. Of particular note for the surgeon innovator are the rules of public disclosure. A U.S. patent cannot be obtained on any invention more than 1 year after the invention has been revealed through printed publication or public use. In many foreign countries, there is no 1-year grace period, and a patent will be refused if it was not filed prior to the date of public disclosure. Before contemplating any patent application to the USPTO, the surgeon innovator should begin by searching the patent literature for relevant patents and deciding whether the invention is novel and does not infringe on prior art. Many free resources for searching patents are available (www.freepatentsonline.com, www.google.com/patents, http://www.uspto.gov/patft/index.html). A fundamental strategy is to find as similar a device as possible to one's current invention by using a keyword search. Once a seemingly relevant patent is found, one should focus on the figure section of the patent, which often gives a good overview of the invention, and on the claims section, which is the legal foundation of a patent, determining what is protected. Most patents reference prior art to support their claims. An in-depth patent search continues by reviewing the referenced prior art of a similar invention, allowing one to begin to understand the "patent landscape" in a given field. Relevant prior art will usually fall into a few classifications that can be further searched for pertinent prior patents. For a modest fee ($105–$210 in 2007), a provisional patent application can be submitted to the USPTO to protect an invention in its early stages. A provisional patent documents the invention with a written description and any relevant drawings, generally without making claims. Additionally, the application must have a cover sheet that details the inventors and their contact information. Instructions for submitting a provisional application can be found at http://www.uspto.gov/web/offices/pac/provapp.htm. The document is submitted to the USPTO, which files the information but does not review it until a full utility patent application is initiated. The provisional patent sets a filing date with the USPTO and offers strong
proof of the timing of the invention. The provisional patent is valid for 1 year, giving the inventor time to perfect the invention and submit a utility patent application that references all or part of the provisional patent. The Patent Cooperation Treaty (PCT) facilitates the filing of intellectual property internationally among member countries. A PCT filing should be strongly considered for any device that may have universal applications. The term "patent pending" may be used to refer to an invention that has either a provisional or utility patent application filed. Sooner or later, a surgeon inventor will reach a point beyond their level of understanding in the area of patent law, at which point legal counsel specializing in intellectual property is advisable. The experienced surgeon inventor often performs initial patent searches and files provisional patents; however, filing a utility patent application or PCT application without counsel is rare. If the surgeon inventor is working within an institution that holds rights to their intellectual property, the institution may provide the necessary counsel through its technology licensing office.
Suggested Resources

1. United States Patent and Trademark Office official site. Available at: www.uspto.gov
2. Searchable database of all published US patents. Available at: www.freepatentsonline.com
3. Searchable database of all published US patents. Available at: www.google.com/patents
4. Rockman HB (2004) Intellectual property law for engineers and scientists. Wiley, New York
5. Poltorak AI, Lerner PJ (2002) Essentials of intellectual property. Wiley, New York
52.4 Prototyping

Prototyping is the ultimate method of experimentation in the field of medical device development, ideally leading to improvements in device function. As with a basic science experiment, a prototype should be carefully planned and aimed at answering a key question in the most efficient manner. Rarely is a fully functional device required in the early stages to answer important questions. Simple mock-ups using available tools and methods can generate significant advances in a device design. Thomas Fogarty proudly created his first
balloons from a variety of available rubbers secured to a catheter with his personal fly-fishing equipment. By doing so, he quickly and inexpensively learned the ideal material for his balloon. In testing a device, the surgeon innovator attempts to learn as much as possible from inanimate models first, improving the device incrementally for the greatest chance of success when moving on to testing in animals and, ultimately, humans. There is a point at which the physician innovator will not have the engineering expertise to create advanced prototypes. At this point, many resources are available through groups that specialize in everything from one-time prototypes to high-volume, medical-grade device manufacturing.
Suggested Resources

1. Kelley T (2001) The art of innovation. Doubleday, New York
2. http://www.devicelink.com
3. http://www.medicaldevices.org/public
52.5 Institutional Technology Transfer

The Bayh-Dole Act of 1980 gave universities, nonprofits, and small businesses the power to retain inventions developed under federally funded research programs [9]. Since 1980, universities have expanded their intellectual property claims through employee contracts to cover, in general, any invention created with the use of university resources. In order to manage their patent portfolios, most universities have created technology licensing offices (TLOs). Each surgeon innovator should understand the specific intellectual property policies of their institution, university or otherwise. The process of invention within an institution begins with disclosure of the technology to the TLO, which evaluates the technology and chooses either to retain the rights to the invention or to return the rights to the inventor (the latter being a rare occurrence). Any invention retained by the institution is marketed to industry to determine value and potential developers. The inventor is generally given a fair chance at licensing the technology if they choose to proceed with its development. The terms of a license vary among institutions and technologies; several basic terms are listed in Table 52.3. The Bayh-Dole Act requires universities to give preference to small business firms and to share a portion of revenues with the inventor.

Table 52.3 Basic terms of a technology license agreement
Exclusivity | The licensee can be granted either exclusive or nonexclusive use of the patent. Exclusivity guarantees that no other group may license the technology for the term of the license
Term | The length of the license
Field of use | The licensee may be granted use of the patent in specific fields, i.e., medical, veterinary, industrial, etc. This is particularly important for core technologies that may have vast applications
Licensed territory | The licensee can be granted use in specific regions, often the United States or worldwide
Issue fee | The fee paid to the licensing body for issuance of the license
Equity | If the licensee is a company, equity may be granted as part of the payment for the license
Annual royalty payment | An annual fee for use of the license
Earned royalty | A percentage of the revenues from products that rely on the licensed patent, paid to the licensing body. This is usually set as a percentage of net sales
Milestones/milestone payments | Milestones for development and sales of an invention are often set in order to ensure the licensee is moving the technology forward. Payments to the licensing body may be required for reaching specific milestones
Sublicensing fees | A fee paid to the licensing body if the licensee sublicenses the technology to another group
Assignment fee | A fee paid to the licensing body if the license is sold and reassigned to another group

52.5.1 FDA Regulation of Medical Devices

The Center for Devices and Radiological Health (CDRH) is the branch of the FDA responsible for the premarket
approval of all medical devices, as well as for overseeing the manufacturing, performance, and safety of these devices. There are two basic considerations in determining how to approach FDA device approval [10]:

1. Device classification: The Medical Device Amendments of 1976 to the Federal Food, Drug, and Cosmetic Act established three regulatory classes for medical devices – Class I, Class II, and Class III. Classification is determined by the degree of risk associated with each medical device and the extent of control needed to ensure safety and effectiveness.
2. Approval pathway: The FDA has three pathways leading to a commercially marketable medical device – exemption, 510(k) premarket notification, and premarket approval application (PMA). The approval pathway is determined by the degree of risk associated with each medical device and the extent to which the device poses new safety and efficacy concerns.

Class I devices are those for which safety and effectiveness can be assured by adherence to general controls, which include compliance with the applicable portions of the FDA's Quality System Regulation (QSR), facility registration and product listing, reporting of adverse medical events, and appropriate, truthful, and non-misleading labeling, advertising, and promotional materials. Class II devices are those which are subject to the general controls mentioned above, as well as to certain performance standards or other special controls, as specified by the FDA. Class III devices have a new intended use, or use advanced technology that is not substantially equivalent to that of a predicate device. The safety and effectiveness of Class III devices cannot be assured solely by the general controls and performance standards applied to Class I and II devices; therefore, extensive premarket clinical studies are almost always required to demonstrate the safety and efficacy of a Class III device. The FDA recognizes approximately 800 generic types of Class I devices [11] (e.g., sunglasses) and approximately 60 generic types of Class II devices [12] (e.g., wheeled stretchers) that are exempt from premarket notification. These devices may be commercially marketed without formal FDA review as long as they meet the requirements of general controls for Class I devices, and general controls plus performance standards for Class II devices. For Class I and II devices that are not exempt, premarket review and clearance by the FDA is accomplished
through the 510(k) premarket notification process. A 510(k) requires a manufacturer to submit to the FDA a premarket notification demonstrating that the device is substantially equivalent in intended use and technology to a predicate device, which is either:

1. A device that has grandfathered marketing status because it was legally marketed prior to the Medical Device Amendments of 1976, or
2. A Class I or II device with existing 510(k) clearance.

If the FDA agrees that the device is substantially equivalent to a predicate device, it will grant clearance to commercially market the device. The FDA has 90 days to respond to a 510(k) submission. Often, the process takes longer than 90 days, as the FDA may require further information, including clinical data, to make a determination regarding substantial equivalence. If the FDA determines that the device, or its intended use, is not substantially equivalent, it will classify the device as Class III, requiring significant premarket scrutiny through a PMA. After a device receives 510(k) clearance, any modification that could significantly affect its safety or effectiveness, or that would constitute a major change in its intended use, requires a new 510(k) clearance or could even require a PMA. Device companies are granted the freedom to determine what constitutes a significant modification or change in intended use; however, if the FDA disagrees with a manufacturer's decision not to seek a new 510(k) clearance, the agency may retroactively require the manufacturer to seek 510(k) clearance or PMA approval. The FDA can also require the manufacturer to cease marketing and/or recall the modified device until 510(k) clearance or PMA approval is obtained. Class III products almost universally require PMA approval from the FDA. The PMA process is significantly more demanding than the 510(k) premarket notification process and requires proof of the safety and effectiveness of the device. A PMA application is supported by extensive data, including preclinical studies and human clinical trials. Additionally, a PMA contains a full description of the device and its components; a full description of the methods, facilities, and controls used for manufacturing; and the proposed labeling. The FDA performs a substantive review once a PMA is filed and has 180 days to complete the process. As with most FDA reviews, a PMA application often requires a significantly longer period of time. In approving a PMA application (and occasionally in
clearing a 510(k) application), the FDA can require postmarket surveillance, whereby the manufacturer follows certain patient groups for a number of years and makes periodic reports to the FDA on the clinical status of those patients, when necessary to protect the public health or to provide additional safety and effectiveness data for the device. Any modification to a PMA device, including its labeling or manufacturing process, requires a PMA supplement and potentially a new PMA application. When FDA approval of a device of any class requires human clinical trials, a determination must be made as to whether the device poses a significant risk. Both IRBs and the FDA can make this determination. One may initially approach an IRB for a determination of risk, and if the risk is determined to be nonsignificant, FDA notification is not required. Alternatively, one can submit a request for a determination of significant risk to the FDA, the result of which is binding. If the device is considered a nonsignificant risk, an IRB can oversee human clinical trials. If the device is considered a significant risk, the device sponsor is required to obtain FDA investigational device exemption (IDE) approval prior to commencing the human clinical trial [13].
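No code can substitute for regulatory counsel, but the branching logic described above can be summarized compactly. The sketch below (in Python) is a simplified, purely illustrative restatement of the classification-to-pathway mapping in this section and not a statement of FDA policy; the is_exempt flag stands in for the FDA's published lists of exempt generic device types.

    # Simplified, illustrative mapping of device class to premarket pathway,
    # restating the logic described in the text; not regulatory advice.
    def premarket_pathway(device_class, is_exempt=False):
        if device_class in (1, 2):
            if is_exempt:
                # Exempt Class I/II devices: general controls (plus performance
                # standards for Class II) suffice; no premarket notification.
                return "Exempt from premarket notification"
            # Non-exempt Class I/II: show substantial equivalence to a predicate.
            return "510(k) premarket notification"
        if device_class == 3:
            # New intended use or non-equivalent technology: full premarket scrutiny.
            return "Premarket approval application (PMA)"
        raise ValueError("Device class must be 1, 2, or 3")

    print(premarket_pathway(2, is_exempt=True))  # Exempt from premarket notification
    print(premarket_pathway(3))                  # Premarket approval application (PMA)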
Suggested Resources

1. http://www.fda.gov/cdrh/index.html
2. Kucklick TR (2006) The medical device R&D handbook. Taylor & Francis Group, Boca Raton, FL
3. Trautman KA (2008) The FDA and worldwide quality system requirements guidebook for medical devices. Quality Press, Milwaukee, Wisconsin
52.6 Cost Effectiveness

The rising cost of medical technology has been dubbed "the medical arms race." There is increasing concern that the benefit of many new surgical devices does not outweigh their rapidly escalating costs [14]. While this is often true, technology can also have the less-cited effect of decreasing cost by decreasing the need for hospitalization. The limited resources of the current healthcare system will increasingly rely on the tools of outcomes research and cost-effectiveness analysis to determine the adoption of emerging technologies. The surgeon innovator is charged with considering
cost during device development in order to enable maximum utility of the device, both in the context of the U.S. market and in global health, where resources are further limited.
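The workhorse summary statistic of cost-effectiveness analysis, a standard tool of the field rather than anything specific to this chapter, is the incremental cost-effectiveness ratio (ICER): the additional cost of the new technology divided by the additional health benefit it produces, often expressed per quality-adjusted life year (QALY). The sketch below (in Python; all numbers are entirely hypothetical) illustrates the calculation for a new device compared with standard care.

    # Illustrative incremental cost-effectiveness ratio; all numbers hypothetical.
    def icer(cost_new, effect_new, cost_old, effect_old):
        """Extra cost per extra unit of health effect (e.g., per QALY gained)."""
        return (cost_new - cost_old) / (effect_new - effect_old)

    # A hypothetical device costing $12,000 vs. $8,000 for standard care,
    # yielding 6.0 vs. 5.5 QALYs per patient:
    print(icer(12_000, 6.0, 8_000, 5.5))  # 8000.0 -> $8,000 per QALY gained

A device whose ICER falls below a payer's willingness-to-pay threshold is more likely to be adopted; a device that is both cheaper and more effective than the alternative is said to dominate it and needs no ratio at all.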
Suggested Resources

1. Foote SB (1992) Managing the medical arms race: innovation and public policy in the medical device industry. University of California Press, Berkeley
52.7 Teaching and Mentoring Surgical Innovation

Modern surgical training follows the Halstedian educational tradition, in which physicians are taught through a combination of didactic sessions and graded responsibility in the operating room. Academic surgery has focused much effort on the pursuit of basic science research. While surgeons such as Thomas Fogarty have made enormous contributions to surgical technologies, the process of surgical technology development has been left to individuals and has not been embraced by the majority of surgical institutions. As in any other field, a combination of education and mentoring is critical to producing a new generation of surgical innovators who will contribute to shaping the future of surgery. There have been calls from within surgery to increase clinician involvement in innovation [15]. Innovation can be treated as a discipline when one considers that it has very little to do with a "eureka" moment of inspiration and relies heavily on a structured approach to understanding needs and working through solutions [16]. The development of modern surgical devices is increasingly complex and requires a multidisciplinary approach that is far beyond Fogarty's construction of the balloon catheter in his garage. The future of surgical devices includes not only smaller and more complex electromechanical devices, but also drug–device and cellular therapy–device combinations. A handful of leading institutions are beginning to formalize surgical innovation through a team-based approach [17]. Innovation teams are drawn from individuals with expertise in engineering, medicine, and business. The teams are educated on the process of
identifying and solving important clinical needs and taking the solutions from idea to bedside. Throughout the process, the teams receive guidance from mentors in industry who have successfully developed a range of surgical devices and other medical technologies.
Table 52.4 Francis D. Moore’s ethical criteria for innovative surgical care (Adopted with permission from [2]) Solid scientific background Skill and experience of the team Ethical climate of the institution Open display
52.8 Conflict of Interest

Innovative surgical devices ultimately hold the promise of improved patient care, but they also hold the promise of financial gains for their inventors and developers. Therefore, conflict of interest must be addressed and managed by all parties involved in surgical innovation. AdvaMed (Advanced Medical Technology Association) is a voluntary trade association that represents the majority of medical device companies. It has developed a code of ethics [18] that clearly defines limits on the general sales, marketing, and educational activities of medical device companies. Overall, medical device companies must be held to the highest ethical standards in the business world, as their products can literally determine the difference between life and death. The FDA, while instrumental in approving surgical devices, has little ability to track performance once a device is on the market. A combined effort between the FDA and the industry should be taken forward to effectively monitor device performance [19, 20]. For the surgeon innovator, simply avoiding conflict of interest is not possible in an industry where devices must be designed in conjunction with the physician operators whose technical skills and techniques shape product development. Conflict of interest is particularly difficult for those with financial, professional, and personal interests in the success of a device. The AdvaMed code falls short of addressing key issues such as disclosure of legacy relationships with medical technology companies and the separation of the design and implementation of clinical trials from major stakeholders [21]. It is thus incumbent on the surgeon innovator to advocate for technology they believe in, but to separate themselves from any activity in which financial gain may bias objective analysis, and to provide colleagues with full disclosure of industry relationships. Francis Moore led the debate on surgical ethics in the last century through his struggle with the ethics of transplantation [22]. That debate produced principles that hold true in surgical device innovation (Table 52.4) [23, 24].
Suggested Resources

1. http://www.advamed.org
References

1. Evans H (2004) They made America: two centuries of innovators from the steam engine to the search engine. Little Brown, London
2. Riskin DJ, Longaker MT, Gertner M et al (2006) Innovation in surgery: a historical perspective. Ann Surg 244:686–693
3. Preston H (2007) Spotlight: a bleak diagnosis for medical innovation. U.S. seen failing at fostering doctor-inventors. International Herald Tribune
4. Guezuraga RM, Steinbring DY (2004) View from industry. Eur J Cardiothorac Surg 26(Suppl 1):S19–S23; discussion S23–S26
5. Kesselheim AS, Mello MM (2006) Medical-process patents – monopolizing the delivery of health care. N Engl J Med 355:2036–2041
6. Klein RD (2007) Medical-process patents. N Engl J Med 356:753–754
7. White T (2006) What's the master medical device maker's secret? Stanford Medical Magazine 23(3)
8. Ambrogi R (2007) Guide to current patent reform legislation. Experts and the Law, Bullseye
9. Mowery DC, Nelson RR, Sampat BN et al (2004) Ivory tower and industrial innovation: university–industry technology transfer before and after the Bayh-Dole Act. Stanford University Press, Stanford
10. Nau JY (2007) [A great humanitarian and surgeon: Ambroise Pare]. Rev Med Suisse 3:2923
11. Jones RS, Debas HT (2004) Research: a vital component of optimal patient care in the United States. Ann Surg 240:573–577
12. Brennan PA, McCaul JA (2007) The future of academic surgery – a consensus conference held at the Royal College of Surgeons of England, 2 September 2005. Br J Oral Maxillofac Surg 45:488–489
13. Last JM (1995) A dictionary of epidemiology, 3rd edn. Oxford University Press, New York
14. Foote SB (1992) Managing the medical arms race: innovation and public policy in the medical device industry. University of California Press, Berkeley, CA
15. Cosgrove DM (2001) Developing new technology. J Thorac Cardiovasc Surg 121:S29–S31
16. Carlson CR, Wilmot WW (2006) Innovation: the five disciplines for creating what customers want. Crown Business, New York
17. Krummel TM, Gertner M, Makower J et al (2006) Inventing our future: training the next generation of surgeon innovators. Semin Pediatr Surg 15:309–318
18. AdvaMed code of ethics (2007) Available at: http://www.advamed.org/MemberPortal/About/code/
19. Kereiakes DJ, Willerson JT (2004) Medical technology development and approval: the future is now. Circulation 109:3078–3080
20. Mehran R, Leon MB, Feigal DA et al (2004) Post-market approval surveillance: a call for a more integrated and comprehensive approach. Circulation 109:3073–3077
21. LaViolette PA (2007) Medical devices and conflict of interest: unique issues and an industry code to address them. Cleve Clin J Med 74(Suppl 2):S26–S28; discussion S32–S37
22. Moore FD (1970) Therapeutic innovation: ethical boundaries in the initial clinical trials of new drugs and surgical procedures. CA Cancer J Clin 20:212–227
23. Moore FD (1988) Three ethical revolutions: ancient assumptions remodeled under pressure of transplantation. Transplant Proc 20:1061–1067
24. Moore FD (1989) The desperate case: CARE (costs, applicability, research, ethics). JAMA 261:1483–1484
53 Research Governance and Research Funding in the USA: What the Academic Surgeon Needs to Know

Michael W. Mulholland and James A. Bell
Contents

53.1 Research as a Core Mission
53.2 A Culture of Investigation
53.3 Creating Investigative Diversity
53.3.1 Investigators
53.3.2 Investigative Topics
53.3.3 Investigative Facilities
53.4 Applying for a Grant
53.5 Administrative Research Infrastructure
53.6 Running an Academic Department as a Business Activity
53.7 Research Bridging Funds Policy
References
Abstract Along with education and surgical care, research informs every aspect of an academic department of surgery, from faculty composition, to trainee selection, to financial outcome. In pursuit of their goals, academic departments must create a culture of investigation. Departmental leadership should seek to create scientific diversity in the form of investigators, research topics, and physical facilities. Modern biomedical research, clinical trials, and health services investigation cannot be sustained by clinically derived funds or solely on the basis of philanthropy. Instead, high-quality research must be sustained long term by extramurally derived scientific grants. Investigators should be provided with professionally staffed administrative assistance; these functions can be usefully divided into pre-award and post-award services. Because investigative activity is one of the core values of an academic department of surgery, it must be planned for and financed with the same attention to detail as clinical activity and education. We also consider knowledge of relevant policies to be very important for academic surgeons, and for this reason the basic elements of a Research Bridging Funds Policy are presented in this chapter.
53.1 Research as a Core Mission
M. W. Mulholland () University of Michigan Health Systems, 2101 Taubman Center/SPC 5346, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA e-mail: [email protected]
Research is a core mission of academic departments of surgery. Along with education and surgical care, research informs every aspect of an academic department of surgery, from faculty composition, to trainee selection, to financial outcome. This core mission derives from an obligation to surgical patients. Academic departments of surgery have an obligation to current patients to provide the most effective and contemporary surgical care,
treatments that are based on scientific investigation and evidence. An obligation to future patients also exists. Future patients should be provided the possibility of novel treatments for unsolved surgical and biological problems, and effective novel therapies are the product of scientific investigation. In a broader sense, academic departments of surgery have a societal obligation. This obligation originates from the financial and material support that society provides to academic medical centers, and from an ethical commitment to optimize health care of the populations the centers serve.
53.2 A Culture of Investigation

In pursuit of these goals, academic departments must create a culture of investigation. This culture is a reflection of institutional values such as the development of new knowledge, scholarship, and critical thinking. Departmental leadership, in the person of the departmental chair, should seek to build consensus that research is a central feature of departmental life. This belief is reflected in faculty composition as a consequence of recruitment, development, and retention. Development of a culture of investigation is a long-term process, requiring years. Conversely, this consensus, reflected in faculty composition, is also enduring, lasting years beyond any individual's leadership tenure. Departmental leadership must also exhibit personal respect for, and accomplishment in, scientific investigation. The ability to think critically, engage in the development of new knowledge, and contribute to the scientific literature should be considered essential criteria for the selection of surgical leadership. Within an academic surgical department, administrative roles should be provided for talented scientists representing every aspect of modern surgical investigation. The wisdom, perspective, and practical advice of senior scientists will energize the departmental scientific mission. Investigative accomplishments must be celebrated periodically. Regularly scheduled scientific and investigative conferences are an important part of academic surgical life, but are not sufficient. On a regular basis, academic departments of surgery should pause from the hectic pace of clinical life to focus upon investigative accomplishments and to congratulate successful investigators. These interludes are particularly important for young investigators so that they may receive the
acknowledgment of their peers and the feedback and admiration of their scientific co-travelers. Accomplishments of senior investigators should also be acknowledged in the form of named lectures and scientific addresses (Tables 53.1 and 53.2).

Table 53.1 Departmental named lectures 2008
Lecture – Topic
William J Mayo – Live donor liver transplantation
William Coon – Frontiers in cancer science
Rosenberg Foundation – Overcoming alloantibody in sensitized renal allograft recipients
Reed O. Dingman – Current strategies in free autogenous breast reconstruction
Milton Bryant – Vascular surgery, completion of the circle
Cameron Haight – The current status of lung transplantation
Conrad Jobst – Advances in the diagnosis of clinically suspected pulmonary embolism: an evidence-based approach
Moses Gunn – Tails from the deep: learning about pancreatic cancer from mouse and zebrafish embryos

Table 53.2 Endowed professorship lectures 2008
Professorship – Topic
Maude T Lane Professorship – Proteomics
John A. and Karla S. Klein Professorship in Thoracic Surgery – Molecular alterations in esophageal cancer
Lazar J. Greenfield Professorship in Surgery – Tracking pancreatic cancer: a multidisciplinary perspective
DaNancrede Professorship – Cancer immunology
53.3 Creating Investigative Diversity

53.3.1 Investigators

Departmental leadership should seek to create scientific diversity in the form of investigators, research topics, and physical facilities. A successful academic department of surgery should contain a variety of investigators, including clinician
scientists, clinical trialists, basic scientists, and clinician innovators. In each of these categories, opportunities should also be provided for like-minded trainees. Clinician scientists are defined here as clinically active surgeons who are also engaged in basic biomedical research, usually at a fundamental molecular or cellular level. These investigators represent the leading edge of biomedical progress. They seek to examine the root causes of diseases that are relevant to surgical practice. To paraphrase Louis Pasteur, research success comes to the prepared mind. Just as surgical practice requires detailed and rigorous training, basic investigation likewise requires formal preparation. While there are many options available, two paradigms are most common in the United States. In the first, scientific training is obtained in the course of an MD–PhD program in medical school. In the second, formal scientific preparation is obtained during a surgical residency or fellowship, with time specifically set aside from clinical duties for this purpose. In either instance, the selection of a scientific mentor is crucial. Talented mentors possess a rare combination of gifts. In addition to scientific accomplishment, mentors must have time to devote to another person's development, as well as personal traits such as generosity, intellectual engagement, superb communication skills, and a willingness to let the mentor–trainee relationship evolve over time. Didactic course work in fundamental scientific topics, exposure to a large number of accomplished investigators, the completion of a novel scientific project, and formal oral and written presentations are all components of this training. An advanced graduate degree is a sign of accomplishment, but usually not a primary goal. The first few years of faculty appointment are critical. Most surgical clinician scientists hired at the completion of clinical training are expected to assume clinical obligations upon joining the faculty. In contrast, very few young surgeons are ready to be scientifically independent at that point. Instead, a relationship with an accomplished, more senior scientist, either within the surgical department or based in another department, is very beneficial. The relationship should be structured so that mentorship becomes collaboration as the young clinician investigator matures scientifically. Clinical trialists are defined as surgeons seeking to examine the translational application and dissemination
of novel therapies using human subjects [1]. Within the past two decades, the practice of surgery has been transformed by the application of evidence-based practices. In this context, clinical studies have gained importance, with randomized clinical trials as a prime example. Formal preparation is important and should explicitly include the use of human subjects as a topic. A number of public resources are available; sponsoring organizations include the National Cancer Institute (http://www.cancer.gov/clinicaltrials), the Veterans Administration (http://vaww.ess.aac.gov), and the American College of Surgeons (https://www.facs.org). Demonstration of knowledge in this area is now required on a yearly basis for all investigators engaged in human research. A number of courses have been developed to provide structured training for clinical trialists. Many academic medical centers are associated with a School of Public Health or a similar institution. The resources available through these schools can be invaluable, particularly course offerings in epidemiology, public health administration, and clinical trial design. In this regard, a unique program exists at the University of Michigan Medical School and its School of Public Health. This program, called the On the Job/On Campus Program, allows completion of a Masters degree while the surgeon continues to work. The offering covers 2 years, with a 4-day (Thursday through Sunday) session each month; the intervals between sessions are filled with course work, including statistics, survey design, study design, protection of human subjects, and ethical conduct of research (Table 53.3).

Table 53.3 School of Public Health – Executive Master's Program (MPH)
Course titles 1: Introductory seminar; Biostatistics; Microeconomics; Health services system 1; Epidemiology; Current issues in public health; Principles of health behavior; Spreadsheet modeling; Managerial accounting; Health economics; Health law; Health services system 2; Corporate finance; Operations research; Case studies; Politics of health policy; Understanding organizations; Competitive strategy and marketing; Environmental health sciences
Course titles 2 – Clinical research design and statistical analysis: Introductory seminar; Computer introduction; Clinical trials and study design; Methods of epidemiology; Biostatistics; Computer packages; Legal rules and ethical issues; Cost utility and decision analysis; Statistical methods for epidemiology; Longitudinal models and repeated measures; Planning and funding of clinical research

Faculty involved in clinical trials should think nationally but act locally. Most trials require a large number of patients, more than would be available at any single institution. The national design of such trials is meant to maximize patient enrollment and thus to optimize statistical validity. Every national trial, however, requires local organization, provision of facilities for patient investigation and care, and a local champion. In addition, most clinical trials require a long time horizon: as one example, a recently completed Veterans Affairs-sponsored study of hernia repair required 7 years to plan and execute, and an additional 2 years for the data to be analyzed and published [2]. Basic scientists are defined as those individuals with advanced degrees who are not in the practice of clinical surgery. The number of basic scientists in academic
departments of surgery has greatly expanded in the past decade. On a national basis, there are now more basic scientists in clinical departments within American medical schools than within basic science departments [3]. The reasons for this growth are multifactorial but reflect a
consistent effort by many academic departments of surgery to expand their investigative horizons and to attract talented scientists whose interests are relevant to unsolved clinical problems. Development of a large cadre of basic scientists is not appropriate for all academic departments of surgery. Successful application of this strategy requires a sophisticated culture of investigation, an expectation that basic scientists will neither displace nor act as surrogates for clinician scientists, and a very substantial financial commitment. If these conditions are met, the strategy has a number of very substantial advantages. Basic scientists bring different scientific perspectives and expertise to vexing clinical problems. A recent study has shown that basic scientists are more likely to be engaged in the successful translation of laboratory findings into clinical practice when employed in this model [1]. Successful basic scientists also provide additional opportunities for extramural research funding. There are potential disadvantages to this model as well. A significant drawback is the potential for basic scientists to become isolated from other like-minded scientists in basic science departments. To avoid this dilemma, dual appointments, with faculty rights and obligations deriving both from the department of surgery and from basic science departments, can be helpful. This arrangement also provides the basic scientist with access to graduate students originating in basic science departments, and permits additional opportunities for the discharge of teaching obligations. If basic scientists are employed in academic departments of surgery, an explicit effort must be made to have them become part of the academic administration. An important new form of academic research is expressed by clinician innovators. Surgeons are in a unique position to make technical innovations, as they encounter the limitations of current technology in the care of surgical patients on a daily basis. As surgical procedures become increasingly noninvasive, as care is extended progressively to the extremes of age, and as more complex comorbidities are encountered in surgical patients, numerous opportunities for innovation exist. Surgical innovation can serve an academic purpose if results are published and rigorously critiqued. As a secondary benefit, commercial application of successful ideas can provide another source of revenue to support academic departments. Evidence is emerging that innovation can be taught, and in a university setting the development of a formal program in clinical innovation can help to overcome a number of institutional barriers [4]. Recent efforts at the
University of Michigan, originating in the Department of Surgery, have engaged partners in the Medical School, the School of Engineering, the School of Public Health, the Dental School, and the Business School. These efforts culminated in the formation of the Michigan Innovation Center (http://www.med.umich.edu/ummic), which includes faculty from all of these schools on the Ann Arbor campus. These faculty interact on a series of cross-disciplinary projects, and novel ideas are supported financially via innovation grants. An innovation fellowship program has been established to train the inventors of tomorrow. Through this process, unexpected partnerships can emerge, for example, a joint effort between the Department of Surgery and the School of Art and Design. Surgical innovation can be leveraged by seeking extramural partnerships with individuals familiar with entrepreneurial enterprises, product design, and commercialization. Throughout all of these efforts, the inclusion of trainees is crucial. Surgical trainees represent the next generation of practitioners. Academic departments of surgery have a clear obligation to provide surgical education and clinical training to the next generation of surgeons. A select number of academic surgical departments should also be engaged in the development of future clinician scientists, trialists, and clinician innovators. Success in this effort requires a long-term commitment on the part of the faculty to this phase of training, substantial institutional investment in infrastructure, and an ongoing financial commitment to provide trainees with time free of clinical obligations to pursue scholarly interests. The commitment to trainees can be highly beneficial to academic departments of surgery. These young individuals are a source of new ideas, incredible energy, and an unbeatable work ethic, and they act as a rejuvenating factor for more senior clinicians and scientists.
53.3.2 Investigative Topics

Academic departments of surgery should also seek to develop a diversity of investigative topics. In the most fully developed form, a research portfolio should include topics relevant to basic biomedical research, translational science, clinical trials, health services research, and innovation in medical devices. Basic biomedical research has traditionally been a major focus of academic departments of surgery in the
United States. Intellectual discipline is required on the part of surgical leadership in striking an appropriate topical balance. Given the constraints of time, facilities, and financial resources, not every potential investigative topic can be supported to a similar degree. A successful strategy requires a conscious effort to identify clinical strengths and unsolved clinical problems, and to match investigative programs to these opportunities. Contemporary science requires teamwork, and the composition of investigative teams is a crucial ingredient of success. Teams should be consciously structured to include individuals with different investigative talents, without regard for department of origin. Team leadership is important, but team structure should also be flexible enough to permit the inclusion of investigators from different areas of science as needed. Modern biomedical investigation is very expensive. The financial investment can be mitigated by the development of shared facilities and service cores. These service cores should be centrally located, appropriately staffed, and funded through a shared financial model that permits recovery of operating costs and depreciation of expensive equipment. Academic departments of surgery should seek partnerships with basic disciplines to develop these intellectual and physical facilities. In the United States, the National Institutes of Health has emphasized very strongly the need to translate basic findings into novel patient care. This expectation was first articulated in May 2002 by NIH Director Elias A. Zerhouni. The rationale for this strategy and the supporting documentation, termed the NIH Roadmap, has been periodically updated and may be viewed online (http://nihroadmap.nih.gov). Support of translational research requires a very broad infrastructure, including biomedical informatics, statistical support, ethical oversight, regulatory support, community engagement, educational support, and financial resources. A number of other topics are often included in the support for translational research; examples are facilities for pediatric-focused research and infrastructure to examine aspects of health care disparities. In the United States, the emphasis upon translational research is exemplified by the development of Clinical and Translational Science Awards (CTSA) from the National Institutes of Health. The CTSA initiative derived from the NIH decision to re-engineer the national clinical research enterprise, expressed as one of the major objectives of the NIH Roadmap for Medical Research. Financial support for the CTSA initiative has come from redirecting preexisting clinical and translational
programs and from Roadmap funds. The total 5-year funding for new CTSA awards is anticipated to be approximately $577 million [5]. At the University of Michigan, these efforts have been incorporated within the Michigan Institute for Clinical and Health Research (http://www.med.umich.edu/cacr/admin.htm), which brings together resources from across the campus, not just those of the medical school. The foundation for this institute is the Center for the Advancement of Clinical Research. The institute provides investigators with services in the areas of research development, clinical research informatics, education, data management, and project oversight. Translational science represents a unique opportunity for academic departments of surgery to develop partnerships with basic science departments. By engaging talent in basic science departments, a powerful two-way flow of ideas can emerge. This process can greatly accelerate the rate of investigation and provide the basis for novel ideas and thus new treatments. An academic department of surgery research portfolio should include a number of clinical trials. An essential component of such an effort is the Institutional Review Board (IRB); detailed discussion of IRB form and function is provided in other chapters of this book. Clinical trials can span the entire spectrum of medical innovation. The first steps in testing new therapeutics are termed Phase I trials. Facilities for conducting Phase I trials are usually more expensive to create and maintain than can be justified by any single department, so institutional partnerships are crucial in this regard. Clinical trials also include, at the other end of the spectrum, large multi-institutional studies of clinical efficacy; the VA hernia repair study, mentioned previously, is an example of such a large trial. An emerging area of intense interest is health services research. Health services research seeks to examine processes of surgical care, structure of care, and surgical outcomes. Examples of structure of care include the volume of procedures performed at an institution, staffing models of intensive care units, and the development of centers of excellence. Examples of processes of care include measures to prevent venous thromboembolism through appropriate anticoagulant prophylaxis and administration of appropriately timed antibiotics for patients undergoing operative procedures. The sum of the structure and process of care is reflected in surgical outcomes. Time-honored examples of surgical outcomes are procedure-related morbidity and mortality.
Health services research requires access to large patient databases, sophisticated analytical and quantitative skills, and the engagement of formally trained individuals in the topical areas of epidemiology, economics, and statistical analysis. Health services research in surgical departments is evolving from a descriptive science to a prescriptive exercise. This evolution is occurring with the advent of scientifically driven health policy, which then leads to investigation of the effects of changes in health practices. The Michigan Surgical Collaborative for Outcomes Research and Evaluation (M-SCORE) is an example of a multidisciplinary program that focuses upon this area of investigation (http://www.med.umich.edu/mscore). M-SCORE serves as a resource center for research based on analysis of large administrative databases. The data center houses staff programmer/analysts and computer systems for managing large volumes of administrative data with appropriate data security. Ongoing projects explore variation in cancer surgery outcomes, "integrative" measures for assessing surgical performance, utilization and outcomes in bariatric surgery, and others. In exploring these areas, M-SCORE researchers use data from the national Medicare database, linked SEER-Medicare files, the national Healthcare Utilization Project, and administrative datasets from the state of Michigan. M-SCORE also serves as a resource center for three Blue Cross and Blue Shield of Michigan-funded, clinical outcomes registry-based surgical quality improvement initiatives. The projects are designed to foster regional collaboration between hospitals and surgeons; to identify variations in both practice and outcomes, and thus opportunities for improvement; and to implement improvement activities and evaluate their effectiveness. These projects are similar in many important respects, including a goal of building on existing regional and/or national efforts.
53.3.3 Investigative Facilities

Investigation in all of its forms requires physical infrastructure. The assignment of research space on a departmental basis has both strong advantages and disadvantages. Departmental assignment offers control of space allocation in proportion to scientific needs and levels of productivity. If space assignments are appropriately flexible, subject to changes in scientific productivity
and financial support, then both scientific and financial accountability can be achieved. A strong disadvantage of departmental assignment of research space relates to the inability to co-localize talented investigators from different disciplines. Physical adjacency is an important element of scientific productivity that is not recognized through departmental space assignments. A second general approach is to assign research space on a thematic basis, done most commonly through disease or topic orientation. Investigators interested in cardiovascular disease, immunology, or neurosciences can be co-localized without regard to clinical departmental assignment. This physical arrangement greatly facilitates scientific interchange. An open lab concept can be an element of thematic space assignment. In open biomedical laboratories, research benches are not separated by walls. This permits flexibility of research assignment and expansion and contraction of research programs without the impediments of physical building design. A disadvantage of thematic space assignment, from the vantage point of academic departments of surgery, is that it detracts from the cohesion and cross-fertilization of surgical investigators. It may also disadvantage clinical surgical trainees seeking preparation for an investigative career if a critical mass of investigative role models is less easily identified. "Laboratories" for health services research are fundamentally different from those for basic biomedical research. The physical requirements for research benches, tissue culture rooms, and freezers are replaced by those for computational power and database analysis. The most essential component of a successful health services laboratory is a design that encourages interaction among a diverse group of people. The most productive health services laboratories include clinicians, data analysts, and a variety of specialists, for example, experts in health care economics.
53.4 Applying for a Grant

Maintaining a diversified and vigorous research program requires the marshalling of financial resources. Modern biomedical research, clinical trials, and health services investigation cannot be sustained by clinically derived funds or solely on the basis of philanthropy. Instead, high-quality research must be sustained long term by extramurally derived scientific grants.
Setting realistic and explicit expectations for grant application and success is essential when new faculty are recruited and hired. A careful assessment should be made of the level of scientific independence of the prospective faculty member, and a clear expectation should be set as to the nature of grant application and the time from hire to submission. For most young faculty members, a mentored grant, exemplified by the NIH K-series, is an appropriate initial step. K awards support faculty salary and typically require a substantial commitment of effort, 75% in most instances. As the faculty member matures scientifically, grant expectations should also change. Independent investigator-initiated grants, typified by the NIH R-series, should be an expectation of senior faculty. In addition to large governmental agencies, such as the National Institutes of Health and the National Cancer Institute, a number of other regional and national organizations support topically oriented biomedical research. Examples include the American Cancer Society, the American Heart Association, and the American Surgical Association. Young faculty members should be encouraged to apply for a variety of small awards, typically for less than $100,000, to enable the collection of preliminary data and to underwrite initial efforts to organize and staff a laboratory. In this way, the young faculty member can learn to hit singles before swinging for a home run. There are a number of grant-writing resources available to young surgical investigators. The best current source of advice may be found at the NIH website (http://grants.nih.gov/grants/grant_tips.htm), on a page entitled Grant Writing Tips Sheet. There, investigators will find a section, Writing a Grant, that is mandatory reading for all application writers, young and old. The American College of Surgeons has promoted a grantsmanship workshop, held every 2 years. This workshop includes contact with senior surgical investigators and representatives from the National Institutes of Health. Mock study sections are held and grants are critiqued. Attendees are familiarized with the essence of successful grant writing and the process by which applications are scored. Details can be found at the website of the American College of Surgeons. A successful grant application requires thoughtful attention to detail and careful time management. The applicant should begin by working backward in time from the submission deadline.
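To make this backward planning concrete, the following minimal sketch (Python) computes internal milestones from a submission deadline. Only the one-month internal-review buffer comes from the advice in this section; the dates and the two earlier intervals are illustrative assumptions.

```python
from datetime import date, timedelta

def plan_submission(deadline: date) -> dict[str, date]:
    """Work backward from a grant deadline to internal milestones.

    The one-month "perfect draft" buffer for internal review follows the
    advice in the text; the other intervals are illustrative assumptions.
    """
    return {
        "specific aims page finalized": deadline - timedelta(days=90),
        "methods and preliminary data drafted": deadline - timedelta(days=60),
        "draft 'perfect'; internal review begins": deadline - timedelta(days=30),
    }

# Example with a hypothetical deadline
for milestone, when in plan_submission(date(2010, 2, 5)).items():
    print(f"{when:%d %b %Y}  {milestone}")
```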
The applicant should be strongly encouraged to have the grant "perfect" 1 month before the submission deadline. This will permit internal review, constructive critique, and editorial revision before the real submission must occur. The first page of a grant application is the most important. Typically, it includes a short preamble stating the general health issue to be addressed and the specific problem that will be the focus of the application. Following this introduction, a series of hypotheses is enumerated. These hypotheses are then converted into the same number of specific aims, which provide the operational basis for investigating the hypotheses. The specific aims should be distinct, but should build one upon another. While each specific aim might encompass a number of experiments, each should express a single general idea. The first page of a grant application must be perfect. Constructing this initial aspect of the grant takes time; if a month is required, then a month should be budgeted to this single item. The next section of the grant should frame the general argument in the form of background and review material. The importance of the health issue to be examined must be stated explicitly. The state of existing knowledge should be explored, with a deliberate attempt to illustrate the limits of this knowledge and new areas of investigative potential. In this background section, reference should be made to each of the specific aims of the proposal. Writing in lucid fashion, the investigator should illustrate the importance of the proposed work in a manner that any scientifically sophisticated reviewer can understand. Because most reviewers are busy, often overcommitted individuals, clear prose is rewarded; convoluted thinking is not. Typically, the next section of the grant presents preliminary data. The preliminary data section should illustrate that the investigator and his or her collaborators have experience with, and are capable of expertly performing, each technique that is germane to the proposed investigation. Preliminary data should support the feasibility of the proposed experiments. Initial experiments should also indicate the essential correctness of the hypotheses. A publication track record is crucial: peer-reviewed publications that can be cited to prove the technical capabilities of the investigative team are extremely helpful. In most grants, the next general section enumerates the specific aims and provides detailed methodology. Each specific aim should begin with a restatement of the general problem, followed by a detailed list of the experiments
that will allow its investigation. A general experimental framework should be followed by details of individual experiments. The investigator is encouraged to state the expected results of the experiments and then to focus upon potential problems, which may be biological, technical, or interpretative. Alternative approaches that the investigator will employ should be provided in case any of these potential problems arise in the course of the experimentation. Contemporary applications for grant support are reviewed in a competitive fashion. Investigators will maximize the opportunity for funding if the grant receives a pre-submission critique. To this end, enough time must be allowed for busy reviewers to carefully review and criticize the submission, and then for the applicant to revise before final submission. The applicant should select a number of senior scientists with track records of successful application and say to these reviewers: "Don't tell me what you like about my grant. Tell me what is wrong with it so that I can fix it." An engaged and constructively critical reviewer is an applicant's best friend. Not all grants will be awarded following the initial submission. Disappointment is the human reaction that always accompanies non-acceptance of an application; every investigator, junior and senior, experiences an emotional letdown when a grant is not initially awarded. Investigators are advised to read the review, and then set it aside until it can be addressed in a constructive fashion. Grant revision is an opportunity for significant improvement of scientific thought and content. An objective self-evaluation is the initial step in revision. In addition, other scientists should be asked to read the critique and to provide an objective assessment of the deficiencies that have been identified. The revision should enumerate each objection, large and small, and should address every single problem. The easiest objections are those that can be handled editorially. If there are administrative deficiencies, these must be addressed in a straightforward fashion. If additional experiments are necessary, they should be performed. The revised grant should not be resubmitted until every criticism has been rectified. The investigator should submit a comprehensive response; to help organize it, the criticisms should be listed numerically and highlighted in some fashion, for example, by the use of a bold font.
The National Institutes of Health website contains a great deal of helpful information about the correct form of a research application and the importance of clarity, organization, and time deadlines.
53.5 Administrative Research Infrastructure

Investigators should be provided with professionally staffed administrative assistance. These functions can be usefully divided into pre-award and post-award services. The pre-award office helps the investigator prepare grant submission documents. The administrative and compliance portions of this service include assuring that space and departmental approvals are obtained. For the budget portion of the grant documents, the pre-award office helps determine staff salaries, supply costs, and animal charges. The office also handles forms for waiver of indirect costs, cost-sharing requirements, and memoranda of understanding with the Veterans Administration for shared research time. The office further acts as the liaison from the department to the Medical School Dean's office and the University research offices, and is responsible for departmental research space assignment, OSHA support of laboratories, and research conference planning. The purpose of the post-award office is to assist the investigator with the management of received funding. This centralized function reviews expenditures on each award on a monthly basis. The expenditures are reviewed for compliance with federal guidelines and award specifications. This reconciliation process is used to predict the future needs of the grant and to ensure that needed future expenditures are encumbered. The office meets with each investigator, on a monthly basis, to discuss the financial progress of the grant.
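As an illustration of the kind of monthly reconciliation such an office performs, the sketch below (Python) projects whether an award will be overspent at its current burn rate. The award figures and the simple linear model are hypothetical, not a description of any actual departmental system.

```python
def project_award_balance(budget: float, spent_to_date: float,
                          months_elapsed: int, months_total: int) -> dict:
    """Project the end-of-award position from the current monthly burn rate.

    A toy model of post-award reconciliation; all figures are hypothetical.
    """
    burn_rate = spent_to_date / months_elapsed          # average spend per month
    projected_total = burn_rate * months_total          # spend if the rate holds
    return {
        "monthly_burn_rate": burn_rate,
        "projected_total_spend": projected_total,
        "projected_balance": budget - projected_total,  # negative => overspend
    }

# Example: a $250,000 award, 10 months into a 24-month award period
report = project_award_balance(250_000, 120_000, 10, 24)
print(report)  # burn rate 12,000/month -> projected spend 288,000, balance -38,000
```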
Academic departments of surgery may develop policies for faculty whose major research funding has recently been terminated. Some faculty, after successful careers as independent investigators, may have difficulty securing outside funding to continue their research programs. It is in the departmental interest to maintain faculty as contributing members of the academic community. As such, in selected cases, it is appropriate for the department and/or school to provide support to the faculty member to maintain his or her laboratory. An example of a bridging funding policy is provided in Sect. 53.7.

Table 53.4 Department of Surgery and total Medical School research pipeline ratios
                        Short run pipeline   Long run pipeline
FY2007  Surgery              0.61                 1.32
        Total school         0.67                 1.50
FY2008  Surgery              0.66                 1.40
        Total school         0.65                 1.37
53.6 Running an Academic Department as a Business Activity

Because investigative activity is one of the core values of an academic department of surgery, it must be planned for and financed with the same attention to detail as clinical activity and education. Research can also be a competitive advantage for academic departments of surgery. Many potential patients are attracted to academic medical centers because they can obtain novel treatments not generally available in the community practice of surgery. These patients become the subjects of clinical trials, which are the basis for validating and then improving these novel therapies. These refinements can lead to additional novel trials. In this way, research leads to additional clinical activity, which in turn leads to improved research opportunities; a virtuous cycle of research and clinical activity can thus be created. Research funding should be considered a component of departmental budget activity. To this end, a diversified research portfolio is crucial. Governmental and foundation research support varies with budgetary cycles and evolves over time; a diverse research portfolio permits smoothing of these potential impacts. Three major components of research grants impact departmental budgeting: the number of awards, the type and size of grants, and indirect cost recovery rates. Departmental budgeting processes should anticipate the number of awards submitted annually, and these
expectations should be communicated to the faculty. Senior surgical scientists must be expected to apply as independent investigators for awards typified by the NIH R-series. R-awards are given on an individual basis; when budgeted in a modular format, they carry $250,000 of direct cost expenditures per year (as of FY2008), and larger budgets require explicit justification. Groups of scientists engaged in a similar general topic can submit a number of grants simultaneously in the form of a program project award. Components of program project awards can also be considered individually, typically as R-awards. Program project awards are larger in scope and larger in financial impact. Multi-institutional trials are typically funded through U-series awards; these largest grants are necessary for the typical multi-center clinical trial. The National Institutes of Health, and many other non-governmental foundations, permit payment of costs for support of investigative infrastructure in the form of indirect cost recovery. Indirect costs are expressed as a percentage of direct costs; indirect cost recovery rates are negotiated between universities and the National Institutes of Health and equal approximately 50% of the direct cost in many instances. The recovery of indirect cost is essential to every academic medical center and to every academic department of surgery. These funds offset the cost of space, facilities maintenance, and utilities. In addition, they pay for the overhead administrative costs of purchasing, payroll, human resources, and financial services. Many academic departments of surgery offer performance bonuses as a part of faculty compensation. Almost every department of surgery offers a clinically oriented incentive bonus system; many fewer departments offer financial incentives for research success. Research-oriented bonuses can be a useful mechanism to focus faculty attention upon the financial importance of research and to reward both effort and success. The Department of Surgery at the University of Michigan offers faculty research bonuses based upon three components: the number of grants awarded, the percentage of faculty effort covered by awards, and the indirect cost recovery rate. The research incentive program rewards both initiative and salary support. The initiative component is earned through a point system that awards points based upon the type of award; for example, an R01 award is worth 4 points. Salary support also earns points based on the amount of effort funded
on the award, such that 10% effort is worth 1 point. Each faculty member who earns 8 or more points is deemed to have one share in the incentive pool. The dollar amount of the incentive pool is calculated as one-third of the indirect funds that have been earned on sponsored research for the fiscal year. This pool is split equally among all faculty deemed to have a share; the amount of a share may change based on the dollars available and the number of faculty sharing the pool. All determinations are made at the end of the fiscal year, with the incentive awarded at the beginning of the following fiscal year. The department pays the award in two components: the first 50% of the incentive is placed in a research discretionary account for the faculty member's use, and the second 50% is paid as additional salary. All academic departments of surgery recognize that clinical activity may both grow and contract. Medical advances, changes in third-party payment, and societal expectations underlie these fluctuations. As a familiar clinical analogy, expanded use of coronary stenting led to a dramatic decrease in surgical coronary revascularization after 2004. Likewise, research programs expand and contract. Because of the strategic importance of surgical investigation, coupled with its financial ramifications, budget forecasting is essential. A methodology used at the University of Michigan to estimate the number of grants in the "pipeline" is provided here. Departmental research forecasting is projected at this institution on a rolling 5-year basis. The Medical School uses a ratio to measure anticipated future research awards; this ratio measures a department's ability to maintain and grow funds for research and serves as a predictive indicator of the continuity of research funding. A ratio is calculated of the awards for the last year relative to the awards at the end of the current year. This is done both for the first year out (short run pipeline) and for the total award life (5 years, long run pipeline). The department is then measured on the increase in the ratio from the previous year. Table 53.4 shows the Department of Surgery numbers and the total Medical School data. In this view, the Department of Surgery has improved not only in its pipeline projections over the first year, but has also improved relative to the Medical School total.
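To make the arithmetic of the incentive pool and the pipeline ratio concrete, here is a minimal sketch (Python). The faculty records and award totals are hypothetical; the point values and pool fraction follow the examples given above.

```python
def faculty_points(awards: list[str], funded_effort_pct: float) -> float:
    """Initiative plus salary-support points for one faculty member.

    Point values follow the examples in the text: an R01 is worth
    4 points, and each 10% of funded effort is worth 1 point.
    """
    initiative_points = {"R01": 4}  # other award types would be added here
    points = sum(initiative_points.get(a, 0) for a in awards)
    return points + funded_effort_pct / 10

def incentive_shares(points_by_faculty: dict[str, float],
                     indirect_funds: float) -> dict[str, float]:
    """Split one-third of the year's indirect funds equally among faculty
    with 8 or more points (one share each); each share is then paid half
    to a research discretionary account and half as additional salary."""
    pool = indirect_funds / 3
    holders = [name for name, pts in points_by_faculty.items() if pts >= 8]
    share = pool / len(holders) if holders else 0.0
    return {name: share for name in holders}

def pipeline_ratio(last_year_awards: float, current_year_awards: float) -> float:
    """Short- or long-run pipeline ratio, per the definition in the text:
    last year's awards relative to awards at the end of the current year."""
    return last_year_awards / current_year_awards

# Example: one R01 (4 points) plus 50% funded effort (5 points) = 9 points
pts = faculty_points(["R01"], funded_effort_pct=50)
print(incentive_shares({"Dr. A": pts, "Dr. B": 6.0}, indirect_funds=900_000))
# pool = 300,000 and only Dr. A qualifies -> {'Dr. A': 300000.0}
```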
Some of our faculty, after successful careers as independent investigators, have had difficulty securing outside funding to continue their research programs. It is in the Department's interest to maintain our faculty as contributing members of the academic community. As such, in selected cases, it is appropriate for the Department and/or School to provide support to the faculty member to maintain his or her laboratory. As differing levels of resources are available to faculty members based on Sectional membership, it is our desire to have a fair and transparent process for determining levels of support for all faculty. These principles provide the basis for the following policy.
53.7 Research Bridging Funds Policy

This policy includes the following:
1. Bridging support will be considered for investigators in the instructional, research, and clinical tracks who have no external fund source and have exhausted their discretionary accounts. Only faculty with recent reviews based on failed outside submissions will be eligible for consideration.
2. Bridging (including Departmental matching funds) may be provided for up to 3 years. The Chair, Section Head, and Dean ultimately arbitrate the sources of these funds.
3. The faculty member's chair and/or section head must sign off on any application for bridging funds, monitor the faculty member's progress against agreed goals, and be responsible for the faculty member's progress toward competitiveness and independence. If progress is not made toward the agreed-upon goals, funding will cease.
4. Faculty members will be required to submit a budget justification for requested bridging funds.
5. The Research Advisory Committee (RAC) will be convened and may include the chair and/or section head of the affected faculty member. The RAC will attend an oral presentation by the investigator, review the bridging request and all available evidence (including current pink sheet reviews), make the final recommendation on funding the bridging request, and provide feedback regarding preparation for resubmission. The decision of this group is advisory to the Chair/Dean.
6. Faculty members whose independent investigative careers have little likelihood of continuation, as assessed by their chair and/or section head, will be considered for internal sabbaticals, new collaborations, and other efforts to reassign their effort so as to maintain them as vibrant, contributing members of the academic community.
7. Salary may be adjusted over time, based on total effort.
8. Faculty members who have been without significant external funding for 3 years may have their laboratory space reassigned.
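As a toy encoding of the eligibility and space-reassignment rules above (Python; the field names are hypothetical, and in practice these judgments rest with the Chair, Section Head, and Dean rather than any lookup):

```python
def bridging_eligible(track: str, has_external_funds: bool,
                      discretionary_exhausted: bool,
                      recent_failed_submission_review: bool) -> bool:
    """Item 1 of the policy: who may be considered for bridging support."""
    return (track in {"instructional", "research", "clinical"}
            and not has_external_funds
            and discretionary_exhausted
            and recent_failed_submission_review)

def space_at_risk(years_without_external_funding: int) -> bool:
    """Item 8: laboratory space may be reassigned after 3 unfunded years."""
    return years_without_external_funding >= 3
```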
References
1. Neumayer L (2006) Clinical research. Am J Surg 192:264–266
2. Fang D, Meyer RE (2003) PhD faculty in clinical departments of US medical schools, 1981–1999: their widening presence and roles in research. Acad Med 78:167–176
3. Herman S, Singer A (1986) Basic scientists in clinical departments of medical schools. Clin Res 34:149
4. Riboh J, Curet M, Krummel T (2007) Innovative introduction to surgery in the preclinical years. Am J Surg 194:227–230
5. Zerhouni EA (2006) Clinical research at a crossroads: the NIH roadmaps. J Invest Med 54:171–173
Research Governance in the UK: What the Academic Surgeon Needs to Know
54
Gary C. Roper
Contents
54.1 Introduction
54.2 Regulation
54.2.1 The Medicines for Human Use (Clinical Trials) Regulations
54.2.2 Medical Device Regulations
54.2.3 The Human Tissue Act
54.2.4 The Data Protection Act
54.2.5 The Mental Capacity Act
54.3 Approvals
54.3.1 Ethics Approval
54.3.2 Amendments
54.3.3 Sponsor Approval
54.3.4 NHS Trust Approval
References
Abstract Research Governance forms an essential part of conducting research in the United Kingdom. All health and social care research activity is encompassed under the Research Governance Framework for Health and Social Care, which defines processes for research compliance and outlines individual roles and responsibilities in research conduct. UK systems are compliance driven, so there is an essential need to identify what rules apply to the conduct of individual studies and to ensure that all required approvals have been obtained before subject recruitment can occur. Understanding the key regulations and approval requirements of this complex area will provide the reader with knowledge to inform and support safe and legal research practice.
54.1 Introduction
G. C. Roper Imperial College London, Imperial College Healthcare NHS Trust, AHSC Joint Research Office, G02 Sir Alexander Fleming Building, Exhibition Road, London SW7 2AZ, UK e-mail: [email protected]
Research Governance is a general term that applies to all aspects of research management. It encompasses key areas such as regulatory compliance, ethical conduct and approval, trial management, finance, scientific review and ongoing assessment of research practice. The UK operates under a system known as The Research Governance Framework for Health and Social Care, Second Edition, 2005 (RGF) [1], which defines mechanisms to ensure that research complies with professional, scientific, legal and ethical standards by identifying specific areas of responsibility within the research process. It pulls together a number of acts, standards and legislative requirements to ensure that all areas of research in human subjects are covered by a single framework with the aim of setting “best practice” standards at a national level. As there is some
variance in Regulations between the countries that comprise the UK, an amended version of the Framework is in operation in Scotland [2]. The RGF provides guidance on how human research should be conducted and explains the administrative and support systems required to ensure that all appropriate measures have been taken to allow for safe and scientifically sound research. Across the UK, different organisations will provide different levels of support to investigators conducting research projects within their jurisdiction, so it is essential that researchers understand their responsibilities under UK regulation and their roles in study conduct. This chapter aims to give the reader an overview of the key regulations that may apply to surgical research practice, and to provide guidance on approval systems in the UK. It is not intended to be a comprehensive guide to UK research practice, and readers should seek specific advice relating to their research conduct from their employers or UK collaborators.
54.2 Regulation

The UK research environment is strongly driven by regulation, much of which has its origin in the European Union and is the response to EU Directives that are released for implementation from time to time.
54.2.1 The Medicines for Human Use (Clinical Trials) Regulations

The Medicines for Human Use (Clinical Trials) Regulations 2004 (CT Regulations) [3] are the UK response to the European Union Clinical Trials Directive 2001/20/EC [4] and specifically cover the legal requirements for clinical trials involving an investigational medicinal product. Before the CT Regulations were in force, research conduct operated under a variety of regulations and standards, including the International Conference on Harmonisation Good Clinical Practice guideline (ICH GCP) [5]. ICH GCP was a set of standards for research conduct but was not a legal requirement until the current regulations came into force. The regulations were initiated as an attempt to standardise administrative processes
across the EU, and thereby promote competency in research conduct and the development of medicinal products. The CT Regulations bring into force a number of legal conditions relating to investigational medicinal product research and define offences relating to breach of regulation. The Regulations have recently undergone two amendments, to incorporate the EU Good Clinical Practice Directive [6] and to include conditions for research in emergency situations. It is now a legal requirement to comply with the principles and processes covered by Good Clinical Practice. The definition of a clinical trial was originally included in the EU Clinical Trials Directive 2001/20/EC, and this definition forms the basis for the regulations that have been put in place in each EU country:

any investigation in human subjects intended to discover or verify the clinical, pharmacological and/or other pharmaco-dynamic effects of one or more investigational medicinal product(s), and/or to identify any adverse reactions to one or more investigational medicinal product(s) and/or to study absorption, distribution, metabolism and excretion of one or more investigational medicinal product(s) with the object of ascertaining its (their) safety and/or efficacy. This includes clinical trials carried out in either one site or multiple sites, whether in one or more than one Member State. (Article 2(c), 2001/20/EC)
In general terms, this means that if the compound being tested in the UK is classed as an investigational medicinal product, then it should have the same classification in other countries; it is important to note, however, that this may not always be the case. UK regulations are broad and cover products such as gene therapies and stem cell therapies, which in other countries may be covered by separate regulation and have separate legal requirements. This may mean that there are additional administrative processes to deal with if the trial is running at multiple centres in the EU, and it would be prudent to investigate any possible variations early on to prevent delays in approval. Within the UK, a body called the Medicines and Healthcare products Regulatory Agency (MHRA) is responsible for reviewing and approving all clinical trials that fall within the definition of the CT Regulations and subsequent amendments. The UK has included clear timelines for the approval process, which have been omitted in other countries, and issues such as insurance cover and the delegation of responsibilities can vary from country to country.
Countries outside the EU are referred to as "third countries" in UK regulations. If a trial is to be conducted at sites outside the EU, particular attention will need to be given to what approvals are required in each such country. The MHRA makes it a condition of trial approval that it is permitted to inspect any sites in third countries as part of its role as a Competent Authority. If an international trial is being conducted, it is essential to assess the regulatory and ethics requirements in each country where the research will take place. Forward planning in the early stages of trial development will help to prevent later complications when it comes to ensuring all approvals are in place.
54.2.2 Medical Device Regulations

The UK Medical Devices Regulations came into force in June 2002 [7] and bring together the provisions of several EU Directives: the Medical Devices Directive, the Active Implantable Medical Devices Directive, and the In Vitro Diagnostic Medical Devices Directive. These regulations establish systems under which a manufacturer must apply to the UK Competent Authority, the MHRA, for approval of clinical investigations of medical devices. Medical devices may incorporate a medicinal substance; if the device can be classed as a new delivery system or procedure of administration, then the sponsor would also need to apply for appropriate MHRA approval for the substance to be included in a clinical trial. This applies even if the substance is a licensed medicinal product, because the new delivery system would not be covered under the original licence. Manufacturers wishing to make an application for pre-clinical assessment of an active implantable medical device, or of a medical device investigation to be carried out in part or in whole in the UK, should apply to the MHRA. The aim of these regulations is to ensure the safety and performance of medical devices, and to prohibit the marketing of devices that might compromise the health and safety of patients, users or any relevant third party. Devices that conform to EU regulations are referred to as CE Marked. Confusingly, the CE does not appear to stand for any particular abbreviation, and there is much discussion regarding what the letters actually stand for.
Under the UK regulations, in order to be able to CE mark any device, a manufacturer must demonstrate that the device complies with the relevant essential EU requirements. Clinical data is normally required to demonstrate this, which could take the form of either: • A compilation of the relevant scientific literature currently available on the intended purpose of the device and the techniques employed. This should be accompanied by a critical evaluation of the compilation where appropriate. • The results and conclusions of a specifically designed clinical investigation. The information you would need to provide for a device application varies depending on the material used in the device and its purpose. Additional information would be needed if the device: • Incorporates software or is a programmable device • Uses infra-red, laser, microwave, MRI, ultrasound, X-ray etc. • Contains a medicinal substance • Incorporates animal tissue • Is classed as an active implant The MHRA provides specific guidance on what information is needed for a device application.
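As a rough illustration of how these regulatory triggers combine, the sketch below (Python) maps a study's features to the approval routes described in this chapter. The attribute names and the mapping are a simplification for illustration only, not an official checklist; real studies need advice from the sponsor and research office.

```python
def likely_uk_approvals(uses_imp: bool, investigates_device: bool,
                        device_ce_marked_for_use: bool = False) -> list[str]:
    """Simplified mapping of study features to UK approval routes.

    A sketch of the rules described in this chapter, not legal or
    regulatory guidance.
    """
    approvals = ["NRES ethics approval"]  # human studies need ethics review
    if uses_imp:
        approvals.append("MHRA clinical trial authorisation (CT Regulations)")
    if investigates_device and not device_ce_marked_for_use:
        approvals.append("MHRA device approval (Medical Devices Regulations)")
    return approvals

print(likely_uk_approvals(uses_imp=True, investigates_device=False))
```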
54.2.3 The Human Tissue Act
The Human Tissue Act 2004 (HT Act) [8] came into force on 1st September 2006 and is governed by the Human Tissue Authority. It is a framework for the regulation of the storage and use of human tissue from the living, and the removal, storage and use of tissue and organs from the deceased, for specified purposes. The Act makes it a requirement that all organisations that either use or store human tissue must be licensed to carry out specific activities, referred to as scheduled purposes in the regulation. Activities requiring a licence are:
• Carrying out of an anatomical examination
• Post-mortem examination
• Removal of relevant material from a deceased person
• Storage of relevant material from a deceased person (other than for a specific ethically approved project)
• Storage of anatomical specimens
• Storage of relevant material from a living person for research (other than for a specific ethically approved project) or for human application
• Public display of a body or material from a deceased person
If a tissue research project has recognised ethics approval, and the tissue is not going to be stored for any other purpose after the project is completed, then a licence is not required to use or store the samples. It is important to note that, at present, the only recognised ethics committees in the UK are part of the National Research Ethics Service (NRES). The Human Tissue Authority has the responsibility of ensuring that the requirements of the HT Act are adhered to, through the issuing of licences, inspection and the provision of guidance. The Authority has published nine codes of practice to provide an overview of requirements and to act as a basis for inspection. The codes are:
• Code of Practice 1 – Consent
• Code of Practice 2 – Donation of organs, tissue and cells for transplantation
• Code of Practice 3 – Post-mortem examination
• Code of Practice 4 – Anatomical examination
• Code of Practice 5 – Removal, storage and disposal of human organs and tissue
• Code of Practice 6 – Donation of allogeneic bone marrow and peripheral blood stem cells for transplantation
• Code of Practice 7 – Public display
• Code of Practice 8 – Import and export of human bodies, body parts and tissue
• Code of Practice 9 – Research
The overriding principle of the HT Act is informed consent. Code of Practice 1 – Consent [9] provides comprehensive information on how and when informed consent should be obtained, and applies restrictions to what can be carried out under the terms of consent. Consent provision under the HT Act is described in Table 54.1.
Table 54.1 Consent provision under the HT Act

Living persons
Consent required:
• Obtaining scientific or medical information which may be relevant to any other person, now or in the future
• Research in connection with disorders, or the functioning, of the human body
• Public display
• Transplantation
Consent not required:
• Clinical audit
• Performance assessment
• Quality assurance
• Education or training relating to human health
• Public health monitoring

Deceased persons
Consent required:
• After post-mortem, continued storage or use of material no longer required to be kept for coroner's purposes
• Removal, storage and use for: anatomical examination; determining cause of death; establishing, after death, the efficacy of any drug administered to the patient; obtaining scientific or medical information relevant to any future person; public display; research in connection with disorders, or the functioning, of the human body; clinical audit; education or training relating to human health; performance assessment; public health monitoring; quality assurance
Consent not required:
• Carrying out an investigation into the cause of death
• Keeping material after a post-mortem under the authority of a coroner
• Keeping material in connection with a criminal investigation
54.2.4 The Data Protection Act
The Data Protection Act 1998 [10] governs the protection and control of personal data. The Act is all-encompassing and therefore is not written solely with research in mind, but its requirements must be upheld in research as in any other area where personal data are used.
The Act defines eight principles of data protection and defines the role of the data subject (the person who is the subject of the data) and the data controller (the person who processes the data). The eight principles of data protection require that data are:
• Fairly and lawfully processed
• Processed for specified purposes
• Adequate, relevant and not excessive
• Accurate
• Not kept longer than necessary
• Processed in accordance with the data subject's rights
• Secure from unauthorised access or alteration
• Not transferred to countries without adequate data protection
Personal data as defined in the Act cover both facts and opinions relating to individuals. The Act also includes a list of what it constitutes as sensitive data. Sensitive data comprise information relating to:
• Racial or ethnic origin
• Political opinions
• Religious or other beliefs of a similar nature
• Trade Union membership
• Physical or mental health/condition
• Sexual life
• Offences (including alleged offences)
• Criminal proceedings, outcomes and sentences
UK ethics committees pay particular attention to the use of identifiable data for research purposes, and the NRES application form contains a number of sections that require the researcher to define the rules and process for custodianship and access to the data.
54.2.5 The Mental Capacity Act
The Mental Capacity Act 2005 (MCA) [11] regulates the processes of decision-making on behalf of adults who lack the capacity to consent. The MCA is not specifically for the protection of research participants, but rather for regulation to protect persons who lack the capacity to consent to actions or processes for themselves. The Act covers England and Wales only, with separate legal provisions covering other UK areas. The MCA is underpinned by five key principles:
• A presumption of capacity – every adult has the right to make his or her own decisions and must be assumed to have capacity to do so unless it is proved otherwise.
• The right for individuals to be supported to make their own decisions – people must be given all appropriate help before anyone concludes that they cannot make their own decisions.
• That individuals must retain the right to make what might be seen as eccentric or unwise decisions.
• Best interests – anything done for or on behalf of people without capacity must be in their best interests.
• Least restrictive intervention – anything done for or on behalf of people without capacity should be the least restrictive of their basic rights and freedoms.
Section 30 of the MCA relates to the inclusion in research of adults who lack capacity and stipulates that research can be lawfully carried out if it has been approved by an "appropriate body" (i.e. a research ethics committee (REC)), and if a carer or nominated third party has been consulted and agrees that the person would want to be involved in the research. It is important to note that any sign of resistance by the person who lacks capacity would effectively demonstrate that they do not wish to take part in the study. Any clinical trial that falls under the requirements of the Clinical Trials Regulations is excluded from the MCA, as the requirements for involving adults who lack capacity are covered by the former.
54.3 Approvals
In addition to the approvals required by regulatory bodies such as the MHRA, there are a number of steps requiring approval before a research project can commence. Additional approvals include gaining an ethics opinion for the study, gaining sponsor approval and insurance cover, and obtaining approval from the hospitals where the research is to be carried out. All of these steps are crucial to getting a project up and running, and each stage of the approvals process informs what is needed to protect the research participants and the integrity of the research data.
54.3.1 Ethics Approval
Ethics review processes can vary widely from country to country, but the same basic principle of seeking approval from an independent ethics committee generally applies. In the UK, the National Health Service (NHS) RECs are managed and coordinated by the NRES.
NRES is managed by the National Patient Safety Agency (NPSA), and participant safety is therefore its primary remit. NRES is responsible for developing, implementing and maintaining standards that are consistent across the UK. NRES has defined standard operating procedures to streamline and standardise the approval process nationally and to ensure that projects do not have to undergo multiple reviews from multiple committees before a project can commence. The NRES system includes:
• One national electronic application form
• Single, independent review
• Central allocation system for multi-centre studies and clinical trials
• Tight timelines (60-day maximum to communicate a decision)
• Written clarification permitted on one occasion only
• Approval for amendments
The NHS ethics system has been developed to encompass the broad range of healthcare research that is carried out in the UK. Due to the ever-developing nature of new and novel investigations, NRES has instigated specialist committees to review and approve certain types of research, such as the eleven ethics committees that have been allocated the task of reviewing medical device trials conducted in the UK. Research that is required to be reviewed by an NRES ethics committee includes projects involving:
• Patients and users of the NHS
• Relatives or carers of patients or users of the NHS
• Access to data, organs or tissue of NHS patients
• Foetal material and in vitro fertilisation involving NHS patients
• The recently dead in NHS premises
• Use of NHS premises or facilities
• NHS staff recruited by virtue of their professional role
• Tissue samples, including samples that originate from outside the NHS or abroad, that are to be used in the UK for research
54.3.2 Amendments
Amendments to an ethics- and regulatory-approved project require that any change classed as substantial is notified to, and approved by, the ethics committee and regulatory authority as appropriate. These
can be changes made to a protocol, other essential documentation or other aspects of a study's arrangements, such as the Chief Investigator. All research protocols need a version number and date in order to maintain accurate records and audit trails. Any amendment to a research protocol should produce a chronological change to the date and version number, so that an auditor or inspector can view the progression of any changes to a trial and match them to the ethics or regulatory approval documents. Amendments can be classed as substantial or non-substantial:
54.3.2.1 Substantial Amendment
A Substantial Amendment can be defined as an amendment to the protocol or any other supporting documentation that is likely to affect to a significant degree:
1. The safety or physical or mental integrity of the subjects of the trial
2. The scientific value of the trial
3. The conduct or management of the trial
4. The quality or safety of any investigational medicinal product used in the trial
54.3.2.2 Minor ("Non-Substantial") Amendments
A minor amendment can be defined as a change to the details of a study which will have no significant implications for participants or for the conduct, management or scientific value of the study. Examples of minor amendments include:
1. Correction of typographical errors in the study documentation
2. Minor clarifications to the protocol
3. Changes to the research team (apart from changes to the Chief Investigator or a Principal Investigator)
4. Extension of the study beyond the period specified in the application form
5. Changes in funding arrangements
6. Changes in the documentation used by the research team for recording study data (i.e. case report forms)
7. Changes in the logistical arrangements for storing or transporting samples
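In record-keeping terms, the version-and-date audit trail described above behaves like an append-only log in which each entry carries its substantial or non-substantial classification. The short Python sketch below is purely illustrative; the class and field names are ours and are not drawn from the regulations or from any NHS or MHRA system:

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class Amendment:
        version: str        # new protocol version, e.g. "2.0"
        issued: date        # date of the amended document
        substantial: bool   # substantial amendments need prior approval
        summary: str        # brief, lay-comprehensible description

    @dataclass
    class Protocol:
        title: str
        history: list = field(default_factory=list)

        def amend(self, version, issued, substantial, summary):
            # Versions must progress chronologically so that an auditor can
            # match each change to its ethics/regulatory approval documents.
            if self.history and issued < self.history[-1].issued:
                raise ValueError("amendment dates must be chronological")
            self.history.append(Amendment(version, issued, substantial, summary))

    trial = Protocol("Example device study")
    trial.amend("1.1", date(2009, 3, 1), substantial=False,
                summary="Correction of typographical errors")
    trial.amend("2.0", date(2009, 6, 15), substantial=True,
                summary="New exclusion criterion affecting participant safety")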
A substantial amendment to a clinical trial of an investigational medicinal product must be reported to both the MHRA and the main NHS REC that approved the study, before the amendment is made to trial practice. The MHRA Substantial Amendment Form must be used for this process, as the standard ethics substantial amendment form cannot be used for requesting amendments to clinical trials. The Substantial Amendment Form should summarise any changes to the study and briefly explain the reasons in each case. It is important that the form is completed using language comprehensible to a lay person, as UK ethics committees comprise both professional and lay members. It is essential that the trial sponsor and the NHS site are also made aware of, and approve, any changes to the protocol before they are submitted to the ethics committee or MHRA, as they may affect the terms of sponsorship, insurance provision or resources at the NHS site. Any NHS site involved must then approve the amendments before the changes are implemented.
Table 54.2 Key sponsor and investigator responsibilities

Approvals
Sponsor: MHRA submission; amendment notification and annual updates; end of trial notification; funding; contractual arrangements
Investigator: ethics submission; ethics amendments and annual updates; NHS site approvals; other approvals as required

Compliance
Sponsor: trial audit; GCP/regulatory compliance; IMP/device supply and manufacture; operational procedures; urgent safety measures; training
Investigator: patient safety; informed consent; urgent safety measures; day-to-day monitoring; data collection and management; study management; study-specific training

Pharmacovigilance and safety
Sponsor: recording and reporting serious adverse events; expedited safety reporting to competent authority and ethics committee; regulatory compliance; insurance and indemnity
Investigator: reporting serious adverse events to sponsor; providing urgent information for expedited reporting; clinical decisions and urgent safety measures; Health and Safety compliance
54.3.3 Sponsor Approval
Traditionally, the word sponsor has been used to describe the funder of a research project. Under the UK CT Regulations and the Research Governance Framework, the sponsor has been redefined to mean the individual or institution that takes responsibility for the initiation, management and financing (or arranging the financing) of a study. As the role encompasses all aspects of trial management, including insurance and indemnity provisions, it is rare for an individual to take on the role, as significant time and resources are required to oversee sponsor responsibilities. Key sponsor and investigator responsibilities are presented in Table 54.2. Where the sponsor is an academic institution or NHS Trust, it is usual for the sponsor to delegate some of its responsibilities to the Chief Investigator of the study, as they are best placed to oversee its management. Before a study is commenced, it is essential that investigators understand what their responsibilities are and identify the procedures put in place by the sponsor to protect the research participants and the integrity of the research.
54.3.4 NHS Trust Approval
Any research that is being conducted on an NHS site will need to be reviewed and approved by the organisation. This may be via the ethics approval system, where site-specific assessment can be delegated to the organisation where the research is to be conducted, or there may be additional approval processes to be completed at the research location. It is essential that all research sites are informed of the study early in the approval process, as NRES ethics approval is conditional on having the approval of the NHS site before the study can commence.
References
1. Department of Health (DoH) UK (2005) Research governance framework for health and social care, 2nd edn. Available at: http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4108962
2. Scottish Executive Health Department (2006) Research governance framework for health and community care, 2nd edn. Available at: http://www.sehd.scot.nhs.uk/cso/Publications/ResGov/Framework/RGFEdTwo.pdf
3. United Kingdom Parliamentary Regulation (2004) The medicines for human use (clinical trials) regulations. Available at: http://www.uk-legislation.hmso.gov.uk/si/si2004/20041031.htm
4. European Parliament, Council of the European Union (2001) Clinical trials directive (Directive 2001/20/EC of the European Parliament and the Council of 4 Apr 2001). Available at: http://europa.eu/eur-lex/pri/en/oj/dat/2001/l_121/l_12120010501en00340044.pdf
5. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) (1996) ICH harmonised tripartite guideline E6(R1): guideline for good clinical practice (CPMP/ICH/135/95/Step 5, explanatory note and comments CPMP/768/97). Available at: http://www.ich.org/LOB/media/MEDIA482.pdf
6. European Parliament, Council of the European Union (2005) Good clinical practice directive (Directive 2005/28/EC of the European Parliament and the Council of 8 Apr 2005). Available at: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2005:091:0013:0019:EN:PDF
7. United Kingdom Parliamentary Regulation (2002) The medical devices regulations. Available at: http://www.opsi.gov.uk/si/si2002/20020618.htm
8. United Kingdom Act of Parliament (2004) Human tissue act. Available at: http://www.opsi.gov.uk/ACTS/acts2004/ukpga_20040030_en_1
9. Human Tissue Authority (United Kingdom) (2006) Human tissue authority code of practice – consent. Available at: http://www.hta.gov.uk/_db/_documents/2006-07-04_Approved_by_Parliament_-_Code_of_Practice_1_-_Consent.pdf
10. United Kingdom Act of Parliament (1998) Data protection act. Available at: http://www.opsi.gov.uk/Acts/Acts1998/ukpga_19980029_en_1
11. United Kingdom Act of Parliament (2005) Mental capacity act. Available at: http://www.opsi.gov.uk/acts/acts2005/ukpga_20050009_en_1
55 Research Funding, Applying for Grants and Research Budgeting in the UK: What the Academic Surgeon Needs to Know
Karen M. Sergiou
Contents
55.1 Introduction .......... 677
55.2 Applying for Research Project/Programme Funding: Funding Sources .......... 678
55.2.1 Research Councils .......... 678
55.2.2 Other Government Departments .......... 679
55.2.3 Charities .......... 679
55.2.4 Industry/Private Companies (National and Multinational) .......... 680
55.2.5 International Organisations (e.g. Commission of the European Communities, National Institutes of Health) .......... 680
55.3 Applying for Research Project/Programme Funding: Administrative Considerations .......... 681
55.4 Applying for Research Project/Programme Funding: Financing .......... 681
55.5 Principles of Full Economic Costing .......... 681
55.5.1 Directly Incurred Costs .......... 683
55.5.2 Directly Allocated Costs .......... 684
55.5.3 Indirect Costs .......... 688
55.6 Principles of Pricing .......... 689
Appendix: Key terms .......... 690
References .......... 692
Abstract A diverse range of organisations, from public bodies and not-for-profit organisations to private companies and international organisations, provide significant levels of project/programme funding in support of specific research activities. Costing practice within UK HEIs is governed by the Transparent Approach to Costing Full Economic Costing methodology (TRAC FEC). It is important for grant applicants to be familiar with the requirements of TRAC FEC and to know how these relate to the achievable price. It is also important for grant applicants to be familiar with the remit of the funding available when scoping the research proposal, and to ensure sufficient consideration of all funding, resourcing, financing and approval administrative issues. This chapter summarises the FEC methodology used by HEIs across the UK for the costing of research projects, and the varying approaches to subsequent pricing, to assist academics and administrators in applying for research funding.
55.1 Introduction
K. M. Sergiou Research Office, Imperial College London, Exhibition Road, London SW7 2AZ, UK e-mail: [email protected]
Research undertaken within UK Higher Education Institutions (HEIs) is funded from a wide range of sources. The Higher Education Funding Councils (HEFC) comprise the Higher Education Funding Council for England (HEFCE), the Scottish Further & Higher Education Funding Council (SFC) and the Higher Education Funding Council for Wales (HEFCW). The Funding Councils contribute significantly to the funding of research in English, Welsh and Scottish universities. Funding for research in Northern Ireland is directly supported by the Department for Employment and Learning, Northern Ireland (DELNI). Key to the
success of the research base is the underpinning of HEI research activity by a dual support system of public funding: (1) Block Grant Funding and (2) Project Funding.
1. HEFC Block Grant Funding: distributed by the Funding Councils to HEIs, on the basis of an assessment of quality, to support the research infrastructure (including salaries of permanent academic staff, premises, libraries and central computing costs) and to enable blue-sky research in line with the HEI mission. The majority of HEFC Block funding is allocated under two headings: Quality-Related (QR) funding and Research Capability Funding.
• QR funding is allocated selectively according to research quality. A number of measures are used to establish the volume of research, which in turn are linked to quality ratings as determined by the periodic Research Excellence Framework, the results of which subsequently determine each HEI's QR grant. An important component of the QR grant is the Charity Research Fund, which is additional income awarded to HEIs in recognition of the fact that charities do not meet the full costs of research. The level of the Charity Research Fund component awarded to an HEI is dependent on the volume of peer-reviewed, open-competition research income awarded by registered UK and overseas charities to eligible departments in universities and colleges. Further guidance on the eligibility of research income from charities for HEFC support can be found on the HEFCE website [1].
• The Research Capability Fund supports research in emerging subject areas where the research base is currently not as strong as in more established subjects. Further information relating to the fund can be found on the HEFCE website [2].
2. Research Council Research Project/Programme Funding: distributed by the Research Councils to HEIs in response to peer-reviewed competition/application processes to support specific research projects and programmes.
In addition to the funding streams above, the Science Research Investment Fund (SRIF), provided by the Department for Business, Innovation and Skills (BIS), is allocated to HEIs to address past under-investment in research infrastructure. Further information relating to the fund can be found on the HEFCE website [3].
While critically important, public funding is only one component of the HEI research funding portfolio; a diverse range of organisations, ranging from other public bodies, not-for-profit organisations to private companies and international organisations, provide significant levels of project/programme funding in the support of specific research activities.
55.2 Applying for Research Project/Programme Funding: Funding Sources
The main funders of health and social care research in the UK include the Research Councils, the Department of Health (Health Departments in Scotland, Wales and Northern Ireland also support health and social care research and development) and the Medical Research Charities. Industry is also a major investor in healthcare research and development. International organisations such as the Commission of the European Communities and the National Institutes of Health USA are also major contributors to funding. Funders promote funding opportunities/calls on their websites detailing the application process; however, the variety of potential funding sources available to academics is significant, and it is therefore difficult to maintain an overview of calls within specific research areas. HEIs take varied approaches to the dissemination of funding opportunities, ranging from internal bulletins to external database subscription. If this service is not provided, there are free funding opportunity websites available, such as RDfunding [4], that provide information on health-related research. Brief overviews of the main funder types are described in the following sections.
55.2.1 Research Councils
The BIS is responsible for the allocation of the UK Science Budget into research via the seven Research Councils, which are organised by discipline:
• Medical Research Council (MRC) [5]
• Biotechnology and Biological Sciences Research Council (BBSRC) [6]
• Engineering and Physical Sciences Research Council (EPSRC) [7]
• Science and Technology Facilities Council (STFC) [8]
• Arts and Humanities Research Council (AHRC) [9]
• Economic and Social Research Council (ESRC) [10]
• Natural Environment Research Council (NERC) [11]
The Research Councils invest ∼£2.8 billion/annum (as stated on the Research Councils UK website [12]) in research ranging from medical and biological sciences to astronomy, physics, chemistry, engineering, social sciences, economics and the arts and humanities. The aim, scale and balance of research projects funded reflect national research priorities agreed by the Research Councils in consultation with Government and other stakeholders. Details of funding opportunities are publicised on the Research Councils' websites. Applications to the BBSRC, EPSRC, STFC, AHRC, ESRC or NERC must be made online via the Research Council Joint Electronic Submission (Je-S) system [13]. Applications to the MRC must be made through the MRC Electronic Application and Assessment (EAA) system [14].
55.2.2 Other Government Departments
The Department of Health (DH) is the main Government Department supporting health research through a portfolio of national research programmes, including:
• Health Technology Assessment Programme (HTA): The HTA programme commissions research into devices, equipment, drugs and procedures across all sectors of healthcare in three different ways: by advertising standard calls for research proposals that address specific topics, by advertising special calls for research proposals that address themed areas and by funding HTA Clinical Trials that are proposed directly by researchers. HTA is designed to answer the key questions of commissioners of healthcare, providers and users of services: Does this treatment work? For whom? At what cost? How does it compare with alternative treatments? More information can be found on the HTA website [16].
• Service Delivery and Organisation R&D Programme (SDO): The aim of the SDO programme is research and development of the organisation and delivery of services to increase the quality of patient care, ensure better strategic outcomes and contribute to
improved health. More information can be found on the SDO website [17].
• New and Emerging Applications of Technology Programme (NEAT): The NEAT programme funds initiatives in both the life and physical sciences. The programme promotes and supports applied research into the development and exploitation of new or emerging technologies. The aim is to produce innovative health care products and interventions. It will support strategic and applied research, the outputs of which must be generalisable and capable of being applied to a defined health or social care need. NEAT is managed by the National Co-ordinating Centre for New & Emerging Applications of Technology; more information can be found on their website [18].
Other programmes of research and development in the NHS, through the National Institute for Health Research (NIHR), cover a broad range of healthcare matters, including the provision of funding to support the training and education of future health researchers, for example through Fellowship Awards (http://www.nihr.ac.uk). In addition, DH spends about £30 million per annum through ad-hoc research budgets held by Departmental policy branches and research undertaken by its arm's length bodies, including the Health Protection Agency. More information on DH funding can be found on their website [19].
55.2.3 Charities
The key aim of the research charities is to generate knowledge that benefits the public good. Charities therefore provide an important independent stream of research funding, which complements the objectives of the Research Councils and the Department of Health. In practice, there are hundreds of research funding charities; while these may have a wide range of aims, they are all regulated by charity law and therefore are required to adhere to certain obligations and restrictions on the use of charitable funds for research, for example the requirement to disseminate research findings and a proscription on funding research for the purpose of commercial or private gain. The Association of Medical Research Charities [20] is a member organisation of the leading UK charities that fund medical and health research. There are currently
117 members, including the world's largest medical research charity, the Wellcome Trust [21], all with a common aim of improving human health by funding a wide range of types of research, including basic, applied and disease-specific. Their combined expenditure on medical research is in excess of £935 million per annum. These charities provide funds in a variety of different ways, ranging from small pump-priming grants to substantial levels of funding intended for programmes of research. Medical research charities can only fund research that falls within their charitable objectives, which may focus on a particular disease or condition, a range of diseases or more widely on improving human health through education and research. Members are listed on the AMRC website [20]; applications for funding must be made directly to the Charity and not through the AMRC.
55.2.4 Industry/Private Companies (National and Multinational)
A spectrum of activity is funded by Industry and the private sector; the more the Funder expects the outputs/outcomes of the work to be relevant to its immediate needs, the less generally applicable, or available, the research tends to be. A good understanding of the market context is critical when entering into negotiation with Industry, including: understanding the Investigators'/HEI's position within the wider market (e.g. retaining or gaining market share), acquiring sufficient knowledge of competitors, taking advantage of opportunities (e.g. gaps in the market), minimising risks and threats, relating supply with demand (e.g. reacting to Funders' priorities, where appropriate), understanding the Funder's willingness and ability to pay and the value of the research to its business, and consideration of multiple services to provide a competitive edge. HEIs take varied approaches to the negotiation of commercial contracts; often dedicated contract negotiators will liaise with the company on the terms and conditions of funding, while other HEIs have dedicated Business Development Managers that provide the link with commercial partners. Key to contract negotiations is the requirement to retain the academic freedom to disseminate knowledge and ownership of background and arising intellectual property; as such, negotiation can be a lengthy process.
55.2.5 International Organisations (e.g. Commission of the European Communities, National Institutes of Health)
There are a number of International Organisations that are important Funders of HEI research, including:
Commission of the European Communities (CEC): The CEC's main mechanism for funding research in Europe is through its Research Framework Programmes. The current programme, the Seventh Framework Programme (FP7) 2007–2013, offers a range of funding opportunities to UK HEIs. Within the FP7 programme, there are a number of funding opportunities for health research, including a dedicated "Health" theme that aims to improve the health of European citizens and increase the competitiveness of European health-related industries, while addressing global health issues such as emerging epidemics. Other themes incorporate issues including diet and nutrition, health applications of information and communication technologies, health applications of nanotechnology, and environment and health. There are three groups of activities within the "Health" theme:
• Biotechnology, generic tools and medical technologies for human health
• Translating research for human health
• Optimising the delivery of health care to European citizens
Further information on the Health theme is available on the CORDIS website http://cordis.europa.eu/fp7/health. More detailed information on CEC funding can be found through the UK Research Office (UKRO) http://www.ukro.ac.uk, which is the UK's leading national information and advice service on European Union funding for research and higher education. UKRO's mission is to promote effective UK participation in EU-funded research programmes, higher education programmes and other related activities by: supporting sponsors and subscribers through early insight and briefing on developments in European programmes and policies; disseminating timely and targeted information on EU funding opportunities; and providing high-quality advice, guidance and training on applying for and managing EU projects.
National Institutes of Health, USA (NIH): The NIH is the USA's national medical research agency, consisting of 27 Institutes and Centres. It funds grants, cooperative agreements and contracts aiming to support the advancement of fundamental knowledge about the nature and behaviour of living systems, to meet the NIH mission of extending healthy life and reducing the burdens of illness and disability. The NIH website (http://grants.nih.gov/grants) contains comprehensive information ranging from NIH policies to funding opportunities and application processes.
55.3 Applying for Research Project/Programme Funding: Administrative Considerations
Applying for research funding is often considered a complex process, not only because of the wide range of Funders, but also because of the variety of schemes offered by each Funder. Individual Funders may provide numerous types of funding schemes that are tailored to particular objectives (e.g. Research Grants, Collaborative Grants, Fellowships, Travel and Conference Grants, Equipment Grants). Each scheme will often have specific application instructions, may have varying pricing policies (which can be either prescribed, e.g. Research Councils and Charities, or open to negotiation, e.g. commercial research) and differing terms and conditions of award. It is important for applicants to be familiar with the remit of the funding available when scoping the research proposal. Table 55.1 summarises the key administrative issues that should be considered when preparing an application for research funding.
55.4 Applying for Research Project/Programme Funding: Financing
In July 2004, the UK Government published its Science and Innovation Investment Framework 2004–2014, in which it outlined its objective of sustainable university finances by the end of 2014. Sustainability for universities is defined along the following lines: an HEI is being managed on a sustainable basis if, taking
one year with another, it is recovering its full economic costs across its activities as a whole, and is investing in its infrastructure (physical, human and intellectual) at a rate adequate to maintain its future productive capacity appropriate to the needs of its strategy, plan and customer requirements. To manage research on a sustainable basis, all HEIs need to: establish and recognise the full economic costs of research; secure better prices for research; manage the research activity strategically; improve project management and cost recovery; and invest in the research infrastructure [22]. The first step in providing a framework for managing long-term financial sustainability was the implementation of the Transparent Approach to Costing Full Economic Cost methodology (TRAC FEC). UK universities were required to implement minimum TRAC FEC requirements, which had been set out by JM Consulting in consultation with the Joint Costing and Pricing Steering Group, by September 2005, with full compliance required by August 2007 (for costing undertaken from 1st February 2008). The initial requirements were outlined in TRAC Volume III Full Economic Costs; however, this was subsequently supplemented with a number of updates [23]. The following section aims to summarise the FEC methodology used by HEIs across the UK for the costing of research projects, and the varying approaches to subsequent pricing, to assist academics and administrators applying for research funding.
55.5 Principles of Full Economic Costing
1. The Full Economic Cost (FEC) of a project represents the cost of all resources necessary to undertake that project. The FEC is not dependent upon what the Funder will pay.
2. The Price represents what the Funder is willing to pay and the HEI is willing to accept (i.e. the Price can be equal to, lower or higher than the FEC). The Price of the proposal is reached bearing in mind the FEC and not limited by it.
3. Applications for external research funding should be costed using FEC methodology and priced in line with either the funder's terms and conditions (where pricing policy is prescribed) or the HEI pricing policy (where pricing policy is open to negotiation).
Table 55.1 Applying for research funding: administrative considerations

Funding
• The proposed research is in line with the funding objectives/remit outlined in the call for proposals
• The applicant and the research organisation are eligible for the identified funding scheme
• The applicant and research organisation are registered users of the Funder's electronic submission system (registration can be a lengthy process) and are familiar with the system
• The resources to be requested are within the scheme provisions, e.g. the funding conditions should be checked for disallowable costs, capped budgets etc.
• The Funder's application/assessment processes are understood and adhered to, including formatting, submission processes and deadlines (it is important to note that submission of an application to a Funder such as the Research Councils is not direct; therefore sufficient time must be given for the HEI to undertake its checks)
• The HEI's application/internal approval processes are understood and adhered to (electronic submissions are often routed automatically to the HEI's administrative authority)
• The terms and conditions of funding are understood and acceptable to the HEI, including obligations relating to publication and acknowledgement of support, confidentiality, intellectual property ownership/commercial exploitation, liabilities and insurance, termination, conflicts of interest, research monitoring/evaluation (including scientific and financial reporting requirements), research governance, research ethics, use of animals, health and safety, financial controls (including payment terms) etc.

Resourcing
• Staff capacity of the Principal Investigator, Co-Investigators and any named researchers or support staff is adequately considered/addressed relative to other priorities and commitments
• Required staff resources are identified at the appropriate level, commensurate with the skill base required
• Required non-staff resources are established (consumables, travel, new equipment etc.) to enable the project to be undertaken
• Required internal resources (i.e. space, facilities) are accessible and available, and resource owners (such as Facility Managers) have been consulted
• Required external resources within other HEIs, NHS Trusts etc. have been verified for accessibility and appropriately agreed
• Any projects involving the creation of new buildings and/or major modifications to existing buildings, installation of large equipment requiring special access, external environmental, security or planning bodies, or projects considered politically sensitive (e.g. biological warfare) are subjected to appropriate review and approval
• The "Justification of Resources" within the application adequately addresses the requested resources

Financing
• The identified resources (e.g. staff, non-staff, internal and external facilities) necessary for the research are costed in accordance with Full Economic Costing methodology
• The pricing of the project is in line with HEI and Funder policy, e.g. the price is in line with any prescribed pricing policy or, where the price is open to negotiation, it is informed by market value
• The financial information submitted to the Funder is in accordance with the terms and conditions of application; for example, the budgets submitted to the Research Councils must be based on current levels and must not be indexed for pay awards (this is neither the FEC nor the Price)
• Research Partner/Collaborator budgets are adequately financed (costed and priced) and have been appropriately authorised by the Partner/Collaborating Institution

Approval
• Any internal requirements or regulatory approvals have been considered/obtained, e.g. institutional authorisation, health and safety, clinical research governance office registration etc.
• Any external requirements or regulatory approvals have been considered/obtained, e.g. Sponsorship agreement (note a), ICH/135 Protocol, Clinical Trials Authorisation from the Medicines and Healthcare products Regulatory Agency (MHRA), EudraCT database registration, appropriate Ethics Approval (via NHS REC, GTAC or ICREC), Human Fertilisation and Embryology Authority (HFEA) Approval, UK Xenotransplantation Interim Regulatory Authority (UKXIRA) Approval

a All research projects involving humans, their tissue or data must have an identified Sponsor (not to be confused with the Funder). The Sponsor is responsible for the initiation, management and financing (or arranging the financing) of a research project. The responsibilities of the Sponsor are as follows: to satisfy itself that the relevant standards are met; to assure the scientific quality of the proposed research; to ensure ethics approval has been obtained; to ensure robust systems for data collection and storage are in place; and to ensure arrangements are in place for dissemination of findings.
Fig. 55.1 Categories of full economic costs. [Figure: the three FEC categories and their components. Directly Incurred: researcher salaries; technicians (if dedicated); consumables, travel etc.; new equipment and associated maintenance; FEC research facility access (DI). Directly Allocated: Principal/Co-Investigator salaries; infrastructure technicians; pool technicians; estates; FEC research facility access (DA). Indirect.]
4. The Recovery Position illustrates the financial impact to the HEI of undertaking the research, i.e. the difference between the FEC and the Price:
• Full recovery (100%) is where the funds received meet the costs in full (Price = FEC).
• Under recovery is where the funds received do not meet the costs in full (Price < FEC). The shortfall between the FEC and the Price is usually referred to as the Institutional Contribution.
• Over recovery is where the funds received are in excess of the cost (Price > FEC). The surplus between the FEC and the Price is usually referred to as the Institutional Surplus.
5. Project costs should not be excluded from the FEC in order to anticipate a pricing decision.
6. Project Costs under FEC are classified under three categories:
• Directly Incurred Costs – specific to a project, charged as the cash value actually spent and supported by an audit record.
• Directly Allocated Costs – resources used by a project that are shared by other activities, charged to projects on the basis of estimates rather than actual costs.
• Indirect Costs – non-specific costs charged across all projects, based on estimates, that are not otherwise included as Directly Allocated costs.
The categories are summarised pictorially in Fig. 55.1. The following sections provide detailed guidance relating to the three FEC cost categories.
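Since principles 4–6 are essentially arithmetic, they can be illustrated in a few lines of code. The following Python sketch is an illustrative aid only; the figures and function names are invented and form no part of TRAC:

    # FEC is the sum of the three TRAC cost categories (principle 6).
    def full_economic_cost(directly_incurred, directly_allocated, indirect):
        return directly_incurred + directly_allocated + indirect

    def recovery_position(fec, price):
        # Classify the recovery position of a project (principle 4).
        rate = 100 * price / fec  # percentage of FEC recovered
        if price < fec:
            return f"under recovery at {rate:.0f}%: institutional contribution £{fec - price:,.0f}"
        if price > fec:
            return f"over recovery at {rate:.0f}%: institutional surplus £{price - fec:,.0f}"
        return "full recovery (100%)"

    fec = full_economic_cost(directly_incurred=120_000,
                             directly_allocated=50_000,
                             indirect=30_000)
    # A price below the £200,000 FEC leaves an institutional contribution:
    print(recovery_position(fec, price=160_000))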
55.5.1 Directly Incurred Costs
Directly Incurred Costs are specific to a project, charged as the cash value actually spent and supported by an audit record, e.g. research staff, technical and clerical staff costs (dedicated) and non-staff costs (consumables, equipment purchase etc.). Transfer of budget may be allowed between Directly Incurred headings, subject to Funder terms and conditions.

55.5.1.1 Directly Incurred Staff Costs
Directly Incurred staff (Clinical Research Fellows, Research Associates, Research Assistants, Research Fellows, Research Nurses etc.) are those staff who will be considered by the Funder to be wholly working on the research project, whether that be in a full-time or part-time capacity. Support staff (Technicians and Clerical support) can also be directly incurred but only if the activities are dedicated to the project. Staff working on more than one project should be proportionally allocated against respective projects. It is important to
note that Directly Incurred staff should be charged against the appropriate project(s) based on the actual costs incurred at the time incurred. Certain categories of Directly Incurred staff (technical/clerical support charged to more than one funding source, and researchers charged to more than one research project within the same time period) are required by TRAC to complete timesheets. TRAC timesheet requirements should be considered the minimum requirement. Reference should always be made to the Funder's terms and conditions, as these may detail additional timesheet obligations, e.g. CEC FP7 funding requires all researchers to complete timesheets. When costing Directly Incurred staff, the following requirements should be adhered to:
1. Staff costing should be based on the appropriate pay scale (based on HEI criteria for qualifications) and should reflect the category of staff, type of work and proportion of effort required.
2. The staff costing should include basic salary, London allowance (where applicable), additional allowances (where applicable), plus employer's superannuation and national insurance contributions.
3. The costing should also take into account incremental progression, with the exception of those staff on salaries fixed as prescribed by the Funder (e.g. Marie Curie Fellows [24]), and be indexed for pay awards in line with the HEI indexation policy.
4. Where discretionary points or promotion are likely to be awarded (e.g. upon award of PhD), such salary increases should be considered within the costing. Fixed salaries should only be calculated for specified funding streams (e.g. Marie Curie Fellowships). This term should not be confused with those staff on negotiable increment grades.
5. Redundancy and severance pay should not be included as part of the FEC, as this is considered an Indirect Cost. In the event that a non-government Funder will meet redundancy costs as a direct cost, these costs should be detailed in the price (not the FEC).
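The interaction of on-costs, increments and indexation in rules 1–4 can be made concrete with a small sketch. All figures below (spine points, on-cost rate, indexation rate) are invented for illustration; in practice they come from the HEI's pay scales and indexation policy:

    ON_COSTS = 0.22      # assumed employer's superannuation + NI rate
    INDEXATION = 0.03    # assumed annual pay-award indexation (HEI policy)

    def annual_cost(basic, london_allowance=0.0):
        # Rule 2: basic salary plus allowances plus employer on-costs.
        return (basic + london_allowance) * (1 + ON_COSTS)

    def project_staff_cost(spine_points, fte=1.0):
        # Rule 3: one spine point per year models incremental progression;
        # indexation for pay awards compounds year on year.
        return sum(annual_cost(point) * (1 + INDEXATION) ** year * fte
                   for year, point in enumerate(spine_points))

    # A full-time research associate over 3 years, one increment per year:
    print(f"£{project_staff_cost([30_000, 31_500, 33_000]):,.0f}")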
55.5.1.2 Directly Incurred Non-Staff Costs
Table 55.2 provides general guidance on non-staff costing.

Table 55.2 Directly incurred non-staff costs

Recruitment: The cost of advertisement for unnamed posts is allowable under FEC; the amount requested will depend on the chosen media and length of advert. If a number of posts are to be recruited at different stages of the project, then recruitment costs must be considered for each post. Associated costs such as postage, application forms, administration and applicant travel expenses are considered Indirect Costs, and therefore must not be included as directly incurred.

Consumables: For example, laboratory supplies; computer sundries and small equipment; test costs; licences; fees (including patient fees if applicable for clinical research); specialist journals specific to the project; project-specific courier costs. Office consumables (e.g. photocopying, printing, stationery, computer consumables, telephone costs) should not be included as a directly incurred cost unless usage is exceptional (e.g. survey-based projects) and justified accordingly; otherwise they are considered indirect costs.

Biomedical services: The purchase of animals should be considered a directly incurred cost, as should any surgical procedures specific to the project. Animal maintenance can be considered directly incurred; however, HEI policy should be referred to, as HEIs may choose to charge maintenance as directly allocated.

Travel and subsistence costs: Travel should be by the means of transport that combines both economy and consistency with the objective of the trip. Standard fares should be quoted to include taxes, parking costs etc. Costs to consider include UK and overseas travel, subsistence costs, conference fees, and staff and patient travel.

NHS Trust costs: All associated NHS costs should be included (e.g. research nurses, ethics committee approvals etc.) when these costs are to be recharged to the Trust, or calculated by the Trust but not charged (as the costs are already deemed to be covered through "Other Clinical Services" provided under knock-for-knock arrangements).

Professional services (external consultancies & subcontracts): It is recommended that costs (inclusive of VAT where appropriate) are obtained from third-party subcontractors in writing and, where appropriate, draft subcontracts negotiated. Example services include transcribing/translating and analysing samples/data.

Research partners costs: Where research is to be undertaken by a number of collaborating institutions (Research Partners), and a single application is to be submitted to a Funder, the lead Institute must obtain the Research Partner costs at the level of detail required by the Funder.

Equipment: All new equipment required for the project should be included. Where equipment quotations are obtained, it is important that these state the price applicable at the proposed purchase date rather than the current price. Costs to consider include purchase, installation, set-up, testing, import duty, delivery, spare parts, maintenance/service contracts (for the project life), software and exceptional procurement costs, insurance costs, and buildings modifications. Where VAT is chargeable, all new equipment required for the project should be costed inclusive of VAT at the appropriate rate. It should be noted that equipment costs can be shown on a project in any one of three ways: equipment purchased for a project (Directly Incurred); equipment already owned by the institution and directly allocated to the project on the basis of usage using charge-out rates (Directly Allocated); and equipment already owned by the institution but not directly allocated to projects on the basis of usage, which forms part of the Estates charge.
55.5.2 Directly Allocated Costs
This category includes the costs of resources used by a project that are shared by other activities. They are charged to projects on the basis of estimates rather than actual costs. Virement/transfer of budget is not allowed between Directly Allocated headings, as dictated by TRAC. The four categories of Directly Allocated Costs – Investigator, Estates, FEC Research Facilities and Shared Technicians (Infrastructure and Pool) – are described in the following sections.
55.5.2.1 Directly Allocated Investigator Costs
FEC requires the costs of the Principal Investigator (PI) and Co-Investigators (Co-I) to be included in the calculation of the cost of a project, irrespective of how they are funded. All Investigators should carefully consider all the time that can reasonably be attributed to a project, i.e. the direct time required to manage the project, undertake the work and supervise the research. It is important to note that Directly Allocated staff should be charged against the appropriate project(s) based on the estimated time and cost. Directly Allocated staff are not required by TRAC to complete timesheets; however, reference should always be made to the Funder's terms and conditions, as these may include additional timesheet obligations, e.g. CEC FP7 funding requires all funded investigators to complete timesheets.
Guidance for estimating time cannot be prescriptive, absolute or all-encompassing, as the time demanded from investigators on specific projects will be differentially influenced by a number of factors such as the project period, the nature/complexity of the project, the number of co-investigators involved (where staff are co-managed, the time should be split appropriately), the number and level of staff to be supervised, and whether new staff are to be employed (and hence need additional training) as opposed to continued funding for current staff.
There are certain restrictions that investigators should be aware of when estimating time:
1. The time estimated must always relate to direct project activity, not associated costs such as replacement teaching costs.
2. All investigators should carefully consider all the time that can reasonably be attributed to a project, i.e. the direct time required to manage the project,
undertake the work and supervise the research, including:
• Investigator research time
• Staff management/supervision time
• Project management time
• Travelling time: if explicitly part of the project (e.g. fieldwork, seminars funded during the funded period), then time can be included, although this might simply be captured under the research time estimation.
• Reports: work involved in the writing up and dissemination of the results of the research (PI and Co-I time), e.g. final report and intermediate reports, should be estimated.
3. Investigators can estimate their time in a number of ways: (i) a month-by-month build-up of time, (ii) estimating the number of hours on average a year or (iii) using a proxy of hours per researcher plus time at the end for writing up.
4. Investigators' time does not have to be profiled across the life of the project (the actual life is often longer than the life assumed for funding purposes, e.g. writing the final report after the "end date").
5. To ensure that costing is not complex, time (and associated salary) can be spread evenly across the period of funding. The weekly input method (a) could be used to illustrate a variable time profile for an Investigator, i.e. the first year may require a significant PI time investment in contrast to following years. The total input method (b) will show a flat profile, i.e. equivalent effort throughout the project. Both methods are acceptable under TRAC. If it is likely that there will be significant troughs and peaks in activity, the former method is advisable, in particular if effort is weighted towards the end of the project, where the effects of increments and indexation on pay will be compounded.
6. Time estimates must be based on the default TRAC assumption of 7.5 h/day, 37.5 h/week, 220 days/year, 1,650 h/year. This standard working year of 1,650 h is not indicative of the actual working year (contracted or otherwise); it is imposed by TRAC so that HEIs do not recover more than 100% of a salary from public bodies. This does not mean that, across all areas of activity, an individual cannot be working more than 1,650 h per annum; however, it is a good
management practice to ensure that the workloads of awarded projects are reasonable.
7. Investigator time should not include any activity that is categorised by TRAC as "support", since these costs are recovered through the indirect cost rate. Support activities can be defined as those which:
• are undertaken in support of teaching (e.g. timetabling, admissions work and exam boards), research (e.g. drafting research project proposals, refereeing papers) and other activities (e.g. drafting project proposals, negotiating terms with Funders)
• relate to professional development (the maintenance and advancement of personal knowledge)
• relate to HEI management or administration (e.g. membership of departmental/institutional committees, quality assurance etc.)
• relate to time spent training or supervising research students, except in the case of project studentships and any other students who are recognised as members of the project team by the Funder
• relate to the preparation of bids for a project (this is considered a support activity because much of the work is abortive and cannot be linked to any one project)
• relate to time associated with staff recruitment (this forms part of the Indirect Costs)
• constitute general research-related activity, such as refereeing papers, editing journals etc.
• concern the communication of results: if part of the post-project academic publication process, this is part of "own-funded" research time, and hence not part of the project itself
Once the investigator time has been estimated, the associated costs should be calculated in accordance with the following TRAC requirements:
1. PI/Co-I time must be estimated for all projects, with possible exceptions for small travel/conference funding, equipment-only bids (dependent on HEI policy) or where the PI/Co-I is in receipt of a full-time fellowship for the same period.
2. Directly Allocated salaries can be estimated using pay banding or actual salaries, or a combination thereof (dependent on HEI policy), and must not exceed 1,650 h/annum per investigator.
3. Pay bands are calculated by HEIs to include allowances, honorariums and associated on-costs, and provision for likely pay increments and promotions. Pay bandings do not make provision for bonuses or payments that relate purely to clinical work (e.g. NHS merit awards/clinical excellence awards, intensity
payments or Additional Duty Hours (ADHs)), nor do they include academic overtime.
4. A salary cost based on hourly rates should be applied to each investigator's time, except when those costs are paid by neither the HEI nor their collaboration partner (e.g. some visiting fellows, visiting professors from industry, retired academics or senior research investigators (SRIs)). In cases where a living allowance is paid, this should be included.
5. The time of NHS academic staff working on a project should be estimated and included. However, a salary cost should only be associated with this time if it is deemed by the HEI to be part of the "knock-for-knock" arrangements or if the Trust specifically recharges an appropriate part of the salary costs.
6. Where the Principal Investigator is to be charged directly to the project (e.g. a Senior Fellowship application), the cost and time must be considered as Directly Incurred, not Directly Allocated, as the latter is an estimated cost.
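To make the arithmetic concrete, the sketch below derives a Directly Allocated investigator cost from the TRAC standard working year. It is purely illustrative: the salary figure, the hours and the function name are hypothetical and are not prescribed by TRAC or by any Funder.

TRAC_HOURS_PER_YEAR = 1650  # default TRAC assumption: 7.5 h/day, 220 days/year


def investigator_cost(annual_salary_with_oncosts: float,
                      hours_on_project: float) -> float:
    """Cost investigator time at an hourly rate derived from the TRAC
    standard year; chargeable time is capped at 1,650 h per annum."""
    if hours_on_project > TRAC_HOURS_PER_YEAR:
        raise ValueError("TRAC caps recoverable time at 1,650 h per investigator per annum")
    hourly_rate = annual_salary_with_oncosts / TRAC_HOURS_PER_YEAR
    return hourly_rate * hours_on_project


# A PI committing 10% of the standard year (165 h) on a banded salary of 82,500 GBP:
print(investigator_cost(82_500, 165))  # -> 8250.0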
55.5.2.2 Directly Allocated Estates Costs

Estates costs relating to research are directly allocated to projects through an Estates Rate. HEIs have been required to implement a minimum of two estates rates since September 2005. Estates costs are defined for this purpose as: premises expenditure (repairs and maintenance, utilities, rates, estates staff, rents, gross buildings depreciation and the net TRAC infrastructure adjustment, buildings insurance, cleaning, porters and security), and equipment and research facilities that are not defined (and therefore not charged out) as FEC Research Facilities. The financial accounts are modified by two significant adjustments: the Infrastructure adjustment and the Return for Financing and Investment (RFI). The Infrastructure adjustment ensures that the HEI takes the current replacement value of its assets into account and allocates sufficient funds for their long-term replacement; it is therefore included within the Estates Rate. The RFI adjustment is an indirect cost. The Estates Cost for a specific project is calculated as a multiple of the Project Full Time Equivalent
(£/FTE). The Project FTE is the sum of the research time committed to the project (i.e. the sum of all directly incurred research staff, postgraduate students (weighted) and directly allocated investigators, but excluding support staff). While researchers may not attract a salary cost (a visiting academic, for example), their time will count towards the Project FTE and must therefore be included in the FTE calculation. For example, if three full-time researchers work on a project within a laboratory facility for 12 months at a rate of £30,000/FTE, then the Estates Cost for the project would be £90,000. The Estates Rate should be applied in accordance with the following TRAC requirements:
1. The estates rate applicable depends on the location of the majority of the research, irrespective of whether or not a salary is to be incurred.
2. An estates cost should be calculated for the time of researchers working in a facility owned by the host HEI, irrespective of the funding of that facility.
3. It is good practice for the rate to reflect the location where each researcher is working; however, a single estates rate is commonly applied to represent where the majority of the research is to be undertaken.
4. On collaborative projects between institutions, the estates rate should reflect the costs of the project (i.e. where the staff are working), irrespective of which institution is the lead. Where research is to be undertaken within a collaborating institution, the appropriate costs should be obtained from the relevant institutional authority.
5. If the facility is deemed part of an NHS Trust knock-for-knock arrangement, then the appropriate HEI Estates Rate can be applied.
6. If a researcher works off-site for 6 months or more in aggregate during a project, then no estates rate should be applied for that period of time, unless the off-site location provider intends to invoice the HEI, in which case those costs should be counted within the FEC (under Directly Incurred Professional Fees/Services).
7. The estates rate for project postgraduate students should be weighted according to the estates type, e.g. the generic estates rate by a factor of 0.5 and the laboratory estates rate by a factor of 0.8.
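As a worked illustration of the Project FTE calculation and the student weighting in point 7 above, consider the following sketch. All figures are hypothetical; the actual weightings and rate would come from the HEI's own TRAC return.

def project_fte(researcher_ftes, investigator_fte, student_ftes, student_weight):
    """Sum directly incurred researchers, directly allocated investigators
    and weighted postgraduate students; support staff are excluded."""
    return sum(researcher_ftes) + investigator_fte + student_weight * sum(student_ftes)


lab_estates_rate = 30_000  # hypothetical laboratory estates rate, GBP per FTE
fte = project_fte(researcher_ftes=[1.0, 1.0, 1.0],  # three full-time researchers
                  investigator_fte=0.1,             # PI at 10% of the standard year
                  student_ftes=[1.0],               # one project postgraduate student
                  student_weight=0.8)               # laboratory weighting
print(round(lab_estates_rate * fte, 2))  # -> 117000.0 (3.9 FTE x 30,000 GBP)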
55.5.2.3 Directly Allocated FEC Research Facility Charge-Out Rates

Directly Allocated FEC Research Facilities can comprise a single piece of major equipment, a collective suite of equipment or a research service (such as a statistical advisory service) that is owned by the HEI. Associated charge-out rates must be calculated in accordance with TRAC methodology for either small or major research facilities as an hourly running cost (inclusive of facility staff costs, maintenance, consumables etc.), but should not make provision for replacement. This exercise is usually undertaken within the HEI's Finance Department, as the identified costs must be extracted from the institutional estates rate calculation as part of the annual TRAC exercise. The minimum TRAC requirement for HEIs is for Biomedical Services to be charged out using FEC; the approach towards other facilities is at the discretion of the HEI. It is common for HEIs to restrict the number of FEC Research Facilities to avoid burdensome administration and a proliferation of cross-charging. In order to do this, HEIs may define eligibility criteria based on the number and type of users, the existence (or absence) of a charging culture, historical financial position and future financial recovery projections.
55.5.2.4 Directly Allocated Shared Laboratory Technicians

FEC defines two categories of Shared Technician, Infrastructure and Pool Technicians, which are described below.

Directly Allocated Infrastructure Technicians: Infrastructure technicians are defined as those who provide support including health & safety, stores, workshops, laboratory equipment maintenance, and laboratory management & administration. Infrastructure technician rate(s) are set as one or more HEI rates; their application is mandatory and will usually mirror the HEI's approach to the Estates rate. The infrastructure technician cost at project level is calculated as a multiple of the £/FTE, i.e. it is proportional to the sum of the time of Directly Incurred researchers,
postgraduate students (weighted) and Directly Allocated investigators.

Directly Allocated Pool Technicians: Pool technicians are similar to Directly Incurred technicians, in that they support specific research projects, but they are costed using an hourly rate and are not required to complete timesheets (unless this is a specific condition of funding). The time estimated for pool technicians on a research project must not include any staff time that is being directly incurred or that can be considered infrastructure technician activity. It is important to note that not all HEIs have chosen to implement this category of technician; therefore, specific reference should be made to HEI policy.
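As a rough illustration of the distinction (all figures hypothetical): infrastructure technicians are charged per project FTE, like the estates charge, whereas pool technicians are charged per estimated hour.

infra_rate = 6_000  # hypothetical infrastructure technician rate, GBP per project FTE
pool_rate = 25.0    # hypothetical pool technician charge-out rate, GBP per hour

fte = 3.9           # project FTE, as computed in the estates sketch above
pool_hours = 400    # estimated pool technician input to the project

print(round(infra_rate * fte, 2))        # -> 23400.0 infrastructure technician charge
print(round(pool_rate * pool_hours, 2))  # -> 10000.0 pool technician cost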
55.5.3 Indirect Costs

Indirect Costs are non-specific costs charged across all projects based on estimates that are not otherwise included as Directly Allocated Costs, e.g.:
• General office and basic laboratory consumables
• Library services/learning resources, typing/secretarial, finance, personnel, public relations and departmental services
• Central and distributed computing, if not Directly Allocated
• The RFI adjustment (includes redundancy costs)
Prior to FEC, indirect costs (otherwise known as overheads) were generally calculated as a percentage of direct staff costs (or as a percentage of all costs for European (CEC) applications). Staff time (not cost) is considered a preferable driver under FEC; as such, HEIs are required to calculate at least one Institutional Indirect Cost Rate annually. The Indirect Cost is calculated as a multiple of the £/FTE, i.e. it is proportional to the sum of the time of Directly Incurred researchers, postgraduate students (weighted) and Directly Allocated investigators. The Indirect Cost Rate should be applied in accordance with the following TRAC requirements:
• If an academic/researcher has allocated time to a project, then the indirect cost should be applied to that time, irrespective of whether the project is being led by or taking place in another institution or off-campus.
• If an academic/researcher has allocated time to a project, then the indirect cost should be applied to that time, irrespective of whether a salary is also allocated to the project (e.g. a visiting academic).
• Where there is a visiting academic on a project, indirect costs might be charged from the collaborator to the lead institution. This recharge can be based on the collaborator's rate; alternatively, the indirect cost rate for the employer of the majority of the staff can be used for all staff.
• The indirect rate for project postgraduate students should be weighted by a factor of 0.2.
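The Indirect Cost uses the same £/FTE driver as the Estates Cost. A minimal sketch, assuming a hypothetical institutional rate, shows how unsalaried time still attracts the charge and how the 0.2 student weighting operates.

indirect_rate = 45_000  # hypothetical institutional indirect cost rate, GBP per FTE

# Researchers + PI + an unsalaried visiting academic (time counts, salary does not):
staff_fte = 3.0 + 0.1 + 0.5
student_fte = 0.2 * 1.0  # one project postgraduate student, weighted by 0.2

print(round(indirect_rate * (staff_fte + student_fte), 2))  # -> 171000.0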
55.6 Principles of Pricing

Improved pricing of research grants and research contracts is essential for the future sustainability of the UK's HEI research base; however, pricing is a complex area and one where it is often necessary to make judgements. In certain cases, it will be considered beneficial to adjust the price in order to acquire benefits that will contribute to future academic or commercial endeavour, i.e. the price of a project may be less than FEC where justified on strategic grounds. Where Funders are required to meet the Government directive to fund in accordance with TRAC FEC (e.g. the Research Councils and Other Government Departments funding public good research) or operate within other schemes that regulate prices (e.g. Charities and Europe (CEC)), there is little scope for price negotiation because the Funder prescribes the pricing policy. The FEC must be translated in accordance with the Funder's terms and conditions to establish the Price. An important general principle of public funding is that public bodies funding research that demonstrably contributes to the enhancement of the UK research base, or in some other way provides a public scientific good, should never be charged more than FEC. This applies to all Research Council work, as well as public good research funded by Other Government Departments (OGDs) and Charities. The varying approaches to pricing of the key funder groups are summarised below:
• Research Councils: will pay a proportion of the FEC on grants awarded. This is currently set at 80% of FEC, with exceptions for directly incurred equipment over £50 K and postgraduate studentship stipends and fees, which are paid at 100% of FEC (see the illustrative sketch at the end of this section). It is important to be aware that Research Councils do not accept indexation as calculated by HEIs; applications to the Research Councils must therefore be based on current pay scales. Research Councils will then apply indexation based on the GDP deflator. Unfortunately, this is frequently below the pay awards received, and shortfalls on Research Council staff budgets are therefore a common problem.
• Other Government Departments (OGDs): should generally pay 100% of FEC (on non-competitive contracts) and should only pay at the same rates as the Research Councils (currently 80%) if the Government has determined a different policy for a particular department. The initial negotiating stance with OGDs should not be less than FEC unless (a) the NHS (not the Department of Health) is to fund a clinical or health-related project, in which case the price should be calculated as per the Research Councils, or (b) the HEI specifically decides to price a particular research project at less than 100% of FEC. It is important to consider that OGDs may potentially pay more than 100% of FEC if the work is competitive or other market conditions apply.
• Charities: under FEC, Charities have continued to meet the (eligible) directly incurred project costs (and, in parallel, some capital funding for infrastructure). In addition, some Charities will also meet certain directly allocated costs, such as Research Facility Charge-Out Rates, and some will on occasion contribute to Investigators' costs; however, no Charity will meet the Estates or Indirect Costs of research projects, as this is considered offset by the Charity Research Fund income paid by HEFCE to HEIs through the QR Grant. When pricing a charity application, one should consider any costs defined by TRAC as Directly Allocated that could legitimately be included in the price, i.e. pool technicians (as pool technicians are calculated on an average hourly rate, it may be appropriate to recalculate the salary using actual salary scales) and Equipment Access Charges. Certain charities, such as the Wellcome Trust and the Arthritis Research Campaign,
will not currently accept institutional indexation rates, and indexation must be recalculated in accordance with their specific terms and conditions.
• Commission of the European Communities (CEC): the CEC will not meet the costs of FP7 at 100%. The recovery percentage depends on the activity to be undertaken, e.g. RTD at 75%, Demonstration at 50% and Management Activities at 100%. As the applicable reimbursement (recovery) rate depends on the activity, and as more than one activity is possible on each individual FP7 project, a single project may be awarded reimbursement rates of 50, 75 and 100%. FP7 recovery is further reduced by the restrictions relating to VAT associated with FP7 expenditure, which cannot be claimed from the CEC or recovered. While the CEC will meet the costs of Principal and Co-Investigator time (at the recovery levels explained), it requires actual salary and time to be charged; as such, timesheets are a requirement of the funding. The CEC will not recognise the TRAC Estates or Indirect Costs; instead, the "indirect costs" must either be priced in line with a default rate (i.e. 60% of the total direct eligible costs excluding subcontracts) or, within the UK, be calculated under TRAC EC-FP7 (a development of TRAC FEC), in which the HEI must implement an Institutional Indirect Rate for FP7 pricing.
Where Funders are not required to adhere to TRAC FEC, i.e. Industry (UK and Overseas) and Other Government Departments (competitive tender), the pricing strategy is open to negotiation. Therefore, when pricing commercial research, the value attached to that activity by the Funder needs to be understood and recognised to inform the pricing decision, i.e. the price should reflect what the market will bear. Where projects are jointly funded by differing Funder types, the approach to pricing may draw from both non-commercial and commercial pricing principles. For example, Industrial partners may co-fund a research project with a Research Council, through a formal collaboration mechanism or through participation in a standard research grant.
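By way of illustration, translating an FEC into a Research Council price under the rules summarised above might be sketched as follows. The category names and figures are hypothetical; the actual exceptions and rates are defined by the individual Funder's terms and conditions.

def research_council_price(fec_by_category):
    """Pay 100% on exception categories and 80% of FEC on everything else."""
    full_rate = {"equipment_over_50k", "studentship_stipend_and_fees"}
    return sum(cost * (1.0 if category in full_rate else 0.8)
               for category, cost in fec_by_category.items())


fec = {
    "staff": 200_000,
    "estates": 90_000,
    "indirect": 171_000,
    "equipment_over_50k": 60_000,
    "studentship_stipend_and_fees": 45_000,
}
print(round(research_council_price(fec), 2))  # -> 473800.0 against an FEC of 566,000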
Appendix: Key terms

Glossary of terms

Academic staff: Includes research staff and any staff returnable under the RAE/REF or as a researcher to a Funder. This is irrespective of the length of their contract (regular, quasi-permanent, rolling, fixed term, visiting lecturer), the type of contract (academic, nursing lecturer, research, retired) or the funder (fellow, university paid, research council paid, centrally supported, externally funded).

Co-investigator (Co-I): A person who assists the Principal Investigator in the management and leadership of the project.

Collaborative research: The usual characteristics of collaborative research are: there is a joint investment of physical (including financial) and intellectual resources; the aim is to develop understanding in a field of substantial interest to both the collaborator (for example, one or more companies) and the HEI; arising IP may be shared or (especially where generic) may remain with the HEI; and arising results or knowledge must be capable of, and free for, publication in keeping with normal academic practice. Where the right to publish results is excluded within the terms of the contract, the activity cannot be administered through the research ledger, i.e. the activity should be considered under a service agreement (also known as "contract research").

Directly allocated costs: The costs of resources used by a project that are shared by other activities. They are charged to projects on the basis of estimates rather than actual costs.

Directly allocated shared technicians: Shared technicians, who are not specific to research projects and do not complete FEC timesheets. There are two types of shared technician: pooled technicians, who work on several research projects but do not complete timesheets; and infrastructure technicians, who provide infrastructure support or services to laboratories, including health & safety, storeroom/supplies, hazardous materials handling, laboratory equipment maintenance, carpentry or administration.

Directly incurred costs: Costs that are specific to a project, are charged as the cash value actually spent and are supported by an audit record.

Full economic cost (FEC): The full economic cost of a project represents the cost of all resources necessary to undertake that project and must be calculated in accordance with TRAC.

Full time equivalent (FTE): The amount of time an employee works as a percentage of full time.

Higher Education Funding Council for England (HEFCE): The government body responsible for the distribution of public funds to higher education institutions across England.

Higher Education Institution (HEI): A higher education institution as defined under the Higher Education Act of 1997.

Indirect costs: Non-specific costs charged across all projects based on estimates that are not otherwise included as Directly Allocated costs.

Infrastructure technicians: Those who provide infrastructure support or services to laboratories, including health & safety, storeroom/supplies, hazardous materials handling, laboratory equipment maintenance, carpentry or administration.

Institutional contribution: The contribution made by the Institution in support of a research project, i.e. the shortfall between the cost of the research (the FEC) and the external funding secured (the Price), where FEC > Price.

Institutional surplus: The surplus between the cost of the research (the FEC) and the external funding secured (the Price), where FEC < Price.

Knock-for-knock: The arrangement between Higher Education Institutions (HEIs) and their associated NHS Trusts by which costs are incurred in lieu of clinical services, i.e. research services received from clinicians, or research work carried out by academic clinicians at the same time as clinical services.

Principal investigator (PI): The person to whom the research project is assigned. The PI takes responsibility for the intellectual leadership of the research project and for the overall management of the research.

Project: An activity that is considered by a Funder to be a separately fundable piece of work, such as a research project, programme grant, centre grant or fellowship.

Project FTE: The sum of the researchers' FTE (both academic and research staff) and project postgraduate students.

Quality related funding (QR): Quality-related (QR) funding is allocated selectively to HEIs according to research quality by the Funding Councils (e.g. HEFCE). A number of measures are used to establish the volume of research, which in turn is linked to quality ratings as determined by the periodic review (e.g. RAE/REF), subsequently determining the institutional QR grant. HEIs conducting the best research receive a larger proportion of the available funds. It should be noted that after 2008, the RAE will be replaced by a metric-based assessment system including statistical indicators such as research income, publication citations etc. An important component of the HEI's QR is the Research Fund, which is additional income payable to HEIs in direct proportion to the charity-funded research undertaken. The Research Fund is paid to HEIs in recognition of the fact that charities generally cover only a proportion of project costs. The Research Fund relates specifically to charity research income that has been awarded through peer review and open competition by a charity registered in the UK or an overseas body with exclusively charitable purposes, consistent with the definition set out by the Charity Commission.
Recovery: The funds received (the Price) expressed as a percentage of the cost (the FEC). Full recovery (100%) is where the funds received meet the costs in full (FEC = Price). Under-recovery is where the funds received do not meet the costs in full (FEC > Price). Over-recovery is where the funds received are in excess of the cost (FEC < Price).
Research: The generally accepted definition of research is the Frascati definition. The definition below is that used for the Research Assessment Exercise: "Original investigation undertaken in order to gain knowledge and understanding. It includes work of direct relevance to the needs of commerce, industry, and to the public and voluntary sectors; scholarship; the invention and generation of ideas, images, performances, artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction. It excludes routine testing and routine analysis of materials, components and processes such as for the maintenance of national standards, as distinct from the development of new analytical techniques. It also excludes the development of teaching materials that do not embody original research."

Researcher: A Researcher on a project is anyone who will make a significant intellectual contribution to it. Typically, such a person would be qualified to carry out independent or supervised research, might provide an academic lead for research, or could provide expert advice to a research project. Researchers have a thorough understanding of what they are doing, can interpret results and devise appropriate ways forward (rather than, for example, carrying out a set of routine operations under carefully supervised conditions).

Research Council (RC): Research Councils provide funding for specific research projects and programmes following some form of peer-refereed competition. There are seven research councils, organised by discipline: Arts and Humanities Research Council (AHRC), Biotechnology and Biological Sciences Research Council (BBSRC), Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), Medical Research Council (MRC), Natural Environment Research Council (NERC) and Science and Technology Facilities Council (STFC).

Transparent approach to costing (TRAC): TRAC is the standard methodology used by HEIs across the UK to determine the cost of their main activities (Teaching, Research and Other activities). Introducing TRAC was a government requirement and part of the Government's Transparency Review.
References
1. Higher Education Funding Council for England. Support for research income from charities. Available at: http://www.hefce.ac.uk/research/funding/charities/
2. Higher Education Funding Council for England. Research Capability Fund. Available at: http://www.hefce.ac.uk/Research/funding/rcf/
3. Higher Education Funding Council for England. Science Research Investment Fund (SRIF). Available at: http://www.hefce.ac.uk/research/srif/faq.htm
4. RDFunding. Available at: http://www.rdfunding.org.uk/
5. Medical Research Council. Available at: http://www.mrc.ac.uk
6. BBSRC. Available at: http://www.bbsrc.ac.uk
7. EPSRC. Available at: http://www.epsrc.ac.uk
8. STFC. Available at: http://www.stfc.ac.uk
9. AHRC. Available at: http://www.ahrc.ac.uk/
10. ESRC. Available at: http://www.esrc.ac.uk/
11. NERC. Available at: http://www.nerc.ac.uk/
12. RCUK. Available at: http://www.rcuk.ac.uk/
13. RCUK. The Portal for the Research Councils' Electronic Grant Services. Available at: https://je-s.rcuk.ac.uk/
14. MRC. The MRC Electronic Application and Assessment (EAA) System. Available at: https://www.eaa.mrc.ac.uk/
15. Department of Health (DH) UK (2007) Research funding and priorities. Available at: http://www.dh.gov.uk/en/Researchanddevelopment/A-Z/DH_4069152
16. National Institute for Health Research UK (NIHR). NIHR Health Technology Assessment programme. Available at: http://www.hta.ac.uk/funding/
17. National Institute for Health Research UK (NIHR). NIHR Service Delivery and Organisation programme. Available at: http://www.sdo.nihr.ac.uk/
18. National Institute for Health Research UK (NIHR). New and Emerging Applications of Technology (NEAT). Available at: http://www.nihr-ccf.org.uk/site/programmes/neat/
19. Department of Health (DoH) UK. Available at: http://www.dh.gov.uk/en/index.htm
20. AMRC. Available at: http://www.amrc.org.uk/homepage/
21. Wellcome Trust. Available at: http://www.wellcome.ac.uk/
22. H M Treasury (2004) Science & innovation investment framework 2004–2014. Available at: http://www.hmtreasury.gov.uk/spending_sr04_science.htm
23. JCPSG. Key publications and past guidance. Available at: http://www.jcpsg.ac.uk/resources/publications.htm
24. MCFA. Marie Curie Fellows Association. Available at: http://mcfa.eu/
56 How to Enhance Development and Collaboration in Surgical Research

Peter Ellis

P. Ellis
People in Health, Ability House, 7 Portland Place, London W1B 1PP, UK
e-mail: [email protected]

Contents
56.1 Background and Introduction
56.1.1 Situation
56.2 Complications
56.3 Opportunity
56.3.1 Academic Institutions
56.4 Public Benefits of Research
56.4.1 Research into the Determinants of Health
56.4.2 Medical Research
56.4.3 Health Care Delivery Research
56.4.4 Translational Research
56.5 Collaboration and Its Benefits
56.6 Evidence of Value/Demonstrated Benefit
56.7 Implications of Collaboration
56.7.1 Virtual Organisation
56.7.2 Co-Location
56.7.3 Joint Ventures
56.7.4 Merger
56.8 Benefits of Collaboration to Research
56.8.1 Discovery Research
56.8.2 Clinical Research
56.8.3 Health Services Research
56.8.4 Clinical and Population Epidemiology
56.8.5 Technology Convergence
56.9 Additional Benefits of Collaboration
56.9.1 Advantages to Education
56.9.2 Impact on Allied Professions
56.9.3 Advantages to Health Service Delivery (NHS)
56.9.4 Aligned Capital and Funding Opportunities
56.9.5 Commercial Opportunities and Economic Spin-Off Benefits
56.10 Contract and Industry Research
56.11 Biosciences Cluster
56.12 Advantages to Biotech Sector Development
56.13 Biosciences Business Park and Incubators
56.14 Alternative Models
56.14.1 Successful Models from Other Jurisdictions (Academic Health Science Centres)
56.15 Aligned Vision, Structure, Process and Resources
56.16 Conclusion and Summary
56.17 Search Strategy and Interviews
References
Abstract Development and collaboration in surgical research are required for three basic reasons:
1. New approaches are allowing for constant re-evaluation and rethinking of heretofore accepted techniques and protocols.
2. Observation and experience in interacting with one's patients should be the stimulus and trigger for seeking new and improved outcomes.
3. Surgical disciplines, more than any other, are dependent on technology convergence, which requires knowledge not only from other medical disciplines but also from other sciences (physical, material, information, chemical etc.).
Surgery has some unique opportunities and challenges. It is, however, part of a larger system and as such is enabled or constrained by the appropriate alignment of
structures, processes and systems to facilitate collaboration and the development of innovative practices. The benefits of research in improving the health and wealth of the nation are well demonstrated and include:
1. Research that has helped understand the determinants of health and informs preventative strategies
2. Medical research that develops new interventions to cure and treat disease
3. Research into innovative ways of delivering care
4. Translational research that bridges between scientific discoveries and clinical practice
5. Economic spin-offs that a strong research base provides by attracting industry and jobs
Particularly relevant to surgery are the examples of improvements that have occurred through translational research, i.e. improved care and outcomes that came from learning and experience when researchers and clinicians worked in unison to develop, evaluate and test novel interventions and improvements in the clinical setting. It is recognised that surgical disciplines have, for many reasons, real and imagined, been seen as lagging behind their medical colleagues in academic productivity and hence innovation. Cause and effect are difficult to quantify but, at the very least, surgical research will benefit from better alignment and integration and from the embedding of academic goals and culture into practice and the learning environment.
56.1 Background and Introduction

56.1.1 Situation

Stereotypes exist which characterise medical internists as thoughtful, analytical individuals who are mentally stimulated and challenged by complex disease processes. They are seen as determined in their desire to discover and understand the scientific basis of a particular reaction or outcome; surgeons, on the other hand, are categorised as action-oriented "cowboys" who cut first and think later. As with all stereotypes, there may be some inkling of truth in this perception. It is believed that the attraction to a particular speciality within the medical profession is determined by that speciality's appeal to the
inherent behavioural characteristics of the individual and how that individual's characteristics fit with the speciality's perceived modus operandi. In the paper "Surgery Promotes World Peace" [1], the author recognises that surgical disciplines lag behind their physician colleagues in the availability of measures of academic productivity. In examining the available research comparing the respective disciplines, however, and in reviewing and correlating the causes of possibly higher productivity within some surgical training centres over their peers, the author finds the variables too numerous to draw any scientifically valid conclusions, hence the ironic title of the paper. This being said, there are significant variances, and this chapter takes the view that although surgery has some unique cultural challenges, it, more than others, will benefit from attempts to raise the overall level of academic integration and collaboration. Modern medicine is becoming increasingly complex from the perspective of the needs of the patient and the required knowledge and expertise of the provider. This militates against the individual "jack of all trades" practitioner and requires recognition of an individual's limitations and, conversely, the need for a team-based approach. Having acknowledged this possibility, it is difficult to refute the generally accepted premise that care based on a culture of rigorous enquiry will lead to enhanced patient outcomes. This precept applies equally to surgeons as to any other medical discipline. This chapter attempts to address two drivers which would enable and promote the creation of an embedded commitment to rigorous enquiry:
1. The development of surgical research
2. Collaboration within surgical research
These issues are addressed from the perspective and responsibilities of:
1. Academic institutions
2. The public sector
3. The commercial sector
It should also be noted that the issues of research development and collaboration are examined from a broader perspective than just surgery, as many of the cultural and structural drivers required are not limited to one speciality. In fact, addressing them purely from the surgical perspective would be counterintuitive to the concept of collaboration.
56.2 Complications

The achievement of the laudable goal of collaboration is hampered by the structural, reimbursement, political, fiscal and geographic barriers that exist between the various stakeholders, including, but not limited to, the universities, their faculties, hospitals, community services and the professions. Attempts to patch over these complications have led to a number of ad hoc measures with varying degrees of success. The publication of the NHS guidelines on "The NHS as an Innovative Organisation" [2] signalled a new era in recognising the importance of harnessing the opportunities of working in a seamless way with academic partners. Other task forces have addressed the NHS/University interface [3], and the Nuffield Trust is publishing a series of ongoing reports and recommendations on University/Hospital partnerships [4].
56.3 Opportunity

In addressing the opportunities for greater collaboration and integration, we cannot deal with surgery in isolation, although its peculiar and particular challenges are discussed later in this chapter. Many of the opportunities and barriers are structural and emanate from the top; hence, the overarching structural barriers have to be addressed at the institutional level.
56.3.1 Academic Institutions

The first challenge is to define what is included in the term "academic institutions". Traditionally, one would think of universities, medical schools, faculties of medicine and the royal colleges. This raises the question of whether the so-called "teaching hospitals" in the UK are classified as academic institutions. UK teaching hospitals have changed dramatically over the last 30 years. The level of their interaction with academic institutions and the derived benefits have waxed and waned over the years, as both sets of institutions have been subject to repeated structural changes. These changes have, at various times, enhanced or diminished that relationship. The underlying "direction of travel" has been to create a wider gulf between the medical schools/faculties of medicine and their respective affiliated hospitals.
Significant milestones have included:
• In 1974, the demise of the teaching hospitals as self-governing entities, with the loss of their quasi-independent status
• The proliferation of "university" and "teaching" hospitals throughout the country, where the academic activity is limited to a minor clinical teaching role
• The establishment of NHS Trusts and Foundation Trusts
• The merger of hospitals into larger trusts
• Research Institutes' loss of independent status
• The merger of Schools and Faculties of Medicine
• The establishment of separate university biomedical research centres
• The creation of new split-campus medical schools
• Changes to the respective systems of funding by HEFC and the DH, which have led to misaligned systems and incentives
Hospitals and Universities recognise the benefits that have accrued over the years from closer physical and organisational relationships: improvements to the health of the population, economic spin-offs and enhanced reputations for their respective organisations and staff. Within some hospitals and universities, there is a belief that evidence and examples exist of even greater potential being achieved from further integration, as has been demonstrated in other jurisdictions. The creation of Foundation Hospitals should have provided further opportunities to rethink the model to best exploit these benefits in the UK. To understand the art of the possible, we need to look outside the UK. The integration of teaching, education and service in the healthcare environment has been the Holy Grail in many jurisdictions. It is premised on the belief that teaching students and delivering care and services in an environment that values rigorous enquiry and evidence-based decisions leads to exemplary health care. Physical proximity is a basic ingredient in removing the barriers to interaction and enhancing translational research. Common governance, purpose and shared resources can then be applied to drive through the benefits that the opportunity provides. This integration is sought across many differing parts of the system, not just within medicine: between disciplines (medicine, psychology, nursing and physiotherapy); between discovery, developmental and clinical research; between faculties; and between institutions. It is also important to recognise that this could have a greater impact on surgical disciplines than on medical ones. Unlike
metabolic and molecular advances, surgery, by its nature, requires innovation to be developed and implemented at the bedside and/or in the operating room. Thus, the integration between the clinical and academic settings has to be as seamless as possible. A search reveals many models and approaches to achieving this across the world. Why is this so important in other countries, and why is it essential to ensure the UK has structures flexible enough to pursue this vision? There is a relatively small number of potentially true Academic Hospitals in the UK. Some University Hospitals, Research Institutes and Faculties of Medicine recognise that they would benefit from closer integration from governance, structural, clinical and programmatic perspectives. Foundation Hospitals provide the opportunity to develop customised governance models to build on this historic relationship. The recent embracing of the Academic Health Science Centre (AHSC) concept as a result of the Lord Darzi review has provided a framework for creating a UK version of such structural concepts. The figure below (Fig. 56.1) represents a model that can be considered within the bounds of what is possible under the UK legislative framework. The ability to integrate and have joined-up governance and management without compromising legislated accountability is possible. What it requires is the willingness of regulators and bureaucrats to use their enabling legislation to make it happen and to accept the consequential risks in the light of the potential rewards. It will also require
the DH and NHS to recognise the academic dimension in the programme focus and strategy of such entities. Similarly, the involvement and leadership of the respective universities in creating these entities will be crucial to their ongoing success. Such structural alignment will enhance the integration of research, while improving health outcomes and becoming a magnet for related health research and economic development in the area. It may also be a model for a new form of University/Hospital partnership that enables the UK's academic medical institutions to compete more effectively in the "global academic marketplace" for grants, recognition and staff. The potential benefits are outlined in the next section.
Fig. 56.1 Academic Health Science Centre accountability under the 2006 NHS Act: a funding and accountability schematic linking Parliament, Monitor, HEFC, the Department of Health, the SHA, commissioners, NHS Foundation Trust(s), University/College Governors, granting agencies, AHSC revenues and fund raising to a joint-venture Academic Health Science Centre Board

56.4 Public Benefits of Research

The benefits of research to the health and wealth of the population can be demonstrated and categorised into four areas:
• Research that has helped understand the determinants of health and informs preventative strategies
• Medical research that develops new interventions to cure and treat disease
• Research into innovative ways of delivering care
• Translational research that bridges between scientific discoveries and clinical practice
56.4.1 Research into the Determinants of Health

In the desire to promote health rather than solely focusing on illness, much research has been undertaken to determine the factors that correlate with the onset of disease. Some of these are social and cultural factors, whereas others are environmental in nature. Prof Sir Michael Marmot's extensive research, known as the "Whitehall II Study" [5], demonstrated the influence of socio-economic and cultural factors on the incidence of heart disease and the overall health of the UK civil service. This work has influenced attitudes to early childhood development, stress, job control and satisfaction. Although these determinants may well be social in origin, it was the study of the incidence of the early onset of certain diseases among differing socio-economic groups that led to the identification of the likely causes. It has led to a raft of recommended changes to early childhood development and working practices that are intended to reduce illness and absence from work in the longer term. Similar correlations between smoking and heart disease and lung cancer, sun exposure and skin cancer, and diet and heart disease are all the result of clinical observations that have been demonstrated through population epidemiological research. These research results have in turn led to recommended lifestyle changes, hence reducing or delaying the burden of illness on the state and its healthcare system. This approach has also been applied with success to understanding the impact of differing practices in the workplace and to early intervention in the management of workplace injuries. This learning has been demonstrated to have a positive impact on employee absence and long-term disability, with consequential substantial economic benefits [6, 7].
56.4.2 Medical Research

The examples of benefits to the health of society and the wealth of the nation that have been derived from medical advances are well documented, from vaccines for the most virulent diseases to antibiotics and infection control. Recent advances in genomic and proteomic research have opened a new field of opportunity
to eradicate or reduce the burden of diseases that have proved resistant to traditional approaches, such as cancer, heart disease and degenerative diseases. The management of these diseases is particularly costly to society and deprives individuals of quality life years. Genomics and proteomics are also opening the door to "personalised medicine" by indicating the geno- and phenotypes that would be most responsive to specific interventions, thus enabling more targeted treatments. An oft-quoted example is that a particular drug is only effective in 33% of cases; unfortunately, to date, no one has been able to determine which 33% of the population is likely to be responsive, hence an expensive process of trial and error and distress for patients. Recent UK initiatives, such as the government's approval of stem cell research [8] and the Biobank project (an initiative of the MRC, the Wellcome Trust and the DH), promise major advances in the health and quality of life of our population. Surgeons and surgery could be dramatically affected by these initiatives, both in the interventions available to them and in the specific skills they will need.
56.4.3 Health Care Delivery Research

Notwithstanding the long-term benefit of discovery research, many of the advances that have improved the care and quality of life of individuals have come through innovation in the way care is delivered and managed. These have included:
• Diabetic Education Centres
• Angioplasty
• Keyhole surgery
• Home-based chemotherapy and respiratory care
• Image-guided surgery
While the UK is not known as an early adopter of innovative ways of delivering care, it has been a major beneficiary from a political perspective. Its historically low investment in healthcare (6% of GDP) has resulted in a similarly reduced level of access and, consequently, some of the worst outcomes and survival rates relative to other developed countries [9]. Without the ability to meet latent demand through improved throughput, the NHS would have faced even greater crises. It is innovations, resulting from research into novel ways and technologies to improve access and
throughput that have dramatically improved the utilisation of expensive resources and have allowed the system to expand and meet some of the increasing burden of an ageing society.
56.4.4 Translational Research

Improvements in health care delivery can often be shown to be the result of the synergy that occurs when clinical experts work collaboratively with academic partners to develop and confirm intuitive beliefs, clinical observations and discoveries. Translational research has allowed clinical observation to uncover new and innovative applications of existing products and/or technologies, with dramatic improvements in healthcare delivery and outcomes. In surgery, the standard approach to stomach ulcers was turned on its head by the Australians Warren and Marshall, who observed the role of H. pylori as the potential cause. This approach has been succinctly phrased by Albert Szent-Györgyi, 1937 Nobel Laureate in Physiology or Medicine: "Discovery consists of seeing what everybody has seen and thinking what nobody has thought". The developments and benefits of translational research are well documented by Annetine Gelijns et al. in a New England Journal of Medicine article and a subsequent book on the subject [10]. In the article and book, the authors demonstrate that, within the realm of pharmaceuticals, for example, the discovery of unexpected beneficial new uses for existing products is a widespread and often serendipitous phenomenon that appears to be dependent upon extensive empirical experience. The authors point out that aspirin was the most widely used painkiller in the world for a century before its usefulness in cardiovascular disease was established. Similarly, adrenergic beta blockers were originally used for arrhythmias and angina; today, they are used in the treatment of more than 20 diverse conditions, including hypertension, gastrointestinal bleeding, alcoholism and migraines. Dr. Gelijns and Dr. Rosenberg's study focuses on both diagnostic and therapeutic technologies. On the diagnostic side, imaging devices and endoscopic technologies were selected in view of their clinical and economic significance. In the case of therapeutic interventions, the management of clinical conditions – ischemic
heart disease and gallstone disease – was selected as the starting point for analysis. This approach, the management of a particular condition, was chosen because it involves alternative technological solutions provided by different medical specialties. Consequently, the development of these technologies does not take place in a vacuum, but is shaped by the underlying patterns of medical specialisation, the competition and/or cooperation among medical specialists, and their relationships with industrial firms. The authors' findings on the nature of the inventive process have several implications related to optimising the investment in medical research and exploiting its potential economic benefits:
• The huge uncertainties inherent in the later stages of the medical technology life cycle, and the unexpected, beneficial uses that emerge only after extensive clinical experience, have important and challenging implications for the allocation of medical research budgets among different categories of research (basic, translational and clinical evaluative research).
• Academic hospitals, which encourage interactions between people with very different competencies and skills (e.g. basic research, clinical practice and clinical research), are key institutions in the innovation process. International differences in the organisation and strength of academic medicine are critical to explaining cross-national differences in the rate and direction of medical innovation. Policymakers should recognise that changes in the "market" position and financial strength of these centres will shape, and could threaten, patterns of medical innovation and its location.
• The authors discuss improvements in working relations between academic medical centres and private firms, whose cooperation is vital to medical innovation.
56.5 Collaboration and Its Benefits

In surgical research, as has been stated, collaboration is a critical component. In fact, it would be hard to think of a discipline where this is more of a requirement. If we consider the main recent innovations in surgery:
• Minimally invasive surgery
• Bloodless by-pass surgery
• Lithotripsy
• Robotic surgery • Image guided surgery • Stents
56.6 Evidence of Value/Demonstrated Benefit
All of the above and others have only been achieved through intense collaboration with a variety of other disciplines and professions. These include:
The underlying assumption is that a natural synergy exists between research, education and clinical care. This synergy adds value and produces additional beneficial outcomes for the population’s health that could not be achieved independently. Evidence of this benefit is difficult to prove as multiple factors can influence outcomes. The figure below (Fig. 56.2) shows the current difference in standardized mortality rates between Academic and Non-academic hospitals in the UK. This is data from the late 1990s. The inference is that academically affiliated hospitals have lower mortality. Additional evidence is hard to come by. In reviewing the current literature, mainly statements of ideals are to be found e.g. “Close collaboration between the Universities with medical schools and the NHS is essential.” The successful outcome of this co-operation is a key feature in determining the quality of the nation’s health... “NHS university interactions have been at the forefront of change in healthcare services, translating advances to patient care and demonstrating this through their education role……” (John Wynn Owen in foreword to Smith) [4]. There is evidence that successful models of integration required active management and incentives to align research education and clinical service strategies – and that approaches to date within the UK – require rethinking. (See Smith for a summary of recent development/reports on the University/NHS interface). In April 2002, HEFC/DOH published a statement of strategic alliance [19] – “a framework for strengthening partnership working….” (An MHA report – NHS support for Science was published in April 2002 [20]). New concepts and terminology are emerging and two major initiatives on either side of the Atlantic were identified:
Bio engineers Material scientists Information scientists Physicists Chemists Fluidic scientists Design engineers Electrical engineers It is therefore appropriate to suggest that in no other discipline of medicine is collaboration more important, and conversely, the structures and environment to encourage this combined with the belief, mistaken or otherwise; that surgeons tend to be developed and educated in the belief that they alone are accountable for their interventions, could be at the heart of the perceived below average performance when it comes to research productivity compared with other medical specialties. The other aspect of collaboration is the particular importance given to surgeons of translational research. Translational research allows discoveries to be developed, evaluated, enhanced and demonstrated in the clinical environment [11]. Hospitals and Research Institutes have recognised this approach as being at the root of their rationale for close proximity and integration. The premise is that translational research is facilitated and enhanced through close daily interaction between basic researchers, clinical researchers and clinicians, and further that it benefits the health of the population and the wealth of the nation. The work of Gelijns and Rosenberg cited earlier has identified many of the benefits, spin-offs and opportunities. This premise was further reviewed as part of this evaluation to determine if the value and the system benefits that will accrue through a more integrated approach can be demonstrated. The review was intended to identify recently documented examples of the value/demonstrated benefit of research to healthcare relevant to the UK environment. The link with teaching was also recognised as important. The literature review was undertaken using the approach and sources described in the Appendix Table 56.1 (below).
1. University Clinical Partnership (UCP): a strategic alliance between local partners in which the components of the virtual organisation agree a strategic framework for co-ordinating research, education and service, both within the centre and through wider 'networks', aiming to harness research as well as education and service provision. This initiative is being developed within the Nuffield Trust [4].
Table 56.1 Benefits of hospital–academic collaboration cited in the literature (value; demonstrated benefit; source)

Vision
• "The most desirable scenario would be a mutually beneficial relationship between industrial innovation and healthcare innovation" (Peckham [4, 12])
• Joint strategic commitment to harness effort, e.g. "We heal, we teach, we discover" – University of Maryland (Smith [4])

Categories of payback/benefit
• Knowledge; benefits to future research and research use; political and administrative benefit; health sector benefit; broader economic benefit (Buxton [12])

"Value"
• University clinical centres "add value" as they contribute to health services: the service delivered is based on best practice and the latest knowledge, which is fed into education and the quality loop, with clinical governance a specific opportunity for this added value to be demonstrated (Nuffield Trust [4])

Quality
• "…provides standard setting arenas for high-quality clinical care", with particular reference to clinical governance as an opportunity to demonstrate "added value" (Smith [4])
• Superior performance of teaching hospitals in the treatment of patients with hip fractures, heart failure and pneumonia; quotes a US study on myocardial infarction [13] (Peckham [14])

Integration
• Suggests a conceptual framework for evaluating the costs and value added from the relationships between Veterans Affairs Medical Centers (VAMCs) and their affiliated medical schools. The characteristics of affiliation (relationships of trust, shared programmes and integration of physician facilities) are linked to impact on resources (value added: resident workforce, broader base of specialties and resources for research; costs: those inherent in medical education) and impact on physicians (value added: continuing education, recruitment and retention; costs: medical school control of physician incentives). Recognising the lack of literature, the article suggests a further study to measure quantitatively the various components of the model (Leeman [15])

Co-location
• "…facilitates effortless interaction between clinical and non-clinical investigators – putting laboratories in close physical proximity to clinical institutions with access to patients makes enormous sense" (Blumenthal in Smith [4])

Synergy
• "…in a research continuum, from the bench to the bedside, the discoveries in basic science are transferred quickly to the clinical setting" (Smith [4])
• On UCPs: "…the intention is to align clinical and academic resources for maximum impact" (Smith [4])
• "…critical mass of expertise and activity in an arena in which there is an active dynamic between education, research and health services… co-ordination between the NHS and university sectors is vital in exploiting the full potential of education, science and research…"; can harness technological opportunities and basic science for healthcare delivery (CHF–AHC working paper [16])
• Development of networks (the UCC concept): potential for best practice, knowledge and research networks, and their associated resources, to be effectively harnessed; a pivotal role in a knowledge-based NHS, as both a critical mass of expertise and an arena in which there is a dynamic between education, research and health services (Smith [4])

Benefits to future research/research use
• "…translation of new biological insights into practical application requires access to patients…" (see also clinical protocols under health sector benefits) (Smith [4])

Knowledge/health sector benefits
• Potential to develop best practice models for more prospective care (CHF–AHC working paper [16])
• The research department writes clinical protocols, which have been adopted regionally and nationally (Academic Medical Centre, University of Amsterdam) (Nuffield Trust/Smith [4])

Recruitment and retention
• A survey of faculty members of 80 Academic Health Centres (AHCs), members of the University Health System Consortium, demonstrated consensus that clinical research offers AHCs benefits including prestige, recruitment and retention of faculty, criteria for promotion, and financial support; it also identifies opportunities for AHCs to provide a wider range of incentives for clinical research (Blumenthal in Smith [4]; Oinonen [17])

Broader economic benefit
• Competition in global research markets requires access to large, diverse patient populations (Blumenthal in Smith [4])
• "Academic clinical centres make important contributions to regional and national economies. Centres are major employers… and investment… produce innovations that generate wealth and make a positive contribution to UK plc"; "It is my view that in collaboration, universities with medical schools and university teaching hospitals make a pivotal contribution to the regional health economy" (Nuffield Trust/John Wyn Owen in Smith [4])

Interaction/interpersonal elements
• Academic–clinical collaborative relationships: development and implementation of reality-based learning for students (Gassner [18])
• Global drivers encourage a focus on quality, education and knowledge; this in turn needs to facilitate continuous interplay and engagement between practice, education and knowledge (Smith [4])
Fig. 56.2 Integration of teaching, research and healthcare provision – demonstrable clinical benefits. [Funnel plot: Hospital Standardised Mortality Ratio (HSMR) for acute trusts, 2004/5, plotted against the expected number of deaths, with 2 SD and 3 SD control limits; AUKUH (academic) and non-AUKUH trusts shown separately]
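For readers unfamiliar with the funnel-plot presentation used in Fig. 56.2, the following minimal sketch (in Python) computes a trust's HSMR and approximate 2 SD and 3 SD control limits. The function names and the Poisson approximation used for the limits are illustrative assumptions for exposition, not the exact method used to produce the figure.

import math

def hsmr(observed_deaths, expected_deaths):
    # Hospital Standardised Mortality Ratio: observed deaths as a
    # percentage of the case-mix-adjusted expected deaths.
    return 100.0 * observed_deaths / expected_deaths

def funnel_limits(expected_deaths, n_sd=2.0):
    # Approximate control limits around HSMR = 100. Treating observed
    # deaths as Poisson, the standard deviation of the ratio is roughly
    # 100 / sqrt(expected deaths), so the limits narrow as the volume of
    # expected deaths grows -- producing the funnel shape of Fig. 56.2.
    half_width = n_sd * 100.0 / math.sqrt(expected_deaths)
    return 100.0 - half_width, 100.0 + half_width

# Hypothetical trust: 520 observed vs. 500 expected deaths
print(hsmr(520, 500))              # 104.0
print(funnel_limits(500, n_sd=2))  # approx. (91.1, 108.9)
print(funnel_limits(500, n_sd=3))  # approx. (86.6, 113.4)

A trust falling outside the 3 SD funnel is conventionally read as a special-cause outlier rather than routine variation, which is the sense in which the separation of academic from non-academic trusts in Fig. 56.2 is interpreted.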
2. The Commonwealth Fund Task Force on Academic Health Centers [16]
This Massachusetts, USA initiative has produced a series of working papers and a final report. The task force focuses primarily on processes and associated management/governance to facilitate coherence and alignment of clinical and academic systems. The existence of this initiative confirms that the challenge of linking aspiration with demonstrated results is recognised in other jurisdictions with a longer history of academic/clinical integration. Smith's work draws heavily on a case study (Cardiff) and insights from international experience, concluding that while there are no "off the shelf" solutions, there is general agreement on the need to manage effectively the overlapping areas of the tripartite mission and to create the active dynamic that is the source of the added value. Clinical governance is seen as a potential "driving force" for new systems to emerge. Material on added value and demonstrated benefits from the literature review is summarised in Appendix Table 56.1.

56.7 Implications of Collaboration

Successful integration between academic hospital trusts and universities will require upfront financial investment and a strong will to overcome the many real and perceived barriers. However, it is not difficult to justify a model that, while assuring aligned goals that facilitate the required level of integration, will also generate health and economic benefits greater than the sum of the parts. There is a continuum of approaches providing varying levels of partnership and integration. These are described below in escalating order of difficulty to achieve, with concomitant associated benefits.

56.7.1 Virtual Organisation

This relies on electronic links and communication to bind together a variety of enterprises. An example would be a national research consortium that brings together a geographically spread group of universities, research institutes and trusts linked by a single grant and shared research goals and information resources.

56.7.2 Co-Location

This physically locates the various functions in proximity to each other. It allows regular interaction between the parties and benefits from the horizontal proximity of the respective facilities. In this model, the so-called "white coat distance", which allows people to move between clinical, educational and research functions without having to change into outdoor clothing, is recognised as a crucial element.
56.7.3 Joint Ventures

This establishes a formal jointly owned entity that manages all or particular functions on behalf of the founding partners. It would normally have committed funding and a common set of objectives and performance criteria.
56.7.4 Merger

This is the highest level of integration, which consolidates the assets and resources of the partner organisations into a single entity. It implies common governance and a single vision, purpose and shared resources. Current NHS restrictions on ownership and alternative governance structures severely limit the importing of the effective models for world-class AHSCs found in most other jurisdictions. Universities enjoy a level of freedom and autonomy that would allow any of the above models. As a result, the highest currently legally feasible level of integration between hospital trusts and universities is often a "partnership". Higher levels of integration are legally possible through the freedom promised by Foundation Hospitals. The legal framework exists under the NHS Act 2006 for non-NHS trusts to apply under section 34 for designation as an NHS Foundation Trust, thus allowing universities or their subsidiaries to create single-governance AHSCs. Without this ability to invest in and create novel structures that align the mission and resources to common goals, the maximum potential of co-operation will not be achieved. This should not, however, be a reason for failing to maximise the opportunities within what is currently allowed and feasible.
Fig. 56.3 From genome to health – integrating knowledge. [The Physiome Project: gene structure and function (experiments, databases, system description) integrated through quantitative system modelling, archiving and dissemination across genes, molecules, cells, tissues, organs and the organism to health; http://nsr.bioeng.washington.edu]
The underlying culture of the DH cannot be ignored. The political desire to avoid heterogeneity within the NHS is in itself a barrier, although there appears to be an emerging recognition that “one size fits all” is not going to carry this system forward in the long term. The way these concepts are presented will be as important as their reality. The advantages of integration accrue to the partners in many aspects of their operations.
56.8 Benefits of Collaboration to Research

Earlier in this document, the benefits of research to the health and wealth of the population were identified. This section identifies why co-location will improve and enhance the opportunity to exploit these benefits. There have been dramatic changes in research brought about by the advent of genomic and proteomic research, which is unlocking the possibility of personalised medicine. As a result, research is focusing on interventions that are specific to an individual's genotype. Such a systemic view of health has led to the coining of the term Physiome [21] to reflect this comprehensive approach. The ability to determine and predict patient-specific responses to interventions requires the ability to follow molecular research through clinical investigation into the clinic and to the bedside. Organisations that can offer capabilities across this spectrum in a co-ordinated and integrated fashion are best placed to compete for research funding and deliver health gain (Fig. 56.3, above). There are further benefits that will accrue from integration in specific areas of research.
56.8.1 Discovery Research

This area, encompassing the basic sciences and including molecular, biological and genomic research, will benefit from the pull of addressing real health issues and challenges. Structures should allow groupings of investigators to mirror the major clinical programmes of the hospital, thereby fostering a "bench to bedside" culture and capability. In surgery, stem cell research in the regeneration of damaged tissue, and the reduction of rejection in transplantation through better matching, are examples of the importance of links from genome to bedside.
56.8.2 Clinical Research

Clinical investigation units and clinical services that operate in an environment of rigorous enquiry will ensure that such units are perceived as centres of choice for delivering clinical research projects. These centres can be incubators for collaboration, as they bring together many disciplines. In surgery, the development of intelligent prostheses that monitor fit and wear is proven through such centres.
56.8.3 Health Services Research

Research into optimum ways of organising, managing and delivering care depends on university-centred resources to objectively design, manage and interpret studies of this type. The close proximity of academic and clinical services facilitates the relevance and dissemination of the results of these studies. The management of breast cancer surgery through an academic hub linked to general units has shown improved outcomes.
56.8.4 Clinical and Population Epidemiology

As with health services research, this field requires close working-level co-operation between scientists and clinicians, as well as shared access to clinical data. The correlation of mobile phone use with road accidents was the result of collaborative research by trauma surgeons and clinical epidemiologists using multiple data sources.
56.8.5 Technology Convergence

Materials, engineering, IT, physics and physiology are just some of the many scientific and technological disciplines jointly involved in developing today's sophisticated drug delivery systems and medical devices. Ease of access to these skills and competencies ensures a competitive edge for any research project requiring them. This is an area of great potential for the development of surgical research and collaboration: material scientists and biomedical engineers have a long track record of enabling major breakthroughs in surgical care.

56.9 Additional Benefits of Collaboration

56.9.1 Advantages to Education

Curriculum renewal in many medical schools [22] is realigning medical undergraduate education around clinical problem solving. Revised curricula are based on approaches such as problem-based learning, self-directed learning, non-expert tutors and the GMC integrated course. This requires an integration of basic sciences and clinical skills into modules that are taught at the bedside from the first year of medical school. The onus is placed on the students and their tutors to solve problems using critical enquiry skills. In postgraduate education, an environment of evidence-based decision making, and the task of keeping all faculty and clinicians up to date with emerging techniques, is dramatically improved through the interactions that occur in integrated facilities. The presence of academic activity and research in surgical rotations at an early stage in an individual's medical education is one of the few initiatives that appears to be correlated with the subsequent development of surgical scientists [1]. The institutions involved in this initiative have recognised the need for physical proximity of hospital and medical school activities to achieve this. Harvard recognised that this could not be achieved with its affiliated academic hospitals through a centrally located medical school, and created devolved academies based at its major teaching hospitals to facilitate it.

56.9.2 Impact on Allied Professions

"Patient focussed care" [23] philosophies are attempting to break down the professional silos that have been barriers to effective patient care delivery. By ensuring professionals are integrated at an early stage in their education, the value of this approach becomes self-evident. Increasingly, inter-professional modules involving students from faculties of medicine, nursing, psychology, physiotherapy etc. are being offered in clinical settings. Therapists and biomedical engineers have benefited from integration into structures that are patient-focussed, i.e. hand clinics rather than professional departments.

56.9.3 Advantages to Health Service Delivery (NHS)

The presence on the same site of the leading academic staff in complementary areas of focus can only enhance an organisation's ability to pursue its mission in
providing exemplary care in related diseases. It also ensures smooth translation of research between the clinical and academic settings. The surgeon's incubator tends to be the unit or the operating suite rather than the laboratory, and for this reason, proximity of the supporting competencies in biomedical engineering and IT is a prerequisite of successful collaboration, and thereby innovation.

56.9.3.1 Proximity

There are a number of economic benefits, listed below in the resources section, which accrue from the economies of scale that come with co-location. The critical mass created permits specialism and super-specialism. Benefits will accrue to the reputation of the NHS from the creation of such "super academic hospitals". Unlike many other jurisdictions, the NHS has only one generic classification of teaching hospital, which applies to a myriad of institutions. The suggested integration provides the opportunity to emulate the major AHSCs, such as Johns Hopkins [24], by bringing together the academic and clinical activities in an integrated network or partnership on a single site.
56.9.3.2 Recruitment and Retention

Attracting and retaining staff who are committed to working in an evidence-based culture is enhanced. These AHSCs will provide informal and formal opportunities for collaboration, cross appointments and access to resources. When it comes to international competition for clinical and research leaders, this will give the major academic centres a competitive offering.

56.9.3.3 IP Exploitation

The combined intellectual capital potential emanating from the combined resources is significant. Currently, resources are being wasted as separate institutions duplicate each other's efforts and, in some cases, compete over the same IP. Surgeons in particular are adept at developing innovative products that, unlike molecules, are easier to commercialise. Surgeons need ready access to advice on funding and exploitation, which academic technology transfer offices are set up to provide. As the innovations in surgery tend to be "products" that require materials and engineering input, the broader access to these skills that an academic institution can provide is vital. The potential for combining incubator, IP management and commercial management support will be enhanced. A centre of excellence of this standing should also attract donations and private sector support to establish a seed fund to launch companies and exploit the intellectual capital potential that such a concentration of resources and capabilities will generate.

56.9.3.4 Improved Resource Utilisation

The resources required to support the activities of research scientists and clinicians are both scarce and expensive. High-throughput screening devices, MRI machines, containment laboratories, and the people who work in them, are recognised as scarce, expensive resources. It would be foolish not to share access to these resources to maximise their utilisation and to optimise the use of available space. This can only occur if they are integrated and co-located on shared premises. Surgeons in particular require animal facilities and imaging capabilities to facilitate their projects.

56.9.4 Aligned Capital and Funding Opportunities

There are significant differences in the approaches to PFI and in the ways universities and hospitals account for depreciation. To overcome this, options could, it is believed, be developed that allow creative approaches to funding. These could be resolved by either a capital grant from the DH, a Treasury-approved transfer, or a commitment by the DH to a revenue stream for R&D capital development that would allow sites to be developed through a single PFI. An integrated approach would also allow the proceeds of any sale of existing assets to be made available for redevelopment, without unnecessary haggling over who owns what. Where Foundation Hospital status exists, there is an opportunity for an even more innovative governance structure that may allow dealing with PFI contractors on more advantageous and integrated terms. An important opportunity also exists to engage in joint fundraising for projects, leveraging the hospital's name
and brand along with the respective university or research institutes’ capabilities in co-ordinated campaigns.
56.9.5 Commercial Opportunities and Economic Spin-Off Benefits

The benefits of research to the wealth of the population have been addressed in some of their aspects above. Keeping individuals free from disease and maintaining a healthy workforce achieves both cost savings and improved revenues. Beyond these benefits, however, the UK has been particularly successful at attracting inward investment to support its burgeoning biotech sector, creating the world's second largest biotech industry by size [25]. The biotech "Golden Triangle" of Cambridge, Oxford and London is by far the main academic engine and magnet for that industrial growth. In addition, each of the UK's RDAs has identified a life or biological science priority for its development plan. An integrated approach that harnesses IP with local capabilities and technology is the driver of economic investment in a community (see Fig. 56.4 below).
56.10 Contract and Industry Research

Increasingly, industry is a major potential source of research funding. Industry's choice of partners within academic and clinical centres is determined by the ease with which these relationships can be governed and managed, and by the scope of services offered. One-stop shopping on a single site presents an attractive opportunity. Thus, the breadth of scientific disciplines and the degree of technology convergence will be factors in a company's choice of location and collaboration partners. As an incentive to encourage participation in such clinical research activities, methods have to be instituted to allow the revenue generated from these activities to return directly to the clinical services involved, for local investment in improving their range of services.
Fig. 56.4 Wealth is achieved through a virtuous circle that combines IP, capabilities and technology. [Schematic elements: investments; capabilities (skills and facilities); intellectual property (patents, copyright, etc.); platform technologies; products and services; cash flow]

56.11 Biosciences Cluster

The creation of a health-related bioscience cluster is highly feasible using the AHSC as a magnet. This would require an adjacent "science park" and incubator facilities, which would attract related industry partners to establish a presence on the site. Evidence of surgical research critical mass is a major determinant for the location of medical device companies.
56.12 Advantages to Biotech Sector Development

The DTI has put in place a number of programmes to support the biosciences sector, a declared priority area for the government. Creation of new companies and collaboration with industry is a key aspect of that policy [26]. The success of this sector has been demonstrated as a result of proximity and the creation of clusters. Philip Cooke, in his article on clusters as key determinants of economic growth [27], cites co-located initiatives such as the Bio Square Technology Park, supported by the Harvard-Mass General-MIT-Boston University cluster, which had spun out some 200 companies and created in excess of 17,000 jobs. The report also cites the success of the Cambridge UK cluster in spinning out some 50 core biotech companies over a similar period. London in general has enjoyed some success in this area. The London First biotechnology network [28] identifies some 90 companies and 1,000 jobs created by the UCH, IC, KCL and St. George's campuses and related institutions. This number, however, is disproportionately low relative to the amount of research funding and clinical activity in these centres. It is important to establish a cluster with a sufficient critical mass of science, as well as clinical services, to create the spin-offs and jobs and to become a magnet for attracting the required industrial partners, collaborators and funders. The Cooksey Report in 2007 reinforced the need for academic collaboration to keep the UK competitive in the bioscience sector. The MaRS (Medical and Related Sciences) Centre in Toronto, Canada is a prime example. The MaRS Centre, both as a physical complex and as the hub for an extended virtual community, is designed to accelerate the commercialisation of Canadian innovation by uniting the disparate worlds of science and technology with industry and capital. The MaRS Centre, housed in a building in excess of 150,000 sq. metres, includes:
• Research facilities for some of the area's top scientists and incubation facilities for young companies
• A cluster of professional services firms and investors, technology transfer offices, research and community networking organisations, and mid-sized and established global companies
• A state-of-the-art conference and multimedia facility, as well as the programming required to animate the shared spaces and maximise the impact of cluster development
• A unique urban setting that connects MaRS to other research and educational facilities in the area, the financial district and the multi-cultural, creative city core through a direct link to Toronto's public transportation system
• An extended community reaching from this strong foundation to other regions of Ontario and beyond, through people networks and an advanced Web portal

As an example of the benefits that accrue from such an initiative, Toronto will now be the headquarters for the NIH-funded International Cancer Genome Consortium.
56.13 Biosciences Business Park and Incubators

As previously stated, a critical mass of research-oriented clinician scientists and innovation potential attracts industry and investment. While it may be perceived as an added complication, ways should be sought to partner with other adjacent research institutes, the pharmaceutical industry, established biotech companies and the medical device sector. It may also be possible to establish a surgery-oriented bioscience park and/or bio-incubator to nurture the development of innovative technology, products and companies.
56.14 Alternative Models

56.14.1 Successful Models from Other Jurisdictions (Academic Health Science Centres)

Across the globe, a variety of models have been developed to achieve the goals of integration.
These are often called either Academic Medical Centres (AMCs) or Academic Health Science Centres (AHSCs). The structures in many ways reflect the cultures and societies in which they exist. There are, however, aspects and approaches that are worth considering, both legally and culturally. Table 56.2, shown in the Appendix, is adapted from work done in the US by Weiner, Culbertson, Jones and Dickler [29] on the range of models being pursued by different organisations in North America. That work addresses the larger issue of hospital/medical school integration, but the models could equally well be applied to a postgraduate specialist facility and a related academic institute. Some of the options and titles have been altered to fit the UK environment. Notwithstanding the source, it is possible to relate these models and their consequences to the UK, as each would be feasible given the will of government, the hospitals and the university.
56.14.1.1 University Owned Health Science Centres

These are found particularly in North America:
• Duke University Medical Centre, where the hospital and research institute are a wholly owned subsidiary managed directly through a University Vice Chancellor
• Sunnybrook and Women's College Health Science Centre, where the hospital and research facilities are owned by the university but governed in perpetuity by an independent corporation with representatives of the university, community and government through order-in-council appointments
• McMaster Medical Centre, Hamilton, Ontario, which, because of its integrated existence on a single site, was the first school to implement problem-based learning and abolish the preclinical/clinical split in the curriculum
• The University of North Carolina (UNC), which provides an example of flexible governance. The State Legislature granted UNC Health Care System a quasi-governmental legal status in order to release the system from civil service requirements, state purchasing rules and other regulations that hindered its ability to compete (e.g. make acquisitions). As a result, UNC Health Care System has its own system board. The dean of the medical school serves as the CEO of UNC Health Care System and as a full-voting member of the system board
56.14.1.2 Hospital Owned Academic Institutions

Examples can be found in the US, Canada and The Netherlands:
• Massachusetts General has its own research institute, with a budget of $300 million. It is also part of the Partners HealthCare system, described under Community Networks below
• Mayo Clinic has its own medical school, which grew out of the hospital and research institute
• Academisch Ziekenhuis Leiden (Leiden University Medical Centre) has created a single organisational entity for the hospital and medical faculty, governed by a single board
• Toronto General Hospital has its own wholly owned research institute, which attracts $90 million in peer-reviewed funds
56.14.1.3 Joint Governance Alliances
• University Medical Centre Utrecht. Regulations preclude the hospital owning a degree-granting institution; parallel boards with identical membership have therefore been appointed for the hospital and the faculty of medicine
• Huddinge Hospital, Stockholm, has Karolinska University members on its board, and its operation and management are integrated to maximise effectiveness
• Inselspital Berne. The hospital, which is foundation owned, and the University of Berne Medical School have an operating committee and a shared executive to manage their academic activities
56.14.1.4 Coalition
• Kantonsspital Basle. The hospital, owned by the Kanton of Basle, has with the University of Basle a jointly appointed research director, with budgetary and recruitment authority within the respective institutions for the combined resources of the hospital and university
• Royal Marsden/ICR. As previously mentioned, the Royal Marsden has created an effective coalition with its neighbouring research institute through cross appointments, a joint strategy and operating committees
Table 56.2 Alternative models of organizational governance. For each organisational characteristic, the entries run across the six models in the order: university owned; hospital owned; alliance; coalition; community leader; community partner.

• Hospital organization: Highly organized / Highly organized / Highly organized / Loosely organized / Loosely organized / Loosely organized
• Medical school–hospital relationship: School or university owns hospital / Hospital owns medical school / Separate, contractual / Possibly own, with separate faculty plan / Separate, consortium / Separate, consortium
• Financial interdependence: High / High / High / Limited / None / None
• Functional interdependence: High / High / High / Moderate / Low / Low
• Dean's budgetary authority (hospital): Approves budget / Receives allocation / Requests support / Participates in budgeting / Participates in budgeting / Requests support
• Dean's allocation authority: Absolute / Non-existent / Negotiated / Negotiated / Negotiated / Negotiated
• Dean's role in hospital CEO's appointment: Dean hires / System board / Consulted / Vetoes in owned, participates in others / Consulted / Consulted
• Hospital CEO's role in dean's appointment: University board / CEO hires in line relation / Hospital CEO consulted / University board / University board / Hospital CEO(s) consulted
• Dean's role in hospital chiefs' appointments: Appoints chairs and chiefs / Consulted on chairs and chiefs / Appoints chairs, consulted on chiefs / Appoints chairs, vetoes chiefs / Appoints chairs, consults on chiefs / Appoints chairs
• Dean's role in hospital governance: Chair of system board / Not a board member or non-voting member / Voting board member / Not a board member or non-voting member / Voting board member / Board leader
56.14.1.5 Community Networks
• Karolinska Hospital, Stockholm, has a loose affiliation with the Karolinska Institute but, as a county council governed institution, no direct university participation in its governance. This, the hospital believes, has led to a deterioration in its academic standing
• Network North Toronto consists of a geographic cluster of hospitals, university and community, which co-ordinate clinical, teaching and research initiatives through a board to which they all appoint members
• Partners HealthCare, Massachusetts. In March 1994, the MGH joined with Brigham and Women's Hospital to form Partners HealthCare System, Inc., an affiliation established to create an integrated healthcare delivery system providing excellent, cost-effective care while maintaining the hospitals' historic dedication to teaching and research
56.15 Aligned Vision, Structure, Process and Resources

Fig. 56.5 North-West London – historic links between university and NHS. [Schematic of links between Imperial College, the NHS trusts (Chelsea and Westminster, St Mary's, Royal Brompton and Harefield, Hammersmith, North West Hospitals), research centres and institutes, the DH/NHS, the DfES, the BEP and the London HUB]

The relevance of the structural model lies in its aiding the alignment of purpose between the respective organisations. The historic links in the UK are a series of patchwork connections, which are ambiguous and confusing, and lead to duplication and waste of what are already very limited resources. For illustrative purposes, the links in North West London that were intended to enable communication are shown below (Fig. 56.5). As of October 2007, some of this has been simplified by a new structure, which has brought Hammersmith and St Mary's hospitals into partnership with Imperial College, with an integrated board for the newly merged Imperial Healthcare NHS Trust that includes Imperial College participation and combines the roles of Principal of the Faculty of Medicine and Chief Executive of the Hospital Trust. This will be the UK's first integrated "Academic Health Science Centre",
and a number of other similar models are being developed throughout the country to emulate this concept. True integration, and its consequential benefits, is achieved when the overall vision and the necessary structures, processes and resources to support that vision are aligned to the common purpose. As stated earlier, there is a continuum of increasingly integrated models that two organisations can pursue to create an organisation with a common purpose. The schematic below (Fig. 56.6) demonstrates the potential dynamics of any new relationship between an academic trust and its university to maximise the potential of co-location. Both organisations have independent visions and overall purposes driven by their mandates and funding. The opportunity lies in the area of overlap: to create a new and innovative entity with aligned governance, purpose, authority, structure and resources. The challenge is to carve out a sufficient role for the existing organisations that aligns their structures and resources to create a dynamic organisation greater than the sum of its parts, and in doing so to bring added stature, kudos and benefits to both the academic and healthcare systems. There is a limited range of models within the UK to facilitate this approach; what exists is a series of ad hoc arrangements. The Royal Marsden and The Institute of Cancer Research (an independent associate college of London University) are examples of two organisations which, through governance cross-appointments and joint strategic and operating committees, have achieved some measure of success in exploiting their co-location. The AHSCs approved in the first wave in the UK have differing models of governance and management; these will act as laboratories for assessing the merits of their chosen approaches. As a minimum, it is strongly recommended that consideration should be given to a governance and
management structure representing the NHS Trust, the University, and independent directors. This would be resourced and managed to undertake all activities on behalf of the parties as they relate to the following areas:
• Clinical strategy
• Clinical appointments
• Translational discovery research
• Clinical research
• Health services research
• Clinical epidemiology
• Health economics
• Technology transfer
• Start-up investments

Fig. 56.6 Conceptual model of integration. [Schematic: the hospital (vision, purpose, resources, governance) and the university (vision, purpose, resources, governance) overlap to create the AHSC, with its own aligned vision, purpose, resources and governance]
Further, the parties would put in place a joint strategic programme development entity to oversee strategic issues, fundraising, etc. Processes would also be agreed to ensure joint appointments in all key clinical and academic positions.
56.16 Conclusion and Summary

Surgical research cannot develop in isolation – neither in isolation of the hospital from the academic institution, nor of the surgical specialties from other disciplines and professions. Surgical innovation requires, more than in any other specialty, collaboration with other medical specialties and with a wide range of professions and scientists. The development of surgical research has been hampered by the lack of overt collaboration and by the structural barriers that militate against it. The perceived culture of surgeons as "lone rangers", and the lack of appropriate tangible rewards and recognition for academic activity throughout training, results in a lack of role models for future surgeon-scientist careers. Structural realignment and physical proximity are basic ingredients in removing the barriers to collaboration, and may, in the restricted environment of the NHS, be the only currently feasible option. Partnership will then allow trusts to develop common governance with their academic partner organisations; shared purpose and resources can then be used to drive through the benefits that the integration opportunity provides. This will enhance the integration of research while improving health outcomes, and will act as a magnet for related health research and economic development in the area. It may also be a model for a new form of university/hospital partnership that will keep the UK's academic medical institutions at the forefront. Its surgeons will thus be perceived as innovative leaders, equipped for the global environment in which they compete for jobs, grants, recognition and staff.
56.17 Search Strategy and Interviews

The information used in this document was gathered through secondary research on existing publications and documents in the public domain, and through discussions with senior academics, executives and government officials in the UK, other European countries and the USA:
• Searches focused on value and benefit as applied to the terms "research/healthcare delivery", "academic medical/healthcare centre", "academic clinical partnership", "university clinical partnership/centre" and "health campus", with retrieval of material
• Search of the King's Fund library's resources and specific Internet sites (Department of Health, St Mary's, Paddington regeneration partnership, Royal Society and Royal Society of Medicine), with retrieval of relevant material
• Interviews with a range of management, clinical and research staff
• Additional literature searches provided through Peggy Leatt, Professor of Health Care Management, University of North Carolina
• Interviews with senior NHS executives and the NHS Chief Executive
• Interviews with Scandinavian, Dutch and Swiss academic hospital chief executives
References

1. Harken AH (2007) Surgical research promotes world peace. Ann Surg 245:524–525
2. Department of Health (DoH) UK (2002) The NHS as an innovative organisation: a framework and guidance on the management of intellectual property in the NHS. Available at: http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4002660
3. James J (2002) Report on health and education strategic partnerships. Department for Education and Skills, Strategic Learning and Research Advisory Group for Health and Social Care, London
4. Smith T (2001) University clinical partnership: a new framework for NHS/University relations. Nuffield Trust, Oxford
5. Wilkinson R, Marmot M (1998) Social determinants of health: the solid facts. World Health Organization (WHO) Regional Office for Europe, Copenhagen
6. Health Development Agency (2004) The evidence about work and health. Available at: http://www.nice.org.uk/nicemedia/documents/CHB18-work_health-14–7.pdf
7. Rick J, O'Regan S, Kinder A (2006) Early intervention following trauma: a controlled longitudinal study at Royal Mail Group. Institute for Employment Studies, Brighton
8. Department of Health (DoH) UK (2000) Stem cell research: medical progress with responsibility. Available at: http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4065084
9. Moise P, Jacobzone S (2003) OECD study of cross-national differences in the treatment, costs and outcomes of ischaemic heart disease. OECD Health Working Papers, Paris
10. Gelijns AC (1992) Technology and health care in an era of limits. National Academy Press, Washington, DC
11. Smith R (1995) The scientific basis of health services. BMJ 311:961–962
12. Buxton M, Hanney S (1996) Assessing payback on the investment in research. In: Peckham M, Smith R (eds) The scientific basis of health services. BMJ, London, pp 72–81
13. Gottlieb S (1999) Updates for US heart disease death rates. BMJ 318(7176):79
14. Peckham M (2000) A model for health: innovation and the future of health services. Nuffield Trust, Oxford
15. Leeman J, Kilpatrick K (2000) Inter-organizational relationships of seven Veterans Affairs Medical Centers and their affiliated medical schools: results of a multiple-case-study investigation. Acad Med 75:1015–1020
16. The Commonwealth Fund Task Force on Academic Health Centers (2002) Training tomorrow's doctors: the medical education mission of academic health centers. The Commonwealth Fund, New York
17. Oinonen MJ, Crowley WF Jr, Moskowitz J et al (2001) How do academic health centers value and encourage clinical research? Acad Med 76:700–706
18. Gassner LA, Wotton K, Clare J et al (1999) Evaluation of a model of collaboration: academic and clinician partnership in the development and implementation of undergraduate teaching. Collegian 6:14–21, 28
19. Department of Health (DoH) UK, Higher Education Funding Council for England (HEFCE) (2002) Statement of strategic alliance for health and social care. Available at: http://www.hefce.ac.uk/aboutus/health/stratal.htm
20. MHA (2002) NHS support for science – final report. Available at: http://www.dh.gov.uk/en/Researchanddevelopment/A-Z/NationalNHSRDfunding/DH_4002021
21. Physiome Project. Available at: http://www.physiome.org/ and http://www.springerlink.com/content/v656605064j2177j/
22. Harvard Medical School. Available at: http://hms.harvard.edu/hms/home.asp
23. Sidky M, Barrable B, Stewart H (1993) Patients first: small hospitals in Ontario favour patient-focused care. Leadersh Health Serv 2:8–11, 40
24. Johns Hopkins University. Available at: http://www.jhu.edu/
25. Ernst & Young (2002) Beyond borders: the global biotechnology report. Ernst & Young, New York
26. Department of Trade and Industry (UK) iBio (Information Biotechnology). Available at: http://www.i-bio.gov.uk/
27. Cooke P (2001) Clusters as key determinants of economic growth: the example of biotechnology. In: Mariussen Å (ed) Cluster policies – cluster development? Nordregio report, Stockholm, pp 23–38
28. London Biotechnology Network. Available at: http://www.londonbiotechnology.co.uk/
29. Weiner BJ, Culbertson R, Jones RF et al (2001) Organizational models for medical school-clinical enterprise relationships. Acad Med 76:113–124
57 Mentoring in Academic Surgery

Oliver Warren and Penny Humphris
Contents

57.1 The Mentoring Construct: Background 715
57.2 Current Challenges 716
57.3 Mentoring: An Important Developmental Tool 716
57.3.1 What Is Mentoring? 716
57.3.2 Potential Benefits of Mentoring 717
57.3.3 How Should Mentoring Be Delivered, and by Whom? 718
57.4 Creating a Formal Mentoring Scheme and Establishing a Clear Purpose for the Mentoring Relationship or Scheme 720
57.4.1 Gaining Visible, Senior Support 720
57.4.2 Defining the Mentoring Process to All Involved 720
57.4.3 Understanding the Possible Costs of Implementing a Mentoring Scheme 720
57.4.4 Identifying Suitable Participants as Mentees and Clearly Defining Their Responsibilities 721
57.4.5 Identifying Suitable Participants as Mentors and Clearly Defining Their Responsibilities 722
57.4.6 Creating "The Contract": Clarifying the Ground Rules and Commitments 723
57.4.7 Matching Mentors and Mentees 723
57.4.8 Training 723
57.4.9 Administering, Quality Assuring and Evaluating the Scheme 723
57.5 Conclusion 724
References 724

Abstract Currently, a significant amount of surgeons' personal development occurs passively and informally, predominantly through role modelling and mentoring provided by senior colleagues. However, these traditional mentor-mentee relationships are being eroded by increased clinical, research and administrative demands and the modernisation of surgical career pathways. The aims of this chapter are to outline exactly what mentoring is, describe its potential benefits to surgeons, as well as varying models and methods of initiating and delivering a mentoring programme and some of the associated challenges. It is intended to provide an appropriate overview of mentoring and a potential tool to encourage the establishment of mentor-mentee relationships at an individual level, or mentoring programmes at a department, faculty or organisational level.
O. Warren () The Department of BioSurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
57.1 The Mentoring Construct: Background

One of the key responsibilities of medical schools, postgraduate surgical training programmes and academic surgical units is to develop the future leaders of the surgical profession. To do this, they must develop academic surgeons who can teach, research, publish and be excellent clinicians. Furthermore, these individuals will require a macroscopic outlook on healthcare provision and resource allocation, and an awareness of the political, economic, social and technological drivers for change that will influence the health and academic sectors throughout their careers. Finally, they must be able to utilise a range of non-clinical skills, especially the ability to manage and lead others. These skills include setting clear direction, delivering high-quality healthcare in a rapidly changing environment,
working collaboratively, networking and having strong personal qualities that impact positively on others. If this multi-dimensional individual is to be developed, to both lead service improvement and take the medical profession forward, then stakeholder organisations must place professional and personal staff development at the centre of their priorities. To create even the most junior of surgical trainees requires a significant investment of time, money and effort, at both a personal and societal level. These individuals should be regarded as invaluable assets to the healthcare system that has created them, and nurtured to ensure that they reach their full potential. However, the need for a committed approach to the personal development of surgeons has yet to be fully recognised by either the employers or the profession itself.
57.2 Current Challenges

Currently, a significant amount of surgeons' personal development occurs passively and informally, predominantly through role modelling and mentoring provided by senior colleagues. However, these traditional mentor-mentee relationships are being eroded by increased clinical, research and administrative demands and the modernisation of surgical career pathways [1]. Where informal mentoring does occur, it is not without its disadvantages; mentors are often inextricably linked to the appraisal, assessment and future employment prospects of the mentee, may have competing interests and agendas, and may impose their viewpoints on workplace issues. Furthermore, traditional surgical mentoring has centred on clinical skill development – only a portion of the abilities desirable in a consultant surgeon. As mentoring relationships have come under competing pressures, so personal support, previously found from peers or seniors within the "firm" or the doctors' mess, has diminished. Recent changes to the working practices of doctors, driven by the European Working Time Directive, increased patient choice and an expectation of better work-life balance, have led to a degradation of these support structures. Simultaneously, the required levels of change, responsibility, accountability and financial awareness have increased. Surgeons now have to cope with greater public and regulatory scrutiny than ever before, alongside more intensive revalidation processes and a less certain, more unstable employment market. These professional changes have combined to make many junior surgeons feel inadequately supported, disenchanted, disengaged and overwhelmed.

The final difficulty with the current scenario involves the individuals themselves. Clinicians, but particularly surgeons, tend to have difficulty in asking for help, or even in appreciating their need for help. The traditional surgical culture is characterised by self-reliance, machismo, independence and a resistance to managed personal development. However, like their colleagues in other high-performance arenas, such as business or the armed forces, surgeons need to make important decisions within the workplace. These decisions are often made with limited available information and under the pressure of knowing that they can impact not only their professional and personal lives, but the lives of others as well.

57.3 Mentoring: An Important Developmental Tool

The above-mentioned difficulties mean that, more than ever, the personal development of surgeons must be given priority. Mentoring can form a large part of this supportive process and, despite the current difficulties, is associated with too many positive effects to argue that it should be given no further consideration or that it can be replaced by other personal development methods. What is required is heightened awareness, coupled with a commitment to develop more suitable models of mentoring. This can only occur if this and other aspects of personal development are given priority at a departmental level and beyond. The aims of this chapter are to outline exactly what mentoring is, describe its potential benefits to surgeons, and present the varying models and methods of initiating and delivering a mentoring programme, together with some of the associated challenges. It is intended to provide an appropriate overview of mentoring and a potential tool to encourage the establishment of mentor-mentee relationships at an individual level, or mentoring programmes at a department, faculty or organisational level.
57.3.1 What Is Mentoring?

One of the key problems when discussing mentoring and mentors is the sheer heterogeneity of definitions
encountered within the literature. The variability is high within the social science, corporate and human development literature, and the situation is compounded by the medical literature, where mentoring has been used to describe the full range of learning and supporting behaviours. The recent interest in mentoring within academic healthcare has led to the term being frequently substituted for "teaching", "instruction" or "tuition", further compounding the situation. Furthermore, there appears to have been an assumption that everyone understands the term, and thus that defining it is unnecessary. Historically, the name "mentor" is believed to take its origin from Homer's Odyssey: when the Greek king Odysseus left for the Trojan War, he left his close friend Mentor to raise his son Telemachus into adulthood. Early models of mentoring within the literature tend to reflect this parental-like role, suggesting that mentoring is about giving wise advice and counsel, in a relationship with a distinct senior and junior. However, since mentoring came to prominence in the late 1970s business literature, it has become increasingly obvious that the term encompasses an entire spectrum of learning and supporting behaviours that allow one individual to help another to develop and grow. Many authors have tried to define mentoring further, but two definitions that we believe are both useful and applicable to modern mentoring relationships are:

"A form of human development where one person invests time, energy and personal know-how in helping another person grow and improve to become the best that he/she can become" [2].

"Off-line help by one person to another, making significant transitions in knowledge, work or thinking" [3].
Jacobi, in a study examining the relationship between positive mentoring relationships and academic success, distilled five key elements of the mentoring relationship [4]. While not every mentoring relationship will entail all five of these elements or place equal weight on each, their description enables a clearer understanding of the type of relationship that mentoring describes:
(a) Three inter-related components: emotional and psychological support, direct assistance with career progress, and professional development
(b) A focus on achievement or knowledge acquisition
(c) Reciprocity, where both mentor and mentee derive emotional or other tangible benefits
(d) A relationship that is personal in nature, involving direct interaction
(e) Emphasis on the mentor’s greater experience, influence and achievement within a particular organisation or area
57.3.2 Potential Benefits of Mentoring

Mentoring can bring benefits to the mentee, the mentor and the organisations they work for, and these benefits have been reported for several decades in the management and business press [5–7]. Mentoring has been used in both the private and, more recently, the public sector to support the development of people throughout their careers and to ensure that organisations are developing future leaders. Most of the research into mentor-mentee relationships has examined the benefits of mentoring to the mentee, finding that mentoring is related to important career outcomes, such as salary level and job satisfaction [8, 9]. These potential benefits have led individuals to seek out mentors, and many corporate organisations have encouraged mentoring relationships between organisational members [10]. Less work has been done evaluating the beneficial effects for the mentor, and for the organisations supporting these relationships. This may be partly due to the complexity of assessing and defining both what is or is not a mentoring relationship, and what the appropriate outcomes may be. The medical literature remains relatively sparse. A recent systematic review of mentoring in academic medicine identified fewer than 50 articles, but focused its search on articles investigating the impact of mentoring on career choices and academic advancement, rather than other areas of personal development [11]. It concluded that while mentorship was regularly reported to be an important influence on personal development, research productivity and career choice, the poor quality of the studies did not allow conclusions to be drawn on the effect size of mentoring. Elsewhere, a multi-site study of over 2,000 clinicians investigated the keys to satisfaction with mentoring relationships in medicine, identifying features not described previously, such as longevity of the relationship and strong counsel on important decisions and career plans [12]. While these and other studies identify a relatively strong qualitative argument for mentoring as a construct, it is clear that longitudinal, comparative studies with more quantitative outcomes are required in the future.
57.3.3 How Should Mentoring Be Delivered, and by Whom?

Mentoring can be classified and described in a number of ways, and a detailed discussion of the many forms it can take is not within the remit of this chapter. The following is just a brief overview of the different forms mentoring and mentors can take; further in-depth analysis can be found in David Clutterbuck's book "Everyone Needs a Mentor" [13].
57.3.3.1 Formal vs. Informal Mentoring

One of the key variants in mentoring is the level of formality involved in the creation and maintenance of the relationship. When the concept of mentoring was first described and studied, most mentoring relationships arose spontaneously. Mentors were frequently the senior staff within the workplace, who were sought out by junior colleagues to create beneficial relationships based predominantly on knowledge transfer and patronage. Two key factors, one positive and one negative, drove the creation of more formalised mentoring schemes. The first was the association between career success and having a mentor: those individuals who had established a relationship with a senior figure appeared to do better. Whether this was a case of self-selection was unclear, but mentoring appeared to be linked to productivity and performance within the workplace and thus became something that organisations were keen to foster. The second factor was social inclusion. Early studies demonstrated that women and black and minority ethnic populations were less likely to have mentors [14] (although, interestingly, they may benefit more than others where mentoring does occur [15, 16]). Thus, some organisations set about trying to redress this balance by aiding the creation of mentoring relationships for these groups. Formal mentoring schemes therefore started to be instigated by certain large organisations. In these schemes, some level of organisational control is usually exerted upon the inclusion or selection of the mentors and mentees and/or the duration of the scheme.

There are advantages and disadvantages to both informal and formal mentoring. Informal mentoring relationships appear to be associated with higher levels of satisfaction among the participants. This may be due to the lack of time pressure in starting and growing the relationships, or to the mentors being less likely to be involved out of some sense of obligation or a desire to be seen as committed to personal development. Finally, informal mentoring, often created by mutual admiration or attraction, tends to offer stronger elements of friendship and empathy than formalised mentoring relationships. As alluded to previously, however, formalisation can bring some control to a process that, left alone, may not always work to the advantage of everyone in an organisation. Some have claimed that informal mentoring worsens social exclusion, because better-educated, socially dominant individuals with good communication skills acquire the good mentors. The NHS Institute for Innovation and Improvement runs a scheme aimed at mentoring black and minority ethnic staff, a talent pool who have previously found it difficult to access mentors [14]. Elsewhere, others have investigated ways to improve access to mentors for women in academic surgery [17]. Furthermore, formal mentoring schemes have clarity of purpose, which creates a sense of direction and allows both parties to define clear goals for the relationship. Finally, formal schemes support the relationships once they are formed, through a mixture of financial and administrative input, concurrent knowledge transfer or the creation of other learning and networking opportunities. Training alongside such a scheme ensures that both parties are aware of what is expected of them, and allows the organisation to monitor any dysfunctional relationships, intervening where necessary to help either party.
57.3.3.2 Who Should Mentor?

Not everyone is suited to the role of mentor; it is not an easy task, and it can only really be carried out by someone who is interested in the process, open to learning more about themselves and others, and willing to invest the time to build a successful mentor-mentee relationship. In practice, mentors can range from parents and teachers (especially in earlier life) to community leaders, professional leaders and work associates. Regardless of their original role in someone's life, their role as mentor is a common one: to motivate, empower and encourage, to increase the self-confidence of their mentee and to lead by example. They should be there to offer wise counsel and to raise the performance levels of their mentees [18]. There are varying styles that will achieve this, but all good mentors have similar traits: they must be committed, unselfish and trustworthy, and they must be willing to develop their leadership and mentoring qualities, skills that can be developed and enhanced [19].
57.3.3.3 Peer Mentoring

Mentoring can be done by one's peer group, a process referred to as "peer mentoring". While this may reduce the inspirational aspect of a traditional mentor, the formalisation of peer-group mentoring can emphasise the importance an organisation places on mutual support and co-operation. Although peer mentors have no more power or access than their mentee, this is not always a bad thing: problems can be shared without fear of upsetting seniors or people with influence. There are further benefits. Peers tend to share common problems and obstacles to success; they understand the worries and stresses that colleagues are going through, and may have techniques for conquering challenges that are unfamiliar to more senior mentors. While peer mentoring may not be the only mentoring one should aim to receive, it is particularly useful for those who are already very senior themselves, such as chief executives or medical directors, who may struggle to access more senior individuals as they climb the professional ladder.
57.3.3.4 Senior Mentors: "Internal"

The most common form of mentoring within medicine, and probably in the wider corporate world, occurs between senior and junior members of the same profession, department or organisation. Traditionally, within surgery, consultant surgeons have mentored junior staff, sometimes from as early in their career pathway as the undergraduate years. There are many advantages to this system. The mentor's specific experience and, in surgery, clinical skills allow knowledge transfer to be a key component of the relationship. Mentees aspire to attain the abilities, position and achievements of the mentor, and may learn from the mentor's mistakes so as not to repeat them. The mentor may also help "open doors" that might otherwise remain closed, providing access to a network of individuals who may be influential to career progression. This has been shown to be particularly helpful in the professional and personal development of minority groups [20, 21]. There is considerable qualitative evidence to support faculty mentoring as a way of improving performance within academic and clinical medicine [22, 23].

57.3.3.5 External or "Distance" Mentors

There are a few examples in the medical literature of "distance" or "external" mentoring, where an individual outside the professional or social circles of the mentee agrees to enter into a mentoring relationship. One model, piloted at Arizona State University, involved a Congressman, a State Senator and a former Surgeon General [24], all of whom were in some way interested in healthcare and agreed to mentor underrepresented minorities within the medical faculty. The authors have piloted a similar external mentoring scheme in their department, involving national-level healthcare leaders mentoring junior academic surgeons [25]. While distance mentors may not be able to invest as much time as senior figures working alongside the mentee, the potential advantages of this approach deserve consideration. The mentees gained access to meetings, events and negotiations outside their normal sphere of interest, and were privy to opportunities that someone in their role would not normally access. In our pilot scheme, such opportunities allowed mentees to witness the factors influencing change and progress at a national and strategic level. More generally, mentoring in this model allows mentees to seek the advice and wisdom of individuals with a different perspective on their professional and personal problems. Because the mentors are not involved in the assessment or appraisal of the mentees, their advice is genuinely "off-line" and less likely to be affected by the conflicts of interest that arise within a shared workplace. Finally, our model of an inter-professional dyad has been rewarding for the mentors, offering them the opportunity to work with young, energetic people. The relationships have been based on bi-directional knowledge transfer, as modern-day mentoring should be. This has given high-level managers an insight into a clinical world of which they may know little in detail, and provided them with the chance to reflect on their own assumptions and beliefs. We hope these relationships will contribute to improved clinician-manager relationships now and in the future, and to greater enthusiasm for clinical engagement in the leadership and management of health services.
57.4 Creating a Formal Mentoring Scheme

The second part of this chapter is intended to work as a tool to allow you to set up a mentoring relationship or scheme for yourself or others. We suggest that there are ten key stages or goals in ensuring that any mentoring endeavour is successful:

1. Establishing a clear purpose for the mentoring relationship or scheme
2. Gaining visible, senior support
3. Defining the mentoring process to all involved
4. Understanding the possible costs of implementing a mentoring scheme
5. Identifying suitable participants as mentees and clearly defining their responsibilities
6. Identifying and recruiting suitable mentors and clearly defining their responsibilities
7. Creating "the contract" – clarifying the ground rules and commitments
8. Matching mentors and mentees
9. Training
10. Administering, quality assuring and evaluating the scheme

Establishing a Clear Purpose for the Mentoring Relationship or Scheme

Any surgical department setting up a mentoring scheme needs to be clear about the purpose of the scheme. Many schemes stand alone as effective development activities, but others are part of, or incorporate, further learning and development opportunities. These might include opportunities for mentees to come together for master classes, workshops or action learning sets, where 6-8 mentees share and learn from each other's experiences using a co-consultancy model. It is essential that those initiating a mentoring relationship or scheme within academic surgery decide from the outset what it is for, i.e. what benefits are intended for both the individuals involved and the organisation. Furthermore, this must be explained clearly to all participants, because there is a strong correlation between clarity of purpose for the mentoring relationship and/or scheme and successful outcomes for all involved. A published statement of purpose, and a process for aligning the expectations of mentors and mentees, is essential [26]. The reasons for establishing a relationship or scheme may include some or all of the following:
• An impartial source of advice and support – a sounding board for ideas
• Improved understanding of issues around the internal work environment
• Exposure to alternative approaches to dealing with work issues
• A source of knowledge regarding the wider health care and academic sectors – their characteristics and culture
• Challenge by someone in a different role and with different perspectives
• Opportunity to learn more about oneself
• Establishing and focusing on clear short-, mid- and long-term futures
• Increase in confidence
57.4.1 Gaining Visible, Senior Support

If individuals are to invest time and energy in a mentoring scheme, either as mentors or mentees, the scheme needs to be visibly supported by the Head of Department and/or Head of the Faculty and valued as an integral part of the personal development of staff. The contribution made by mentors and the involvement of mentees and their consequent learning needs must be recognised in annual appraisal. It is also important to involve the potential mentees in discussions about the establishment of any mentoring scheme, as it is more likely to be successful if it meets their expressed, rather than assumed, needs.
57.4.2 Defining the Mentoring Process to All Involved

Engaging in a mentoring relationship involves a number of key steps and responsibilities. Figure 57.1 is a flow diagram that may act as a template for all involved to use.
57.4.3 Understanding the Possible Costs of Implementing a Mentoring Scheme

Establishing a formal mentoring scheme takes time, determination and energy. However, the financial costs of mentoring schemes are low in comparison to many other forms of personal development. Areas that should be considered when costing a scheme include the time of those individuals initiating the scheme, matching mentors and mentees, and administering the scheme throughout its duration. Further costs may include the provision of written guidance and/or an introductory workshop for participants, the real or potential costs of time away from academic or clinical work, any travel costs (predominantly incurred by the mentees) and the costs of launching the scheme with some sort of event. A launch event can create initial momentum and ensure the scheme has a profile both within and outside your organisation. The extent of any further financial or human resource requirement depends predominantly on the amount of learning and personal development activity that is run alongside the scheme. We strongly advocate at least one training session for the mentees and mentors. Clutterbuck reports that training the mentors and mentees, along with ensuring that organisational line managers understand the process, will at least double the success rate of relationships in a formalised scheme [13]. Finally, we believe it essential to evaluate any scheme in both a formative and a summative manner as it progresses, allowing changes to be made in response to problems before the scheme ends. If finances allow, and depending on the size and aspirations of the scheme, this is best done externally.

Fig. 57.1 The mentoring process (flow diagram): identify goals and needs from the mentoring relationship; identify a suitable mentor, possibly through a mentee-guided matching scheme; approach the mentor and seek his/her agreement; set up and prepare for the first meeting, thinking through what you want to achieve; at the first meeting, explore the parameters of the role and relationship, define clear goals, build rapport and ensure a clear timeline is in place; at each subsequent meeting, review progress in the mentee's development and the productivity of the relationship; redefine and evaluate the relationship to ensure that it continues to meet the needs of the participants.

57.4.4 Identifying Suitable Participants as Mentees and Clearly Defining Their Responsibilities

Mentoring can only be effective if the mentee is enthusiastic and committed to participating fully in the relationship. Suitable participants will be keen on their personal development; curious to understand more about the wider environment in which they will be working as academic surgeons; open to new information, practical advice and reflective learning; and interested in developing new networks. An element of self-selection goes some way to ensuring this when determined individuals or small groups start mentoring relationships themselves. A formalised scheme is different: here the options are either to offer anyone an opportunity to be matched with a mentor, or to include a selective element so that only those who demonstrate a desire to learn from such a relationship are selected to participate.
Once an individual commits, it is useful to define the mentee's responsibilities, which are:

• To be committed to the process and the relationship, and enthusiastic about opportunities that may arise
• To reflect on the issues that he/she would like to discuss with his/her mentor and to set the agenda for the meetings, having a clear idea of the desired outcomes or goals for each meeting
• To be responsible for his/her own learning and development, producing notes if needed and taking any agreed actions between meetings
• To respect the mentor's individuality and contribution
• To accept joint responsibility for the success of the mentoring relationship

57.4.5 Identifying Suitable Participants as Mentors and Clearly Defining Their Responsibilities

If a mentoring relationship or scheme is to be successful, it is essential to identify the right sort of mentor(s). Simply holding a very senior position does not make someone an effective mentor; in fact, it is often these individuals who struggle to guarantee protected time for their mentees. The characteristics of an ideal mentor are summarised in Table 57.1. Ideally, the pool of mentors should include a diverse range of people, so that mentees can guide matching by selecting mentors based on a range of attributes. These may include professional interests, gender, race and geographical location. However, most organisations will have only a limited supply of developmentally focused senior leaders able to devote time to mentoring. Even if the scheme aims to engage external leaders as mentors, as previously described, the supply of suitable people will still be limited, and so mentor selection must be managed carefully to ensure quality. The mentor's role is to commit to a set of responsibilities that enable the mentee to achieve their potential. These include:

• To provide protected time and a safe space for the mentee
• To offer guidance, encouragement and stimulation
• To encourage the mentee to manage his/her own learning and development through reflection, by drawing out the learning and by providing carefully selected examples from their own experience to illustrate different approaches
• To provide constructive challenge, where appropriate, to the mentee's beliefs and assumptions
• To create opportunities to experience relevant situations to which the mentee may not otherwise have exposure
• To be sensitive to the mentee's needs at any particular time and to manage the relationship along a support-challenge continuum
• To respect the mentee's individuality and contribution
• To accept joint responsibility for the success of the mentoring relationship

Table 57.1 The ideal characteristics of a mentor
• Holding a relatively senior position, with good access to both contacts and information
• Having clear learning goals of their own for the mentoring relationship
• Able to communicate clearly and question appropriately, to elicit ideas and help the mentee make linkages and reach conclusions
• Helping the mentee to gain insights from their reflections, while avoiding being judgemental
• Able to challenge traditional approaches and encourage creative and innovative thinking
• Able to help the mentee with analysis and with taking a strategic perspective on issues
• Trustworthy and able to hold confidences
• Ensuring the mentee retains responsibility for determining outcomes and solutions
• Listening actively and able to pick up verbal and non-verbal cues
• Willing to devote sufficient time, energy and commitment
• Enthusiastic about developing and nurturing talent
• Approachable, constructive and offering a sensitive, responsive balance between support and challenge
• Successful in their own right, with a wide range of experience

As we mentioned earlier, mentors will need briefing and/or training to ensure that they meet their responsibilities, and provision for this should be made in any formal scheme. A workshop at the commencement of the scheme allows both mentees and mentors to gain a shared understanding of how the relationship will operate and to work out the "rules of engagement".
57.4.6 Creating "The Contract": Clarifying the Ground Rules and Commitments

It is important to have clarity in the mentoring relationship about roles, responsibilities and commitments. One way of achieving this is to create a "contract" into which mentor and mentee enter, often at the first meeting. Here, the pair can set out the ground rules and principles upon which the relationship will be based. Each contract will be individual to each mentoring pair, but the areas that should be covered are:

• Confidentiality – the mentoring conversation may involve disclosure of personal experiences, and it is important to establish at the outset what is confidential.
• Frequency, duration and location of meetings – ideally, mentoring conversations should occur at 4-6 weekly intervals and last between 1 and 2 h. The mentoring relationship should be established for an agreed time period, usually 9-12 months, with the possibility of continuing beyond that initial period. A commitment to meet or talk more frequently in the initial stages of the relationship can often be useful in building early rapport.
• Clearly agreed boundaries – agree early what may or may not be included in the conversations. For example, some mentors may not see their role as covering personal issues, whereas others may see work and personal issues as inextricably linked.
• How to address any problems that may arise – things can go wrong in a mentoring relationship. Problems may include frequent cancellation of meetings by either participant, insufficient challenge by the mentor, the personal chemistry not being right, failure to manage the boundaries, lack of commitment to the mentee's development by either party, or inappropriate exercise of power by the mentor. Where the relationship is part of a formalised scheme, a central scheme administrator can act as a third party; this is not possible for informal or individual mentoring relationships. It is therefore helpful to agree that the relationship can end on a "no fault" basis if either individual feels it is not working.
57.4.7 Matching Mentors and Mentees

The matching process for mentors and mentees, wherever possible, needs to involve an element of choice for both parties. Although there will be an overall purpose to the mentoring scheme, each participant will have individual learning needs and interests. We advocate an in-depth discussion with each mentee (and, to a lesser extent, each mentor) to map out and make explicit these needs, prior to offering a choice of at least two mentors. Equally, the mentor should have the opportunity to say no to any individual mentee. A minimal sketch of how such mentee-guided matching might be supported is given below.
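As an illustration only, the following Python sketch shows one hypothetical way a scheme coordinator might generate a shortlist of mentors for each mentee by counting shared attributes (professional interests, location and so on). The names, attribute sets and the simple overlap score are invented assumptions, not part of any scheme described in this chapter.

# Hypothetical matching aid: shortlist mentors for a mentee by attribute overlap.
# The mentee then chooses from the shortlist, and the chosen mentor may decline.

def overlap_score(mentee_attrs, mentor_attrs):
    """Count attributes (interests, location, etc.) shared by the pair."""
    return len(set(mentee_attrs) & set(mentor_attrs))

def shortlist(mentee, mentors, k=2):
    """Offer the mentee a choice of at least k mentors, ranked by overlap."""
    ranked = sorted(
        mentors,
        key=lambda m: overlap_score(mentee["attrs"], m["attrs"]),
        reverse=True,
    )
    return ranked[:k]

mentors = [
    {"name": "Mentor A", "attrs": {"surgical oncology", "London", "research"}},
    {"name": "Mentor B", "attrs": {"education", "leadership", "London"}},
    {"name": "Mentor C", "attrs": {"research", "medical statistics"}},
]
mentee = {"name": "Trainee X", "attrs": {"research", "London", "leadership"}}

for m in shortlist(mentee, mentors):
    print(m["name"])  # Mentor A and Mentor B for this example data

In practice any such score would carry weights reflecting the in-depth discussion with each mentee, but the principle – rank, shortlist, then let both parties choose – remains the same.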
57.4.8 Training

Training for both mentors and mentees is essential for an effective mentoring scheme, and consideration should also be given to educating other stakeholders or influencers, such as line managers and peers. This can be done through training programmes for the mentors and mentees joining the scheme, and through access to a library of materials on mentoring and related subjects. There is also a significant amount of online learning material.
57.4.9 Administering, Quality Assuring and Evaluating the Scheme

If a systematic and sustainable approach is to be taken to a mentoring scheme, it is important that there is a central point from which a named person runs the scheme. This increases the likelihood that the scheme will be effective, and provides the necessary manpower to produce supporting documentation and to organise training and evaluation processes. Quality-assuring a mentoring scheme is important to ensure a positive and fruitful experience for both mentor and mentee. While the interactions between mentor and mentee cannot be observed, there needs to be a clear understanding of how issues of concern can be raised if they cannot be resolved between the mentor and the mentee. Identifying a third party who can support mentors and mentees can be useful; this may be the individual coordinating the scheme or a "critical friend" to the scheme. Evaluating the benefits of such a subjective, qualitative process is extremely challenging, but it must be addressed to ensure that the scheme is delivering the desired results. A variety of models have been proposed within the literature, most based on questionnaires and Likert scales, to evaluate the effectiveness of mentoring relationships or schemes [12, 27, 28]. Broadly speaking, most measure, at some point, the four levels of Kirkpatrick's training evaluation model, namely [29]:
• Reaction of the student/mentee – what they thought and felt about the learning opportunity
• Learning – the resulting increase in knowledge or capability
• Behaviour – the extent of behavioural change and the implementation/application of learned behaviour
• Results – the effects on the business/service or environment resulting from the trainee's performance

Another possible method is to compare the scheme with the "International Standards for Mentoring Schemes in Employment", developed by the European Mentoring and Coaching Council and Clutterbuck Associates in 2002 and available online at www.ismpe.com. These provide a detailed benchmark against which mentoring scheme coordinators can measure their programmes.
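Purely as an illustration of how such questionnaire data might be summarised, the Python sketch below pools 5-point Likert responses and computes a mean score for each Kirkpatrick level. The items, their level assignments and the scores are invented assumptions; real instruments, such as those used in [12, 27, 28], differ.

# Hypothetical example: summarising 5-point Likert responses by Kirkpatrick
# level. Items, level tags and scores are all invented for illustration.

from statistics import mean

# Each questionnaire item is tagged with the Kirkpatrick level it probes.
ITEM_LEVELS = {
    "meetings_worthwhile": "Reaction",
    "understand_wider_environment": "Learning",
    "set_goals_before_decisions": "Behaviour",
    "research_productivity_improved": "Results",
}

# One dict of item -> 1-5 Likert score per responding mentee.
responses = [
    {"meetings_worthwhile": 5, "understand_wider_environment": 4,
     "set_goals_before_decisions": 3, "research_productivity_improved": 4},
    {"meetings_worthwhile": 4, "understand_wider_environment": 4,
     "set_goals_before_decisions": 4, "research_productivity_improved": 3},
]

def level_means(responses):
    """Mean Likert score per Kirkpatrick level, pooled across all mentees."""
    by_level = {}
    for response in responses:
        for item, score in response.items():
            by_level.setdefault(ITEM_LEVELS[item], []).append(score)
    return {level: round(mean(scores), 2) for level, scores in by_level.items()}

print(level_means(responses))
# {'Reaction': 4.5, 'Learning': 4.0, 'Behaviour': 3.5, 'Results': 3.5}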
57.5 Conclusion

Mentoring is an essential aspect of personal and professional development which has, for too long, been neglected within the surgical community. The decline of traditional mentoring relationships within surgery means that there is a genuine requirement for alternative models of mentoring. It may be essential for the process to undergo some level of formalisation, not only to ensure that mentoring occurs, but also to make sure that these opportunities do not worsen social exclusion. Where schemes are created, they must be based on the examples of good practice already available. While no single scheme or model will be suitable for all, certain common principles should exist throughout. These must include training for mentors, and recognition that the time needed to build mentoring relationships should be compensated and valued within the surgical community. Promotion and tenure criteria in most academic surgical units currently emphasise scholarship at the expense of citizenship, and this imbalance needs to be redressed. Lessons should be learnt from the corporate sector, where mentoring occurs independently of appraisal and assessment, and may be performed by individuals outside the mentee's department or even profession. Finally, any new scheme should be evaluated, to ensure that trainees gain the potential benefits that it may bring.

References
1. Poole A (2003) The implications of modernising medical careers for specialist registrars. BMJ 326:s194
2. Flaherty J (1999) Coaching: evoking excellence in others. Butterworth-Heinemann, Boston, MA
3. Shea GF (1997) Mentoring: a practical guide. Crisp, Menlo Park, CA
4. Jacobi M (1991) Mentoring and undergraduate academic success: a literature review. Rev Educational Res 61:505-532
5. Kram KE (1988) Mentoring at work: developmental relationships in organizational life. University Press of America, New York
6. Levinson DJ, Darrow CN, Klein EB et al (1978) The seasons of a man's life. Ballantine, New York
7. Roche GR (1979) Much ado about mentors. Harv Bus Rev 57:14
8. Fagenson EA (1989) The mentor advantage: perceived career/job experiences of proteges versus non-proteges. J Org Behav 10:309-320
9. Whitely W (1991) Relationship of career mentoring and socioeconomic origin to managers' and professionals' early career progress. Acad Manag J 34:331-351
10. Burke RJ, McKeen CA (1989) Developing formal mentoring programs in organizations. Bus Q 53:76-99
11. Sambunjak D, Straus SE, Marusic A (2006) Mentoring in academic medicine: a systematic review. JAMA 296:1103-1115
12. Ramanan RA, Phillips RS, Davis RB et al (2002) Mentoring in medicine: keys to satisfaction. Am J Med 112:336-341
13. Clutterbuck D (2004) Everyone needs a mentor. Chartered Institute of Personnel and Development, London
14. Newman LA, Pollock RE, Johnson-Thompson MC (2003) Increasing the pool of academically oriented African-American medical and surgical oncologists. Cancer 97:329-334
15. Redmond SP (1990) Mentoring and cultural diversity in academic settings. Am Behav Sci 34:188-200
16. Sirridge MS (1985) The mentor system in medicine – how it works for women. J Am Med Womens Assoc 40:51-53
17. Hoover EL (2006) Mentoring women in academic surgery: overcoming institutional barriers to success. J Natl Med Assoc 98:1542-1545
18. Souba WW (1999) Mentoring young academic surgeons, our most precious asset. J Surg Res 82:113-120
19. Souba WW (1998) The job of leadership. J Surg Res 80:1-8
20. Abernethy AD (1999) A mentoring program for underrepresented-minority students at the University of Rochester School of Medicine. Acad Med 74:356-359
21. Daley S, Wingard DL, Reznik V (2006) Improving the retention of underrepresented minority faculty in academic medicine. J Natl Med Assoc 98:1435-1440
22. Jackson VA, Palepu A, Szalacha L et al (2003) "Having the right chemistry": a qualitative study of mentoring in academic medicine. Acad Med 78:328-334
23. Morzinski JA, Diehr S, Bower DJ et al (1996) A descriptive, cross-sectional study of formal mentoring for faculty. Fam Med 28:434-438
24. Lewellen-Williams C, Johnson VA, Deloney LA et al (2006) The POD: a new model for mentoring underrepresented minority faculty. Acad Med 81:275-279
25. Warren O, Smith A, Humphris P et al (2007) Management development in academic surgical trainees: a pilot formalised mentoring programme. In: Developing careers in surgical education: management, training and research conference, The Royal College of Surgeons of England, London
26. Clutterbuck D, Ragins BR (2002) Designing and sustaining a mentoring programme. In: Mentoring and diversity: an international perspective. Butterworth-Heinemann, Melbourne
27. Berk RA, Berg J, Mortimer R et al (2005) Measuring the effectiveness of faculty mentoring relationships. Acad Med 80:66-71
28. Milner T, Bossers A (2004) Evaluation of the mentor-mentee relationship in an occupational therapy mentorship programme. Occup Ther Int 11:96-111
29. Kirkpatrick DL (1994) Evaluating training programs: the four levels. Berrett-Koehler, San Francisco, CA
58 Leadership in Academic Surgery

Oliver Warren and Penny Humphris
Contents

Introduction  727
58.1 What is Leadership?  728
58.1.1 Early Work on Leadership  728
58.1.2 Transactional and Transformational Leadership  729
58.1.3 Management and Leadership  730
58.1.4 More Recent Work  731
58.2 A Selection of Leadership and Managerial Models  733
58.2.1 Lewin's Leadership Styles  733
58.2.2 Situational Leadership Model  733
58.2.3 The Managerial Grid  734
58.2.4 Four Framework Approach  734
58.3 Leading in Teams  735
58.4 Leadership in Academic Surgery  736
58.4.1 The Challenge  736
58.4.2 Why Is Clinical Leadership Important?  736
58.4.3 What Aspects of Care Should We Be Improving?  736
58.4.4 What Are the Differing Leadership Roles in Delivering These Clinical Aims?  737
58.4.5 What Are the Common Attributes of Academic Surgical Leaders and Their Requirements?  737
58.5 How Best to Develop Leadership Skills in Surgeons?  737
58.5.1 What Methods Are Available to Develop Leadership?  738
58.5.2 Who Should Be Responsible for Leadership Development in Academic Surgeons and How Should It Be Delivered?  739
References  740
O. Warren, The Department of BioSurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract Academic surgical departments are required to perform three key roles: the provision of highly specialised clinical care, the delivery of high-quality undergraduate and postgraduate education to create the next generation of surgeons, and the delivery of excellent, innovative research. Meeting these commitments requires clinical and academic excellence, clinical engagement and high-quality leadership. In this chapter, we outline the traditional and current models of leadership and their application to the surgical environment. We also examine the relationship between management and leadership, and strategies for leadership development.
Introduction

The last 10 years have seen unparalleled change in the healthcare industry within both the United Kingdom and the United States. Nowhere has this been felt more keenly than within academic surgery where, in less than a generation, selection criteria, methods of training and working roles have changed almost beyond recognition. The ensuing turbulence has generated significant levels of fear and distrust among clinicians, on occasions leading to obstacles to change being created, and goals and targets not being achieved. Despite far-reaching changes, academic surgical departments are still required to deliver their key services: providing specialised tertiary and quaternary clinical care, educating the next generation of surgeons through high-quality undergraduate and postgraduate education, and performing excellent, innovative research. Delivering these commitments to a high standard requires not only clinical and academic excellence and clinical engagement, but also effective clinical leadership. While this may be recognised, little has been done to try to foster a culture of, or even an interest in, leadership development for surgical trainees. Leadership is not taught at an undergraduate level, there are no mandatory leadership courses or qualifications for surgical trainees, and leadership performance rarely figures within appraisal or assessment programmes. Perhaps worst of all, there appears to be only a handful of current world-class clinical leaders acting as role models for younger surgeons. If these deficiencies are to be addressed, it is essential that academic surgical trainees become acquainted with current knowledge and beliefs regarding leadership and leadership development, and learn how to take on leadership roles in their eventual clinical and academic activities.
58.1 What is Leadership?

No single topic pertaining to personal development has received as much attention in the last 15 years as leadership. There has been a proliferation of courses, scientific papers and books on leadership and leadership development, urging us to consider our own leadership ability and to seek to improve it. Why? Because most organisations have recognised that effective leadership is critical to the success of their business, and have thus invested increasing amounts of time and money in leadership development. But what is leadership? Many prominent leaders and authors have attempted to define what it means to be a leader, all of them within the context of achieving a common good (see Table 58.1). Kotter, one of the key writers on leadership in recent times, describes it thus: "It (leadership) envisions the future and sets a new direction for the organization. Successful leaders mobilize all possible means and human resources; they inspire all members of the organization to support the new mission and execute it with enthusiasm" [1]. Thinking on leadership has evolved considerably over the last 80 years. To develop a view or model of leadership within academic surgery, there is much evidence from which to draw. However, very little of the literature is specific to surgery, most of it having been developed and studied in the business world.

Table 58.1 Various definitions of leadership
• "To go along with oneself; to accompany and show the way" – Oxford English Dictionary
• "Leadership defines what the future should look like, aligns people with that vision, and inspires them to make it happen, despite the obstacles" – John Kotter
• "Leadership is the capacity and will to rally men and women to a common purpose and the character which inspires confidence" – Bernard Montgomery, British Field Marshal
• "A leader is someone who understands where people are going and stands in front of them" – Gandhi
• "Leadership is influence – nothing more, nothing less" – John C Maxwell

58.1.1 Early Work on Leadership

Modern studies of leadership began as far back as the 1920s with "Trait Theory". On the basis of careful study of individuals' characteristics and traits, trait theory argues that successful leaders possess inherent qualities, which are natural to them and can be identified. Furthermore, it suggests that we can identify a pattern of inherent abilities associated with high-class leadership and, by focusing on the behaviours of these successful leaders, define a "best" style. While leadership theories have moved forward, most popular books on the subject continue to list traits that are thought to be central to effective leadership, and at first glance these lists often appear helpful. They frequently include such traits as "good communication", "courage", "goal-setting" and "self-confidence" [2]. This theory, however, has clear flaws. Early researchers assumed that there was a definitive set of characteristics that made a leader, regardless of his or her environment and the nature of the individuals whom he or she is trying to lead. They minimised the impact of the situation upon the method of leadership required [3]. Studying effective present-day business or political leaders, such as Steve Jobs, Richard Branson, Bill Clinton or Nelson Mandela, reinforces the failings inherent in this theory. It is clear that they have varied qualities, and that success can be achieved in the absence of some, if not many, allegedly important traits. Furthermore, exponents of trait theory tend to mix aspects of behaviour, skill, temperament and intellectual ability into lists that, while large, cannot be exhaustive. These deficiencies led others to suggest that the effects of one variable on leadership are contingent on other variables. This was the start of "Contingency Theory", a major shift in thinking on leadership. Fiedler performed much of the early work in this area, and proposed that good leaders have a repertoire of styles, adopting any given style dependent on the situation in which they find themselves ("situational theory") [4]. While Fiedler believed this was possible for highly skilled leaders, he also suggested that for most leaders it would be beyond their ability. Where possible, therefore, the situation should be structured to suit the leader or, alternatively, leaders should be recruited whose styles suited that situation. Other, more recent contingency theorists disagreed, suggesting that leaders should be encouraged to develop their repertoire of skills to suit the contingencies they find themselves in, removing the emphasis on altering the situation to fit the leader [5]. In the late 1960s, Hersey and Blanchard started to move thinking on leadership forward, suggesting that it is not leadership style per se that matters for leadership to be effective, but the ability of any given leader to adapt their style to the needs of their followers [6]. The attributes of those being led could be broadly separated into their knowledge and experience of the task and their willingness to participate actively in it. They outlined four different leadership styles, between which they felt a good leader should be able to alternate, to reflect the differing combinations of commitment and competency of the individuals they were managing. These are explained later in this chapter.
The idea of balancing the needs of the followers with achieving the task in hand was further developed by John Adair [7], in a model derived from studying effective leadership at the Royal Military Academy Sandhurst. In this "Three Circle" model, Adair proposed that successful leaders focus upon three distinct areas of need: the task, the team itself and the individual members of the team. At any time the emphasis on each circle may vary, but all are interdependent, and thus the leader must watch all three. Task needs include setting clear goals and managing the process; team needs are based on effective interaction, communication and workload sharing; and individual needs are dictated by how each person is behaving and feeling. While regarded by some as a little basic nowadays, the model still provides a solid foundation for understanding more complex human relations.
58.1.2 Transactional and Transformational Leadership

The widespread economic recessions of the late 1980s and early 1990s, on both sides of the Atlantic, forced organisations to decentralise and function in a way that was more customer-centred. Where old leaders had managed in a world of relative stability and predictability, a different kind of leader was now required to enable their organisation to be flexible, adaptable and able to evolve rapidly within a constantly changing environment [8]. To create this type of organisational behaviour, the staff they led could no longer be regarded as small cogs within a larger wheel, but had to be valued parts of the organisation who could contribute significantly to its success. The recognition by Hersey and Blanchard that the needs of the followers were as integral to the process of leadership as the leaders themselves provided a bridge into a new era: transformational leadership. In 1985, Bass surveyed a large group of managers to investigate what they regarded as the best leadership they had experienced. Through this work, he developed a model of leadership which identified two different sets of behaviours and characteristics, which he named "transactional" and "transformational" leadership [9]. On the basis of the precept that team members agree to follow their leader when they commence a task, transactional leadership assumes that any individual member of a team is motivated to perform by reward and the risk of punishment. The "transaction" is therefore the organisation rewarding the team members upon delivery of agreed goals and, likewise, the leader intervening with some punitive measure if performance does not meet expectations. Opinion on transactional leadership tends to vary: for some, it is really just a way of managing rather than a true leadership style, as the focus is on the delivery of short-term tasks, and critics suggest that it limits knowledge-based or creative work. However, transactional leadership remains a common style in many organisations, predominantly because it keeps systems and processes running effectively and tends to ensure that team members know exactly where they stand and what is expected of them. Bass contrasted transactional leadership with "transformational" leadership, which recognised that empowering and enabling followers to aspire to higher-level purposes is crucial to effectiveness, particularly in environments that thrive on creative thinking. His model of transformational leadership was based on the following elements:

• Be charismatic and highly esteemed. This inspires followers to regard the leader as a role model and align behind a common vision.
• Create an environment in which challenging the status quo is encouraged.
• Intellectually stimulate followers to release creativity and drive innovation.
• Have a genuine concern for the feelings, aspirations and development needs of individual followers. This consideration of their needs encourages them to aim for higher levels of satisfaction.
58.1.3 Management and Leadership

John Kotter, one of the most important recent writers on leadership, has written on the differences between management and leadership. He describes management as being about the present, with a focus on maintaining, problem-solving, systems and controls, and execution. Contrast this with modern transformational leadership, which concentrates on creating a vision and on inspiring and motivating followers to achieve perpetual change and improvement. Management and leadership can be combined in the same person, but he argues that it is critical to recognise that the tasks are different: management is about the present, leadership is about the future, and both are essential to business. He has set out a helpful comparison of transactional leadership (which he aligns with management) and transformational leadership, showing the contributions made by each [10]; this is outlined in Table 58.2.

Table 58.2 Kotter's comparison of transactional leadership (management) and transformational leadership (leadership)
• Creating an agenda – Management: planning and budgeting; developing a plan, a detailed map of how to deliver results. Leadership: establishing and developing direction; creating a vision along with a strategy for achieving it.
• Developing human resources – Management: organisation and staffing; deciding which individual best fits each job and what part of the plan fits with each individual. Leadership: aligning people; a major communication challenge, getting people to understand and believe the vision.
• Execution/delivery – Management: controlling and problem solving; monitoring results, identifying deviations from the plan and solving the problems. Leadership: motivating and inspiring; satisfying basic human needs for achievement, belonging, recognition, self-esteem and a sense of control.
• Outcomes – Management: produces a sense of predictability and order. Leadership: produces changes, often dramatic ones.

It is crucial that a leader not only has self-awareness and understanding, but is also aware of how others may perceive him or her. This is only possible if leaders are open to the views of colleagues, particularly their subordinates, whose views traditionally may not have been sought. This thinking led to the development of the 360-degree feedback process, which records perceptions of the leader by subordinates, peers and, where possible, superiors. When properly analysed and fed back, this enables leaders to understand how others perceive them and improves awareness of their own weaknesses, an ability that some have suggested is the key to effective leadership [11]. Furthermore, the model suggests that the real test of a good leader is the followers: what they make of him or her, and how well their interests are served by their leader. This concept is similar to "servant-leadership", a practical philosophy first described by Robert Greenleaf, who suggested that good leadership starts with the desire to serve, with leading then becoming a way of expanding that service to individuals and institutions. Greenleaf suggested that servant-leaders should focus on the growth of their followers, empowering them to become more autonomous and, to a certain extent, developing their "followership" [12].

Kouzes and Posner [13], stimulated by Bass' work, analysed 5,000 managers' accounts of their own personal experience of good-quality leadership, and from these, five characteristics of transformational leadership emerged (see Table 58.3). The fact that these two key studies, conducted separately, produced broadly comparable results imparts some validity to the findings. In their model of transformational leadership, Kouzes and Posner emphasise the leader's role in creating an environment in which everyone contributes their best, because the leader actively supports their development and recognises their commitment.

Table 58.3 Kouzes and Posner's five characteristics of leadership
• Challenging the process – Good leaders encourage others to come up with new ideas and new ways of approaching their job, problems and processes within the organisation. They encourage experimentation and innovation and are prepared to take judicious risks.
• Inspiring a shared vision – Leaders create a vision for the future. They have a desire to change the way things are. They inspire others by finding out what their dreams, hopes and aspirations are, and articulate a vision that enables others to see how all will be served by adopting a common underlying purpose.
• Enabling others to act – Leaders realise that they cannot achieve the vision on their own, so they seek and enlist the support of others. They encourage collaboration and co-operation, build teams and empower others. "Others" may include not only the staff whom they manage but also their managers, customers and suppliers.
• Modelling the way – Leaders must encourage the management process of planning, reviewing progress and taking corrective actions. In doing so, they must behave in a way that gains the respect and trust of others. They have considerable integrity, since they act in a manner consistent with their values.
• Encouraging the heart – Leaders realise that achieving the vision is exhausting and at times frustrating. They maintain morale by recognising and celebrating others' achievements and by signalling how much they believe in and value their staff. Their expectations are positive and their praise genuine.
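As a hypothetical illustration of the 360-degree feedback process described above, the Python sketch below compares a leader's self-ratings with the mean ratings given by subordinates, peers and superiors, exposing possible blind spots. The rater groups follow the text, but the competencies and scores are invented assumptions.

# Illustrative 360-degree feedback aggregation: perceptions are collected from
# subordinates, peers and superiors, then compared with the leader's own view.
# All competencies and scores below are invented for illustration.

from statistics import mean

ratings = {
    "self":         {"communication": 5, "empathy": 4, "vision": 5},
    "subordinates": {"communication": 3, "empathy": 3, "vision": 4},
    "peers":        {"communication": 4, "empathy": 3, "vision": 4},
    "superiors":    {"communication": 4, "empathy": 4, "vision": 5},
}

def blind_spots(ratings):
    """Gap between self-rating and the mean rating given by others."""
    others = [group for group in ratings if group != "self"]
    gaps = {}
    for competency, self_score in ratings["self"].items():
        other_mean = mean(ratings[group][competency] for group in others)
        gaps[competency] = round(self_score - other_mean, 2)
    return gaps

print(blind_spots(ratings))
# {'communication': 1.33, 'empathy': 0.67, 'vision': 0.67}
# Positive gaps suggest the leader rates themselves higher than others do.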
58.1.4 More Recent Work

Until the early 1990s, it was presumed that a high IQ was the most essential ingredient for success in life. That was until Daniel Goleman, then a science reporter at The New York Times, came across an article by two psychologists, John Mayer and Peter Salovey [14], in which they outlined a concept called "emotional intelligence", defined thus: "a form of social intelligence that involves the ability to monitor one's own and others' feelings and emotions, to discriminate among them, and to use this information to guide one's thinking and action". Stimulated by their work, Goleman spent the next 5 years investigating and synthesising a wealth of information from many different sources and scientific disciplines to produce a book entitled "Emotional Intelligence", published in 1995 [11]. It went on to spend more than a year and a half on the New York Times Best Seller list, and continues to be one of the most successful and influential books of its genre. While the impact of what the Harvard Business Review hailed as "a ground-breaking, paradigm-shattering idea" was felt throughout a variety of societal domains, the impact on leadership and employee development within business was probably the most significant. In the book, Goleman identified five aspects of emotional intelligence (see Table 58.4).

Table 58.4 Goleman's components of emotional intelligence
• Self-awareness – understanding your own personality traits and their interaction with other people
• Self-regulation – being able to manage your own unproductive personality traits
• Motivation – a reason for working that goes beyond your own prosperity
• Empathy – the crucial ability to understand the personalities and motivations of others
• Social skills – a broad term covering the ability to build and manage relationships

Goleman did not doubt that leadership involves conventional intelligence. However, research comparing average senior managers with star performers suggested little difference in measures of traditional intelligence (IQ), and that nearly 90% of the variation may be down to emotional intelligence, as measured by self-report Emotional Quotient (EQ) tests. While these tests have undoubted flaws and are open to criticism, it does appear that successful leadership, especially the ability
to motivate others, begins with an understanding of oneself and other people. Furthermore, it is likely that the commonly held view that a forceful personality is a prerequisite for motivating others is incorrect. A recent study by Jim Collins [15] profiled eleven companies that had significantly improved their performance. The striking result was the commonality of leadership style among their Chief Executives – dubbed Level 5 leaders – who "build enduring greatness through a paradoxical blend of personal humility and will". Heifetz challenges the myth of leaders as specially gifted individuals at the top of organisations who solve other people's problems [16]. He sees leadership as an activity carried out by people at many different levels of the organisation, consisting of jointly confronting difficult issues and taking shared responsibility for tackling them. Heifetz identifies six dimensions of what he calls "adaptive" leadership:

• Identify the adaptive challenge – be clear about the crunch issues, which have to be tackled even if it means addressing painful questions.
• Give the work back to the people with the problem, rejecting the pressure and the projections from "followers" for the "leader" to solve their problems for them, and pushing responsibility back on to those who need to make the adaptive change.
• Protect the voices of leadership from below, recognising that some of the most important insights into what needs to be done, and the momentum for change, may come from people lower in the organisation.
• Regulate the distress, recognising the pain of adaptive change and trying to create an environment in which difficult issues can be addressed at a pace with which people can cope.
• Pay disciplined attention to the issues, maintaining a sustained focus on the adaptive challenge rather than being diverted onto other less important but more comfortable issues.
• Move between the balcony and the battlefield, combining the ability to take a strategic overview, and to think ahead about possible options, with the ability to know what is happening on the ground.

Academic surgical departments, in common with many other organisations, need leaders who can meet these requirements of adaptive leadership and are thus able, as Goodwin outlines for the healthcare system, to:

• Look ahead and influence their environments strategically
• Spot opportunities and create the capacity and capability to go after them
• Consider views from people at different levels of the organisation before making major decisions
• Be personally more flexible and able to move between strategic and detailed viewpoints
• Be in touch with what is going on in their organisations
• Maintain focus on the key challenges
• Reconsider traditional ways of doing things
• Build their organisation's capacity to learn, transform, change culture and adapt
• Encourage and support innovation
• Stay in touch themselves through continuous learning

As Goodwin concludes, the emergence of a network-based approach to leadership, underpinned by growing interest in emotional intelligence, reflects how "leadership tasks" have been replaced by an emphasis on people issues [17]. As a result, successful leadership depends on understanding context and developing a network of successful inter-personal and inter-organisational relationships to secure agreement for change. Leadership therefore becomes not a series of single actions but a continual interaction with the environment and other people. Modern thinking about leadership emphasises the role of leaders as strategists [18]. Leaders are seen as responsible for positioning the organisation for the future, and as able to inspire, organise and implement the pursuit of a vision.
58.2 A Selection of Leadership and Managerial Models 58.2.1 Lewin’s Leadership Styles In 1939, Kurt Lewin and others identified a range of leadership styles as follows [19]: Authoritarian Leadership (Autocratic) – authoritarian leaders provide clear expectations for what needs to be done, when it should be done and how it should be done. There is a clear division between the leader and the followers. Authoritarian leaders make decisions independently with little or no input from the rest of the group. Researchers found that decision-making was less creative under authoritarian leadership. Abuse of this style is usually viewed as controlling, bossy and dictatorial. Authoritarian leadership is best applied to situations where there is little time for group decision-making or where the leader is the most knowledgeable member of the group. Participative Leadership (Democratic) – Lewin’s study found that participative (democratic) leadership is generally the most effective leadership style. Democratic leaders offer guidance to group members, but they also participate in the group and allow input from other group members. Leaders in this group are likely to be less productive than authoritarian leaders, but their contributions are likely to be of a much higher quality. Participative leaders encourage group members to participate, but retain the final say over the decision-making process. Group members feel engaged in the process and are more motivated and creative. Delegative Leadership (Laissez-Faire) – people under delegative (laissez-faire) leadership are the least productive of all three groups, make more demands on the leader, showed little cooperation and are unable to work independently. Delegative leaders offer little or no guidance to group members and leave decisionmaking up to group members. While this style can be effective in situations where group members are highly qualified in an area of expertise, it often leads to poorly defined roles and a lack of motivation.
58.2.2 Situational Leadership Model

Situational Leadership (referred to in the first section of this chapter) was described by Blanchard and Hersey in the 1960s. A situational leader is one who can adopt different leadership styles depending on the situation. Blanchard and Hersey characterised leadership style in terms of the amount of direction and support that the leader gives to his or her followers, and so created a simple grid, with directive behaviour along one axis and supporting behaviour along the other:

• Directing (S1) – high directive behaviour, low supporting behaviour
• Coaching (S2) – high directive behaviour, high supporting behaviour
• Supporting (S3) – low directive behaviour, high supporting behaviour
• Delegating (S4) – low directive behaviour, low supporting behaviour

Directing leaders define the roles and tasks of the “follower”, and supervise them closely. Decisions are made by the leader and announced, so communication is largely one-way. Coaching leaders still define roles and tasks but seek ideas and suggestions from the follower. Decisions remain the leader’s prerogative, but communication is much more two-way. Supporting leaders pass day-to-day decisions, such as task allocation and processes, to the follower. The leader facilitates and takes part in decisions, but control is with the follower. Delegating leaders are still involved in decisions and problem-solving, but control is with the follower. The follower decides when and how the leader will be involved. Effective leaders are versatile in being able to move around the grid according to the situation, so there is no one right style.

Blanchard and Hersey extended their model to include the development level of the follower. They said that the leader’s style should be driven by the competence and commitment of the follower, and came up with four levels:

• D4 (high competence, high commitment) – experienced at the job, and comfortable with their own ability to do it well; may even be more skilled than the leader
• D3 (high competence, variable commitment) – experienced and capable, but may lack the confidence to do it alone, or the motivation to do it well or quickly
• D2 (some competence, low commitment) – may have some relevant skills, but won’t be able to do the job without help; the task or the situation may be new to them
• D1 (low competence, low commitment) – generally lacking the specific skills required for the job in hand, and lacking the confidence and/or motivation to tackle it

Development levels are also situational – people may be at different development levels in different situations. Blanchard and Hersey said that the Leadership Style (S1–S4) of the leader must correspond to the Development level (D1–D4) of the follower – and it is the leader who adapts.
58.2.3 The Managerial Grid

In 1985, Blake and Mouton used a simple yet attractive model to describe managerial behaviour. Their “Managerial Grid” bore a resemblance to the Adair model of leadership, described previously, but reported just two dimensions of managerial behaviour: “Concern for People”, plotted on the vertical axis, and “Concern for Task”, plotted along the horizontal axis, both on a scale of 0–9 (Fig. 58.1).

“Impoverished” Leaders (low task, low relationship) are not able to maintain relationships, nor are they committed to the task at hand. They detach themselves from the team process and operate using a “delegate and disappear” management style.

“Authoritarian” Leaders (high task, low relationship) are task-oriented people who do not invest time or energy in their team members. Junior staff do not gain the opportunity to contribute to development, and no allowance is made for collaboration.

“Country Club” Leaders (low task, high relationship) invest heavily in relationships with colleagues, to
the detriment of the task at hand. They fail to employ their legitimate and necessary power to accomplish goals, for fear of jeopardising these relationships.

“Team Leaders” (high task, high relationship) create, maintain and lead some of the most productive teams. They are positive people who encourage goal accomplishment in the most effective way possible. They ensure that all team members remain focused on the task at hand and are motivated to achieve their potential, both as a team and as individuals.

The majority of the time, the most desirable place for a leader to sit on the two axes would be 9 on task and 9 on people – the Team Leader. It is unwise, however, to entirely dismiss the other three areas of the grid, as certain situations may call for one of the other three types to be used. For example, when leading a trauma team during a critical moment, it may not be appropriate to be overly concerned about the feelings or needs of other team members. An authoritarian approach would ensure all team members remain very closely focused on the task at hand and on the best outcomes for the patient. Likewise, relatively “impoverished” leadership may be acceptable when a goal or task is still distant and the leader is hoping to develop self-dependency and encourage others to step into leadership roles created by their holding back.
Fig. 58.1 Blake and Mouton’s Managerial Grid: “Concern for People” (vertical axis, 0–9) plotted against “Concern for Task” (horizontal axis, 0–9), giving the quadrants “Impoverished Leadership”, “Authoritarian”, “Country Club” and “Team Leader”

58.2.4 Four Framework Approach

In 1991, Bolman and Deal described the “Four Framework Approach”, which suggests that leaders display leadership styles or behaviours in four types of framework: Structural, Human Resource, Political and Symbolic [20]. More than one approach is likely to be needed in any given situation, and the efficacy of the leadership will be partly dependent upon the appropriateness of the style for that given situation. They went on to suggest that the best leaders can adopt each approach when required and are aware of the limitations of using just their own preferred approach regardless of the environment. This requires a level of self-awareness and understanding.

58.2.4.1 The Structural Framework

This framework suggests that a leader is also a social architect. Here, the focus is on structure, strategy, environment, implementation, experimentation and adaptation. The style is analytical and goal-focused. When used ineffectively or inappropriately, these leaders can appear tyrannical and obsessed with detail.

58.2.4.2 The Human Resource Framework

In this framework, the leader is a servant or catalyst, who supports and empowers those he or she leads. By being visible and accessible, and by sharing information, such leaders build a culture that supports and empowers people. The decision-making process is moved down the organisation. The disadvantage of this framework is that the leader can sometimes be regarded as weak, someone who abdicates too much responsibility.

58.2.4.3 The Political Framework

Effective leaders build coalitions and create networks with key stakeholders. They use persuasion and effective negotiation to strategically place their organisation where necessary within the marketplace. They are able to understand the distribution of power, what they want and what they can get from others. If used ineffectively, the political leader can be seen as manipulative and untrustworthy.

58.2.4.4 The Symbolic Framework

Certain situations require leaders to inspire others. They require a vision to be not only discovered but also communicated effectively to all staff. Leaders who predominantly use this framework view their organisations almost as a stage upon which they play certain roles and give impressions. They may use symbolic gestures to capture attention. When ineffective, the leader is seen as fanatical, foolish and out of touch.

The four framework approach suggests that leaders have one dominant framework in which they prefer to work. However, there are occasions when that framework may be inadequate and an effective leader must then work in a different style. When developing our own leadership potential, we should strive to be conscious of all four frameworks.

How might this model transfer to effective leadership within academia or healthcare? During periods of significant organisational change, such as closures, mergers or the implementation of new professional models of working, it is likely that both the structural and symbolic leadership frameworks are effective. A vision must be discovered and effectively communicated, but likewise, structure, strategy, environment, implementation and adaptation will be important in delivering the goal. During periods of relative stability, when strong growth is needed, the political and human resource frameworks would provide the opportunity to create new strategic networks and partners, and to empower staff to innovate. Those who wish to be effective leaders need to be aware of their preferred approach and its limitations in certain environments, and work at developing their weaker frameworks.

58.3 Leading in Teams

It is common to think of leadership as the actions of a single person, but an increasingly held view is that teams of leaders should lead complex organisations – a particularly relevant strategy for service industries such as healthcare systems, where complex decisions must be made quickly across functions. As Drath suggests, “Many people in organisations and communities are beginning to think of leadership as a distributed process shared by many ordinary people instead of the expression of a single extraordinary person” [21]. However, leadership through teams requires a level of cohesiveness within teams as well as across boundaries. Øvretveit [21] puts forward the idea of a need for an organisation to develop a “system of leadership”. This involves purposefully identifying and developing all formal and informal leader roles, including groups and teams as leaders, to be able to promote and support improvement as part of the everyday work of an organisation. This is especially relevant to academic departments of surgery, where there are many leadership roles for groups and teams across a variety of different working domains, e.g. innovation and research, teaching and training, patient safety and clinical service delivery. Because of the diversity of the working environment and goals, it is likely that differing leadership styles will be required to thrive in the differing domains.
58.4 Leadership in Academic Surgery

58.4.1 The Challenge

The role of any academic surgical unit within a hospital tends to be triple-headed: delivering high-quality, evidence-based, tertiary-level surgical care; providing sound surgical education at both undergraduate and postgraduate level; and executing first-class research (the latter two existing to a great extent to ensure quality improvement in the first). These three different areas of productivity will create differing leadership challenges for the Departmental Head, and the day of the individual who can excel in all three areas is fast disappearing. Thus, for a department head to succeed, he or she must be pro-active in recruiting surgeons at all levels who are not only good clinical leaders but who will take responsibility for and lead a certain area in which they excel, e.g. the recruitment of an educationalist to head the department’s teaching commitments or a surgeon-scientist to lead basic laboratory-based research. This allows the Department or Divisional Head to lead on overall vision and major strategy while delegating and sharing some of the leadership responsibilities in these differing domains with others, in line with the distributed leadership models described earlier. Finally, to maintain credibility as a unit, it is essential that a culture of clinical excellence through collective responsibility is created and that all surgeons within the unit feel they have an obligation to lead in this area, regardless of level or role.
58.4.2 Why Is Clinical Leadership Important?

Quality clinical leadership is at a premium. Healthcare in general, and surgical services in particular, have undergone phenomenal and unprecedented levels of change on both sides of the Atlantic in the last 10–15 years. The implementation of major healthcare reforms in order to reduce or maintain costs, while increasing quality and outcomes, has placed significant strain on surgeon-manager and surgeon-politician relationships. Despite occasional, well-publicised examples of clinical governance failures, offering a high standard of clinical care, coupled with excellent communication
and compassion, remains the aim for most surgeons. This cannot be achieved in isolation but must be in the setting of a quality service, one with a culture of perpetual improvement and a drive for excellence at its very centre, and this in turn will only occur if leadership is present. Surgeons at all levels have a key role to play in leading change; improvements in health care depend first and foremost on making a difference to the experience of patients and service users, which in turn hinges on changing the day-to-day decisions of clinical staff [22]. This premise is supported by the US Institute for Healthcare Improvement, which has demonstrated that the quality of clinical leadership directly affects the rate at which improvements in the performance of health care systems are and can be made [23]. The challenge therefore for academic surgeons is not to take the easier position and defend the status quo, but to actively lead in making changes in the methods of service delivery.
58.4.3 What Aspects of Care Should We Be Improving?

The US Institute of Medicine outlined in their report “Crossing the Quality Chasm: A New Health System for the 21st Century” [24] six areas for improving the quality of care, and for which leaders within a healthcare system will and should be held accountable:

• Safe – avoiding injuries to patients from the care that is intended to heal them
• Effective – providing services based on scientific knowledge, avoiding overuse, underuse and simply wrong services
• Patient-centered – providing care that is respectful of and responsive to individual preferences, needs and values
• Timely – reducing harmful waits and delays for patients
• Efficient – avoiding waste
• Equitable – providing care that does not vary in access or quality by patient characteristics such as gender, ethnicity, geography and socio-economic status

These six domains act as an excellent framework through which academic surgeons can aim to enhance the standard of clinical care they provide. Certain areas,
such as efficiency and equity, are not ones with which surgeons have traditionally concerned themselves; this may be because they did not feel empowered to do so. We suggest that these are the very areas that require a significant increase in clinical engagement if improvements are to be witnessed at a service level.
58.4.4 What Are the Differing Leadership Roles in Delivering These Clinical Aims?

The impact an individual clinician can have on any of these six areas will inevitably vary depending on his or her role within the organisation, his or her seniority and speciality. While certain core personal values, attitudes and behaviours should be common throughout (see below), the specific leadership role of any surgeon will vary depending upon his or her position within the employing organisation. Leaders of professional bodies, colleges, large academic units and hospital trusts are required to establish a long-term vision and strategy, create a climate of trust and develop a positive culture. They must liaise with other stakeholders on behalf of the surgeons they represent. Leaders below them within the organisation, but who are still responsible for large service or functional areas, such as medical or divisional directors, are more likely to have to drive change forward and challenge culture. Newly appointed consultant or attending academic surgeons tend to have significant responsibility for service delivery and require strong leadership qualities to underpin the application of their management skills, which are used on a daily basis when dealing with the team and liaising with other teams or organisations. Finally, trainees, who may not necessarily see themselves in leadership roles in a formal sense, can implement change and solve problems on the frontline if equipped with the right skills.
58.4.5 What Are the Common Attributes of Academic Surgical Leaders and Their Requirements?

To ensure a successful surgical service, anyone adopting a leadership role, at whatever level, needs to be supported by well-developed systems, involving good information provision, clear lines of reporting and responsibility, and an organisational culture that values such information and encourages its use as a vehicle for improvement of performance. Academic surgeons in senior leadership roles, whatever the organisation, must now be equipped with skills in service redesign and healthcare improvement that have been developed and applied in several settings. Elements of this skill set may include being able to see services from the patient’s point of view, streamlining care by eliminating unnecessary steps, and matching demand for services with capacity. Leading change to bring about substantial service improvement with and through others will require vision and initiative, but perhaps hardest of all will be the mobilising and motivating of other surgeons. Leaders have to enable people to achieve more than they did previously. Healthcare systems, such as the National Health Service, require a workforce with the leadership capacity and capability to take the service forward and deliver results. As Heifetz and Laurie commented: “Rather than providing answers, leaders have to ask tough questions. Rather than protecting people from outside threats, leaders should let the pinch of reality stimulate them to adapt. Instead of orienting people to their current roles, leaders must disorient them so that new relationships can develop. Instead of maintaining norms, leaders must challenge ‘the way we do business’ and help others distinguish immutable values from the historical practices that have become obsolete”. In surgery, and in the NHS more generally, there is often a desire to maintain the norm rather than adopt or adapt to new practices. High-quality leadership is essential to ensure that change for quality improvement continues and the status quo is challenged.

58.5 How Best to Develop Leadership Skills in Surgeons?

There is considerable debate about whether leaders are born or whether they can be made. While Kotter argues that leadership consists of a series of definable skills that can and should be taught, others have suggested that there must be a prerequisite level of innate natural leadership ability. Whatever the process through which they
arrived there, potential leaders need to be identified at an early stage and developed to lead, rather than hoping they will emerge out of the ether. How this can best be achieved has become an area of much interest across both the public and private sectors.

The model of leadership that one follows influences the approach that one takes to leadership development. If leadership is viewed as an individual skill set, i.e. the person as a leader, this naturally leads to an emphasis on developing personal skills, leadership qualities and insight. If leadership is felt to be associated with a formal position of authority, such as a surgical consultant, medical director or college president, this tends to shift the emphasis onto role refinement and development. However, this can ignore the benefits of distributed leadership, prevent recognition of the leadership that some individuals may be displaying informally by working through influence, and restrict a culture of collective leadership developing within an organisation. Finally, leadership may be viewed as a complex interaction between the leader and followers: a culture or social process concerned with motivating and influencing people and shaping change, a process that requires nurturing. If this approach is taken, leadership development will involve building relationships and commitment to the vision of the organisation, and the effective design of systems to facilitate this. Leadership development will then have to be targeted at all those involved in change, regardless of formal position within the organisation.

Prior to leadership development taking place, it is essential to identify and recruit the individuals and/or groups who will benefit the most from any scheme or programme initiated for this purpose. To do so requires an assessment of what needs to be developed in each individual and an understanding of the varying types of development available and the methods through which they can be delivered. A range of diagnostic tools is available to help individuals self-assess their leadership skills, and superiors, colleagues and subordinates can also be involved in that assessment by utilising a 360° appraisal mechanism. Conger and Benjamin describe three types of purpose in leadership development [25]:

• Individual skills development
• Socialisation of corporate values and vision
• Promotion of dialogue and implementation of a collective vision
58.5.1 What Methods Are Available to Develop Leadership?

There are many different methods that can be employed to develop leadership, and all should be considered as options by those aiming to develop leadership skills within their organisation. Hartley and Hinksman set out a range of approaches, on which the following is based [26]. They conclude that leadership development practice should be consistent with an organisation’s strategy, culture and human resources management.

Mentoring is reported to be one of the most successful leadership development methods. Mentoring has been defined as “off-line help by one person to another in making significant transitions in knowledge, work and thinking” [27] and elsewhere as “a colleague in the same or parallel organisation who is not in a line relationship with the mentee…described as being a career friend, someone who knows the ropes in an organisation and can act as sponsor and patron” [28]. Mentoring was traditionally an informal process, but concerns over its accessibility to minority groups led to research into, and piloting of, formalised programmes, which appear to have certain advantages, and disadvantages, over the more spontaneous traditional models. Mentoring is discussed at length in a separate chapter of this book.

Coaching, especially for senior leaders, has been expanding over the last few years. Coaching is aimed at performance enhancement in a specific area, i.e. it is goal-orientated. The goals are usually set with the coach, and the coach has ownership of the process. It tends to be a relatively short-term process. However, there is still insufficient research looking at what happens in the coaching process that can support leadership development, when it is successful, why it is successful in some settings, and what sort of leaders benefit most from coaching. Currently, coaches tend to be external to the organisation. However, buoyed by positive experiences, some managers are acquiring coaching skills and using them with their own teams and organisations, and some organisations have “internal coaches” to support employees’ performance.

Networking can play an important part in leadership development. Networks are sustained over a longer period than coaching, and possibly even mentoring. They provide a wide range of contacts, as well as leaders with a range of perspectives, views and information. They also allow successful leaders the opportunity to increase the
profile of their organisation and create opportunities for their staff. The creation of networks requires effort by the individual, but with support and help from the organisation for whom he or she works. Networks can be formal, for example taking part in a national group or being an active member of a society, or informal, for example getting to know, and working towards a common purpose with, people interested in and working on similar issues.

Action Learning is being increasingly used in leadership development. It is based on the notion that learning occurs through the joint problem-solving of real issues, with reflection on what happened and why. Leadership knowledge, skills and attitudes can be developed during real-life projects, and by observing and working with others. Where individuals are working on similar improvement or change projects in different organisations, a shared approach to problem-solving can be beneficial in making progress.

Three hundred and sixty degree feedback can provide individuals with the views of their line manager, peers, direct reports and others with whom they work about their leadership style and skills and the impact they have on others. A variety of different methods are available, many based online, including the NHS Leadership Qualities Framework 360° process. It is essential that the final results are fed back to the individual concerned by an accredited individual to ensure that the process is constructive. This process is increasingly used in business, but few, if any, academic surgeons currently have access to this resource.

Leadership programmes have a part to play and are especially important when the leadership team participates and uses the challenges facing their organisation as case study material to be explored within some of the sessions. However, using the plethora of courses available in the marketplace on an individual basis leads to a piecemeal approach to leadership development, which is likely to have a limited effect on leadership behaviour across the organisation and make minimal impact on large-scale service improvement.

Job Challenge or “stretch” assignments can offer important development opportunities. They can be either within one’s own organisation or another, but should require the individual to work outside their comfort zone and learn new skills, knowledge and behaviours to achieve the desired results. Clear objectives and the chance to reflect on learning, together with support from both the employing and hosting organisations, are important components of job challenge.
58.5.2 Who Should Be Responsible for Leadership Development in Academic Surgeons and How Should It Be Delivered?

It is not clear who is immediately responsible for developing leadership skills in academic surgeons. A number of professional bodies and organisations could lay claim to having a role; conversely, it would seem that no one has yet really taken on the responsibility. The colleges, professional associations and regulatory bodies are starting to recognise that there is a paucity of opportunities for surgeons to develop these skills, but progress, and the required change in culture, is slow. Many large academic institutions have their own staff development units, which take responsibility for leadership development, although understandably these are not specific to clinicians, never mind surgeons, and thus tend to offer more generic solutions. All NHS organisations should have leadership strategies addressing the needs of leaders at all levels, including surgeons, and there are some excellent examples of a real focus on developing people from clinical backgrounds to take on senior leadership roles. Strategic health authorities have a key role in ensuring that systems and processes are in place to identify and develop people with the potential to take on the most senior leadership roles at local and national levels for the NHS. Finally, individuals can do a great deal to help their own development as leaders. Leadership is somewhat about “doing it”: the cycle of trial and error, and learning from both success and failure.

Regardless of the methods used, or ultimately who takes the responsibility, the role of surgeons as potential leaders of services and leaders of change needs to be highlighted at a national level, and leadership and service improvement skills need to be included in the curriculum at both undergraduate and postgraduate level, as an essential element of training. At postgraduate level, support for developing leaders should include team development, relevant skills training, change management, service improvement and leadership development. In addition, senior surgeons should be encouraged to act as mentors for more junior staff and should expose them to leading change within services, departments and divisions. For all surgeons in, or aspiring to, a leadership role, particular attention should be paid to putting in place and enabling the completion of personal
development plans covering leadership development. These plans should relate to the achievement of both personal and organisational objectives. A range of development opportunities should be offered, ranging from basic leadership and service improvement skills through to team leadership and participation in programmes of change within and across organisations. The creation of collaborative programmes, where surgeons with leadership potential are brought together with other clinical leaders and senior-level managers to share experience in bringing about service improvement and to aid in each other’s development, should be considered. Some of this work is already underway. In the United Kingdom, a joint project between the NHS Institute for Innovation and Improvement and the Academy of Medical Royal Colleges entitled “Enhancing Engagement in Medical Leadership” is designed to encourage doctors to become more actively involved in the planning, delivery and transformation of services and to help the NHS create a culture where doctors are engaged and lead.

Summary of suggestions for leadership development of academic surgeons:

• Leadership development programmes need to start from where people are, not where others expect them to go
• Programmes that work with individuals and teams in the context of their organisations are likely to be more effective than individuals taking part in separate programmes
• The involvement of senior surgeons and representative bodies is crucial if leadership development is to be given a high enough profile and the profession is to be engaged
• Standard programmes are less effective than those tailored for the context
• Leadership development needs to be both work and programme based, and take account of organisational and professional culture
• Successful development events are likely to be those where the emphasis is on the individual, their reactions and their impact, rather than on imparting knowledge
• Surgeons and managers must work together in change teams, rather than managers imposing change from above
• Focus development on implementation and action rather than formal competence building
References

1. Kotter JP (1996) Leading change. Harvard Business School Press, Boston
2. Gardner JW (1990) On leadership. Free Press, New York
3. Sadler P (1997) Leadership. Kogan Page, London
4. Fiedler FE (1967) A theory of leadership effectiveness. McGraw-Hill, New York
5. Saal FE, Knight PA (1998) Industrial/organizational psychology: science and practice. Brooks/Cole, Pacific Grove, CA
6. Hersey P, Blanchard K (1969) Management of organizational behavior. Prentice Hall, Englewood Cliffs, NJ
7. Adair JE (1983) Effective leadership. Gower, Aldershot
8. Alimo-Metcalfe B (1998) Effective leadership. Local Government Management Board, London
9. Bass BM (1985) Leadership and performance beyond expectations. Free Press, New York
10. Kotter J (1990) A force for change: how leadership differs from management. Free Press, New York
11. Goleman D (1995) Emotional intelligence. Bantam Books, New York
12. Greenleaf RK (1998) The power of servant-leadership. Berrett-Koehler, San Francisco
13. Kouzes JM, Posner BZ (1987) The leadership challenge: how to get extraordinary things done in organizations. Jossey-Bass, San Francisco
14. Salovey P, Mayer JD (1990) Emotional intelligence. Imagination Cogn Pers 9:185–211
15. Collins J (2001) Good to great. Harper Collins, New York
16. Heifetz R (1994) Leadership without easy answers. Belknap Press of Harvard University Press, Cambridge, MA
17. Goodwin N (2005) Leadership in health care: a European perspective. Routledge, London
18. Swayne LE, Duncan WJ, Ginter PM (2002) Strategic management of health care organizations. Blackwell Business, Oxford
19. Lewin K, Lippitt R, White R (1939) Patterns of aggressive behavior in experimentally created social climates. J Soc Psychol 10:271–299
20. Bolman LG, Deal TE (1991) Leadership and management effectiveness: a multi-frame, multi-sector analysis. Hum Resour Manage 30:509–534
21. Drath WH (1998) Approaching the future of leadership development. In: McCauley CD, Moxley RS, Van Velsor E (eds) Handbook of leadership development. Jossey-Bass, San Francisco
22. Ham C (2003) Improving the performance of health services: the role of clinical leadership. Lancet 361:1978–1980
23. Berwick DM (2003) Improvement, trust, and the healthcare workforce. Qual Saf Health Care 12:448–452
24. Institute of Medicine (2001) Crossing the quality chasm: a new health system for the 21st century. National Academy Press, Washington, DC
25. Conger JA, Benjamin B (1999) Building leaders: how successful companies develop the next generation. Jossey-Bass, San Francisco
26. Hartley J, Hinksman B (2003) Leadership development: a systematic review of the literature. NHS Leadership Centre, London
27. Clutterbuck D (1992) Everyone needs a mentor. IPM, London
28. Rogers J (2004) Coaching skills: a handbook. Open University Press, Maidenhead
Using Skills from Art in Surgical Practice and Research – Surgery and Art
59
Donna Winderbank-Scott
Contents

59.1 Introduction ................................................. 741
59.2 Historical Perspectives ................................ 742
59.2.1 Anatomical Art .......................................... 742
59.2.2 Leonardo Da Vinci .................................... 742
59.2.3 Recording Surgical History ....................... 743
59.2.4 Recording Surgical Technique ................... 743
59.2.5 Methods of Illustration .............................. 744
59.3 Similarities Between Art and Surgery ......... 744
59.3.1 Fine Motor Control .................................... 744
59.3.2 Spatial Awareness ...................................... 744
59.3.3 Form and Shape ......................................... 745
59.3.4 Observational Skills ................................... 745
59.4 Interpreting Visual Information ................... 745
59.4.1 Depth Perception ....................................... 746
59.4.2 Interpretation of Laparoscopic Images ....... 746
59.5 Practical Uses of Art in Surgery .................. 746
59.5.1 Memory and Revision ................................ 746
59.5.2 Art in Clinical Practice .............................. 746
59.5.3 Art and Teaching ....................................... 746
59.5.4 Digital Images and Manipulation ............... 747
59.5.5 Art and Aesthetic Surgery ......................... 748
59.6 Art Therapy .................................................. 749
59.7 Art in Hospitals ............................................ 749
59.8 Sources of Art Information and Useful Resources ... 749
59.9 Inter-Relationships Between Professional Artists and Surgeons ... 750
59.9.1 Illustration and Research ........................... 750
59.9.2 Surgical Simulation ................................... 750
59.9.3 Innovation ................................................. 750
59.10 Advantages of Artistic Training for Surgeons ... 750
59.11 Summary .................................................... 750
References ............................................................ 751
Abstract Many aspects of surgery and the arts are inextricably linked, and the use of artistic techniques or ideas to improve surgical practice is present throughout the history of medicine. In all forms of surgery, throughout the ages, artists have been used to illustrate new ideas and new techniques and to document surgical practice. Some of the most successful art originated from surgeons who were also artists, perhaps due to their innate understanding of what they were portraying. Studying art and developing creative skills can enhance modern surgical practice in many different ways.
59.1 Introduction
D. Winderbank-Scott The Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Many aspects of surgery and the arts are inextricably linked, and the use of artistic techniques or ideas to improve surgical practice is present throughout the history of medicine. In all forms of surgery, throughout the ages, artists have been used to illustrate new ideas and new techniques and to document surgical practice. Some of the most successful art originated from surgeons who were also artists, perhaps due to their innate understanding of what they were portraying. Studying art and developing creative skills can enhance modern surgical practice in many different ways. This chapter aims to highlight the uses of art in surgical practice and surgical research.
Fig. 59.3 Oil on Canvas Board, D. W. S 2005 [14]
Fig. 59.1 Oil on Canvas Board, D. W. S 2005 [14]
59.2 Historical Perspectives

Drawings and paintings are of utmost importance in understanding the history of surgery. Some of the earliest examples of surgical practice are recorded in ancient manuscripts, and cave-paintings have been found to depict procedures such as trepanning. The evolution of anatomical knowledge can be illustrated from successive anatomical texts, and an understanding of historical practices can be derived from paintings of surgeons at work (Figs. 59.1, 59.3).
59.2.1 Anatomical Art

The study of anatomy provides one of the most common uses of art in the history of medicine. The first widely published modern anatomical text, “De Humani Corporis Fabrica” [1], was illustrated and written by Vesalius (1514–1564), a Flemish physician and anatomist. He prepared the text in Italy with the help of local students, and initially published it in 1543, with an update in 1555 [2]. Many anatomists subsequently produced their own illustrations in various styles and with varying degrees of accuracy and imagination. Once the general public became interested in the new portrayal of the human form, anatomical illustrations became more tailored to public display. Images of isolated anatomical specimens were discarded in favour of detailed corpses in various poses. Indeed, some of these illustrations became so creative that John Bell (1763–1820) condemned “fanciful” artists in favour of “accuracy of representation”. John Bell was a successful surgeon in Scotland in the late eighteenth century and produced many surgical and anatomical illustrations, which are notable for including all the details present, including instruments and even reflections from a window! Anatomical illustration then returned to a more scientific perspective until a recent revival in public interest following the exhibition of work by the anatomist Gunther von Hagens. By posing “plastinated” anatomical specimens, the style popular with the Victorians is recreated in three dimensions for display to a new and modern audience.

59.2.2 Leonardo Da Vinci

In the fifteenth century, Renaissance artists started to study anatomy in order to portray the human body as realistically as possible (Fig. 59.6). Leonardo Da Vinci commenced his study of anatomy in 1507 following his apprenticeship to Andrea del Verrocchio. His dissections culminated in a group of anatomical drawings,
Fig. 59.6 Vitruvian Man by Leonardo Da Vinci
produced sometime after 1510 in collaboration with a doctor, Marcantonio Della Torre [3]. These images were intended for publication, but they were not widely distributed until after Da Vinci’s death, and unfortunately did not make much of an impact on anatomical knowledge at that time. The images themselves are notable for their accuracy and mostly remain relevant today. For example, notes containing images of musculature are surrounded by machinery, linking form to function. In 2005, his studies of heart valves, and notes relating the anatomy to liquid flow and vortices, inspired Mr Francis Wells (Papworth Hospital, Cambridge) to modify conventional mitral valve repair techniques [4]. Leonardo Da Vinci is also credited with some of the earliest designs for automated or robotic machines, hence the naming of the “Da Vinci Surgical Robotic System” in his honour (Fig. 59.2).
Fig. 59.2 Paintings of the Da Vinci Surgical Robot in use at St. Mary’s Hospital
59.2.3 Recording Surgical History

Important events in the history of medicine were recorded for posterity by artists working in a variety of media. These paintings give us an understanding of how surgeons operated and how surgical techniques were taught. The majority of classical surgical paintings depict a notable surgeon and an anonymous patient at the centre of a full lecture theatre. For example, Adalbert Franz Seligmann (1862–1945) painted Theodor Billroth (who gave his name to the Billroth I partial gastrectomy) in 1890, demonstrating abdominal surgery to an audience. Another distinguished painter, Thomas Eakins (1844–1916), produced many seminal paintings of surgeons as teachers and educators. Other paintings depict important historical events, such as the first public demonstration of anaesthesia using ether. This took place in Massachusetts General Hospital on October 16th 1846, and it was arranged that the daguerreotypist Josiah Hawes would photograph the event. Notes made at the time record that Mr. Hawes was so disturbed by the proceedings that he was unable to operate his camera! The event was subsequently painted by Robert C. Hinckley (1853–1941), but the painting was not completed until 1892.
59.2.4 Recording Surgical Technique

Many surgeons who had artistic abilities illustrated their own techniques and refinements to procedures. The American neurosurgeon Harvey Cushing (1869–1939) was also an accomplished artist. His medical school and clinical notes are annotated with drawings of his patients, and he drew his surgical findings before even
removing his gloves. A friendship and collaboration with the medical illustrator Max Brodel (1870–1941) resulted in the creation of beautiful graphite illustrations to depict new approaches to the pituitary gland, and under Brodel’s tuition, Cushing’s artistic skills were refined further. Brodel was the first director of the first Medical Illustration Department, created in 1911 at the Johns Hopkins School of Medicine. Students were taught both artistic and illustrative techniques along with anatomy and medicine, enabling closer collaboration between clinicians and artists in order to improve the quality of medical illustration.
59.2.5 Methods of Illustration

A variety of artistic techniques have been used to depict anatomical and surgical detail. The development of the printing press in the fifteenth century enabled mass production of surgical and anatomical texts, which revolutionised medical education and enabled the dissemination of new advances in surgery across the world. Woodcutting is an early technique dating from 1300, which enabled multiple prints to be made, and is the technique used by Vesalius [1] in “De Humani Corporis Fabrica”. A drawing is traced onto a wooden block and areas are cut away with a knife. The wooden block can be inset with typeface blocks, and a mechanical press used to print multiple pages. This method was subsequently refined in the 1700s, when the wooden block was engraved instead of cut, allowing more detail. Copperplate engraving is a similar technique using copper sheets, but required a more complicated printing process. Alternatively, coated metal plates were etched to remove the coating and then dipped into acid to dissolve the metal in the exposed areas. Coloured illustrations were more difficult to produce, and originally prints would be individually hand-coloured until the use of multiple colour plates was developed. Lithography was invented in 1798 and uses the repellence between oil-based ink on the plate and water-based ink in the printing press to produce a coloured image. Photography was invented in the 1800s, using a variety of chemically sensitised surfaces which react to light. This enabled people with no artistic talent to create images and records, which could be subsequently printed and disseminated. Photography also allowed accurate records of medical conditions to be created more easily and in less time than required for an artist to paint [1]. Currently, medical and anatomical imagery is widely available both in printed form and on the Internet. For example, Frank H. Netter (1906–1991) is one of the most famous modern physician-artists, who produced libraries of anatomical and disease illustrations throughout his lifetime. These are now available in image banks online.

59.3 Similarities Between Art and Surgery

59.3.1 Fine Motor Control

Artists and surgeons share several important skills. First, co-ordination and fine motor control are required in order to achieve the desired results, whether wielding a scalpel or a paintbrush. Surgeons refine these skills with practice and repetition, e.g. the motions required in knot tying or suturing can be practised until they become automatic. This may be one reason why surgical skill can be taught to some degree to candidates with little or no natural aptitude. The development of fine motor control in art is a more generalised process, possibly because an artist rarely produces the same image twice. Although the scientific theories of perspective, proportion and colour can be taught, the skill to accurately produce the desired brush stroke seems to require some underlying aptitude. Developing co-ordination and control through artistic endeavours may have a beneficial effect on subsequent surgical ability.
59.3.2 Spatial Awareness Spatial awareness is also well developed in both surgeons and artists. The ability to work with materials in three dimensions may be more important in sculpture than with two dimensional drawing, but the understanding of space and interactions between objects in that space is fundamental to all art forms. In surgery, an understanding of tissues and anatomical structures; how they lie and interact; and the mechanics of tissue displacement are of paramount importance. This knowledge originates in the anatomy of
tissues in a surgical field, but when a disease process has distorted the norm, a more holistic understanding of space and interactions may be required.
59.3.3 Form and Shape

An understanding of form and function is needed in many types of surgery (Fig. 59.4). Shape can be considered in scientific terms as well as aesthetically. For example, in vascular surgery, the creation of an arterial anastomosis will depend heavily on the arrangement of vessels and the constraints exerted by surrounding tissues. A successful anastomosis will depend on its shape, the degree of stenosis, the angle of flow and many other factors. An appreciation of liquid physics and how flow is affected by form is important, as is the ability to recreate that ideal at the time of surgery. Translating ideas into shapes and structures is another element that surgery has in common with both painting and sculpture.
Fig. 59.4 Monocular cues
59.3.4 Observational Skills

In order to paint, draw or sculpt, the artist must first see the object or scene in great detail. This requires development of observational and perceptive skills. In particular, an artist must resist the temptation to draw what they “think they see”. Detection of abnormalities in a patient or deviations from the norm requires close observation. In surgery, identification of important structures such as nerves or vessels relies on observation of fine detail, especially when the surgical field has been distorted by disease processes. Such observational skills are usually honed through patient contact, experience and practice, but could similarly be complemented by artistic training.
59.4 Interpreting Visual Information

Laparoscopic and video-assisted surgery involves another set of skills, which can also be developed through art. The translation of three-dimensional spaces into two
dimensions on paper or canvas is a fundamental skill, which is innate to some artists and studied scientifically by others, e.g. by learning perspective theory. The interpretation of two-dimensional images in laparoscopic surgery is a similar process in reverse, and involves the same elements of visual perception.
59.4.1 Depth Perception

The interpretation of depth and understanding of three-dimensional space normally involves the use of binocular cues, i.e. those requiring the use of both eyes. “Stereopsis” is the comparison of differences in the images reaching the brain from each eye to calculate distance. Physiological cues from eye movement (the eyes converge and rotate to focus on a nearby object) and the degree of lens accommodation are also used by the brain for depth perception. When looking at a two-dimensional image, either on screen or canvas, binocular cues are absent and we rely instead on monocular cues. These are not innate but are learnt and developed through experience. Monocular cues include:

(a) Relative size – objects further away appear smaller
(b) Interposition – overlapping objects
(c) Linear perspective – converging lines and vanishing points
(d) Light and shadow – illustrate both the form of an object and its relative position
(e) Texture – the size of textural elements reduces and the texture becomes less detailed with distance
(f) Clarity/focus – distant objects appear more blurred
(g) Elevation – higher objects appear further away
(h) Motion parallax – when moving, further objects move more slowly than closer objects
(i) Colour – distant objects are usually lighter and bluer
59.4.2 Interpretation of Laparoscopic Images

Interpretation of laparoscopic images relies heavily on relative size and interposition. Shadow information is less reliable because the light source is on the camera. As the camera moves, shadows are reduced from the central
field, and unseen objects can influence shadow detail. Textural information is minimal, as the majority of tissues are smooth; elevation and perspective cues are absent. Focusing and clarity depend on the camera’s focal length and depth of field, but can be useful. For example, if different objects are in focus at the same time, they must be at the same relative distance from the camera.
59.5 Practical Uses of Art in Surgery

The development of artistic skills can be very useful to the practising surgeon in many different aspects of their work.
59.5.1 Memory and Revision

The ability to reproduce accurate diagrams or drawings is very useful when revising. The act of copying a diagram requires a deeper understanding of the lines and form portrayed than can be achieved by simple observation. The act of reproducing the diagram can also aid recall and memorisation. Diagrams can then be included in revision notes and adapted, colour-coded or annotated as required.
59.5.2 Art in Clinical Practice

The skill of being able to draw a recognisable diagram or picture is invaluable in consultation with patients. Although dedicated surgical and anatomical diagrams exist, they are rarely available when needed and can be too complex to be easily used. If a surgeon can spontaneously produce a diagram when required, it can be tailored to the individual patient’s circumstances and to their level of understanding. This diagram can be subsequently annotated if the patient asks for more detail, and can be left with the patient afterwards (Fig. 59.5).
59.5.3 Art and Teaching

Similarly, being able to accurately draw a concept or illustrate a technique can be useful in situations where
Fig. 59.5 Diagram for a patient information leaflet to show Jump Grafting in Aortic Aneurysm repair. Pen and ink + digital manipulation D. W. S 2007 [14]
a surgeon is educating colleagues or medical students. In ad hoc or opportunistic teaching environments, pre-prepared materials are unlikely to be available, so artistic skills can be useful in order to create realistic and representational images when necessary.
59.5.4 Digital Images and Manipulation

When preparing for a teaching session or for formal presentations, the diagrams available may not be suitable or may not illustrate the point clearly. A frequent
problem encountered when creating presentations is that a diagram needs to be reduced in size and the annotations become unreadable. Alternatively, a low-resolution diagram needs to be enlarged, which makes the image and text blurred. In these circumstances, being able to produce new diagrams or adapt existing digital images is a very useful skill. Adapting images does not require artistic or technical ability and should be possible in most circumstances where an existing diagram is available but not suitable. There are various methods available. Traditionally, tracing paper and pencil would be used to copy the desired parts of the image, and then
adaptations or annotations can be made. Carbon-copy paper can also be used directly underneath the image. Another technique is using a light-box (or holding the paper up to a window), which enables images to be traced without the need for transfer or tracing paper, but this will not work with thick paper or card. Images can be enlarged or reduced in size using photocopiers or by scanning and re-printing. Scanned images can also be resized in digital art programs. When the text on a diagram becomes unreadable (either due to the resolution or the size), or the text is in the wrong place, it can be removed by using the program’s eraser tool or by using a paint tool with the paint set to the same colour as the background. The text can then be re-typed, or the image saved without the text and the annotations added back in PowerPoint. This enables greater control over the text size and positioning, and ensures it is readable on the final slide. More complex image editing is possible with practice; for example, re-drawing lines behind where text has been erased, or changing the colour scheme of a diagram to fit in with a background. Programs are available online that can be used to manipulate diagrams in this way. For example, “The GIMP” is an open-source alternative to more complex digital art programs such as Adobe® Photoshop® and can be downloaded free of charge.
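For those comfortable with a little scripting, the same erase-and-retype workflow can be automated when a whole set of diagrams needs identical treatment. The following is a minimal sketch only, using the freely available Python imaging library Pillow; the file names, coordinates and label text are illustrative assumptions, and in practice the bounding box of the unreadable text would be chosen by eye in an image viewer.

from PIL import Image, ImageDraw, ImageFont

# Load a scanned diagram (the file name is illustrative only)
img = Image.open("diagram.png").convert("RGB")

# Sample the background colour near a corner, so the erased patch
# blends in with the rest of the diagram
background = img.getpixel((2, 2))

draw = ImageDraw.Draw(img)

# "Erase" an unreadable label by painting over its bounding box
# with the background colour (coordinates are hypothetical)
draw.rectangle([(120, 80), (260, 110)], fill=background)

# Re-type the annotation; the default font keeps the sketch portable
font = ImageFont.load_default()
draw.text((124, 84), "jump graft", fill="black", font=font)

# Enlarge the diagram for a slide; LANCZOS resampling keeps line
# work reasonably sharp when resizing
width, height = img.size
img = img.resize((width * 2, height * 2), Image.LANCZOS)
img.save("diagram_slide.png")

The same steps can, of course, be performed interactively in GIMP or Photoshop®; scripting is simply worth considering when many diagrams require the same correction.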
59.5.5 Art and Aesthetic Surgery

Formal art appreciation and art classes are now an accepted part of Cosmetic and Plastic surgical training [5]. For an Aesthetic Surgeon, appreciation of the aesthetic ideal, and the skill to recreate this form as closely as possible despite anatomical and tissue constraints, is of utmost importance in achieving good outcomes. This appreciation can be enhanced by the study of conventional beauty in art, photography and sculpture. There are also various approaches to assessing form and structure, which can be taught to plastic surgical trainees. For example, methods have been developed to assess the 3D structure of the human face, identify areas for augmentation, and appreciate what is
possible in terms of surgical alteration [6]. Artists have always studied the human body in order to recreate life-like and beautiful images. Michelangelo’s statue of David is notable for its perfect body proportions, and Leonardo Da Vinci’s Vitruvian Man shows that the human body fits into both circular and square geometries. While ideal proportions remain relatively fixed, the accepted convention of beauty changes over time and between cultures. This is reflected in the art of each period, for example contrasting the voluptuous female figures portrayed by Botticelli and Raphael with the wasp-waisted women of Victorian portraiture. This understanding of conventional beauty and proportion is understandably important to an Aesthetic Surgeon [7].

An essential difference between recreating the human body on canvas or through sculpture and reshaping it through surgery is the artist’s ability to go back and correct their work. Such adjustments can be very limited in surgical terms, and therefore pre-operative planning becomes more of a priority. A recent development in this area is specific software to assist surgical planning. For example, such programs can be used to compare patient dimensions to the aesthetic ideal, calculate angles, create templates and combine information from X-rays, CT scans and other imaging modalities where necessary.

Morphing programs are also gaining in popularity among Cosmetic Surgeons. These can be used to modify photographs of a patient to predict or show potential outcomes. This can be useful in consultation, both to help the patient communicate their wishes to the surgeon and to offer the patient different choices of procedure or degree of augmentation. However, there can be problems if the predicted and surgical results differ. Digital manipulation does not take into account healing and tissue characteristics or surgical and anatomical constraints, and the manipulation of an image is very operator-dependent. Websites are now offering digital manipulation services directly to the consumer. Beautysurge.com allows potential patients to e-mail a photograph, which is then altered to show the result of their desired surgery, allowing them to choose whether or not to go ahead for consultation. This sort of service is usually independent from Aesthetic Surgical Practices, and their images may
not correlate with what is surgically possible, or with an individual surgeon’s skill.
59.6 Art Therapy Art therapy is a form of psychotherapy using the creation of art as a means of therapeutic self-expression to "effect change and growth on a personal level through the use of art materials in a safe and facilitating environment" [8]. One of the main benefits of such therapy is to help people who have difficulty communicating to express themselves, e.g. stroke patients, those with learning difficulties and patients with mental health issues. Art therapy has also been used extensively in oncology [9] and palliative care settings. In paediatrics, it can be used both generally and in allaying fear prior to surgery [10]. Art therapists advocate benefits beyond patient care too. Group and individual therapy sessions are now used privately and in business to reduce stress, encourage self-exploration and creative potential, increase insight and improve coping mechanisms. Awareness of art therapy is important in the medical profession so that patients can be referred appropriately, but there is also potential for art therapy techniques to be used by surgeons and healthcare professionals for their own benefit.
59.7 Art in Hospitals Most hospitals have some form of art on display at all times. The depth of involvement or commitment to art does vary between newer hospitals built with environmental aesthetics and art in mind (e.g. Chelsea and Westminster Hospital, London) and older hospitals designed for function alone, but in which dedicated volunteers display and rotate artwork throughout departments. It is difficult to scientifically assess the impact of enriching the hospital environment in this manner, and much debate has been held on the subject. Logically, patients would seem to benefit from aesthetically pleasing surroundings but the degree of influence this has on recovery or healing is hard to
define. Hospitals serve diverse populations whose appreciation of various forms of art is equally varied. Therefore, art and exhibitions within a hospital setting may have to be carefully selected. However, patients and staff as a whole seem to appreciate efforts to enhance the hospital environment, even if scientific evidence for this is lacking [11–13].
59.8 Sources of Art Information and Useful Resources
1. Anatomical resources
http://fineart.sk: anatomy books for the artist by Andrew Loomis, available for free download
www.portrait-artist.org: step-by-step guides for drawing figures and faces, with information on shading techniques and materials to use
2. Conventional art resources
www.artgraphica.net: on-line lessons in fine art in a variety of media
www.how-to-draw-and-paint.com: back-to-basics instruction on building images from simple shapes and on how to use oil, acrylic, watercolour and other media; includes video-based lessons
3. Digital manipulation
www.computerarts.co.uk/tutorials/2d_and_photoshop/anatomy_illustrated: a tutorial on producing anatomical illustrations digitally. The site also contains many resources on learning Adobe® Photoshop®, other computer art packages and 3D modelling tutorials
www.cbtcafe.com/photoshop/ or www.tutorialized.com: basic Photoshop® tutorials covering how to manipulate and resize images and how to use brushes and special effects. These sites also contain tutorials for PowerPoint® and Web design
4. GIMP – GNU Image Manipulation Program
A free-to-download, open-source graphics program with many features for image manipulation, photographic retouching and creating new graphics. It can be downloaded from www.gimp.org and tutorials are available at www.gimptutorials.com
59.9 Inter-Relationships Between Professional Artists and Surgeons 59.9.1 Illustration and Research Surgeons may have to work with professional artists in a variety of ways. In scientific research, medical illustrators are often asked to create diagrams for publications or presentations to emphasise key points, or to illustrate particular techniques. In order to create an effective image, the artist needs to understand what is to be illustrated, so the ability of the surgeon to explain this to a non-medically trained artist is important. For example, if illustrating a new surgical technique, the surgeon may be able to identify areas of greater importance (e.g. avoiding the coronary sinus during valve implantation) to be emphasised in the diagram. Important elements can then be depicted more accurately or in more detail than the rest of the process.
59.9.2 Surgical Simulation The development of surgical simulators and virtual-reality surgical training also requires a close working relationship between surgeons and digital artists in order to create accurate representational surgical models and environments. This mainly takes place in dedicated research centres, but surgical simulations are increasingly being used in the training and assessment of surgical skill. An understanding of visual perception theory can be invaluable when using, and being assessed on, such simulators, in order to improve performance.
59.9.3 Innovation Innovation requires close collaboration with designers and engineers. Often the surgeon lacks the design knowledge needed to turn a need into a workable design; conversely, the designer lacks the surgical experience to see where a need arises and which elements of the design will matter to the surgeon. In partnership, both the need and the solutions can be identified. An example on a commercial scale is surgical robotics. Development of a surgical robot requires extensive design and technological knowledge, an understanding of the operating theatre environment (e.g. to dictate the size of the robot), the needs of the surgeon (in terms of ergonomics and the accuracy of movement required) and the needs of the surgery (e.g. in the design of the instruments). Collaboration on a smaller scale can be equally effective. For example, the Academic Department of Surgery at St. Mary's Hospital, Paddington [14], London, and the Royal College of Art have developed close links in order to encourage innovation in healthcare. Ongoing joint research projects involve both designers and clinicians to improve the design of existing equipment and to develop new solutions to clinical problems that improve patient safety. Students from the Royal College of Art have taken part in projects supervised by surgeons, for example in the re-design of suturing packs for solo use and in improving the safety of sharps disposal equipment.
59.10 Advantages of Artistic Training for Surgeons
Art can provide a distraction, an escape and a form of relaxation, and for a surgeon it can be used to develop skills which are also applicable to surgical practice. Drawing, painting and sculpting can all be used to improve observational skills, fine motor control, spatial perception and co-ordination, all of which are essential in surgery. Producing finished artwork is an additional benefit, whether the art is a surgical illustration for research, teaching or presentation, or a painting for display or sale. The therapeutic nature of creative pursuits with regard to stress relief and coping mechanisms should also be emphasised. For Aesthetic and Cosmetic Surgeons, studying art and beauty is more readily accepted as beneficial to clinical practice. However, surgeons of all disciplines can appreciate the desirability of a good cosmetic result, the importance of incision placement and the accuracy of skin closure.
59.11 Summary There are many areas wherein surgery and art can be complementary. Interactions between surgeons, artists and designers can be mutually beneficial, in terms of both
innovation and self-development. Understanding art and visual perception can aid in laparoscopic image interpretation and in the use of digital surgical simulators. Developing artistic skills not only hones a surgeon's technical skill but can also improve teaching abilities, presentation skills and interaction with patients. Surgeons who are also artists have a unique role to play in bringing both disciplines together and in educating others about the benefits of art knowledge, skills and appreciation to both clinical practice and life in general.
References
1. Vesalius A (1543, 1555) De humani corporis fabrica ["On the fabric of the human body", translation by Northwestern University]. Available at: http://vesalius.northwestern.edu/
2. National Library of Medicine Exhibit (2006) Dream anatomy: a National Library of Medicine exhibit. Available at: http://www.nlm.nih.gov/dreamanatomy/
3. University of the Arts London (2007) Leonardo Da Vinci – themes and trails. Available at: http://www.universalleonardo.org/
4. BBC News (2005) Da Vinci clue for heart surgeon. Available at: http://news.bbc.co.uk/2/hi/health/4289204.stm
5. Guneron E, Kivrak N, Koyuncu S et al (2005) Aesthetic surgery training: the role of art education. Aesthet Surg J 25:84–86
6. Shepherd L (2005) Practical sculptural training for the plastic surgeon. Int J Surg 3:93–97
7. Morani AD (1992) Art in medical education: especially plastic surgery. Aesthetic Plast Surg 16:213–218
8. British Association of Art Therapists (BAAT) (2008) What is art therapy? Available at: http://www.baat.org/art_therapy.html
9. Oster I, Svensk AC, Magnusson E et al (2006) Art therapy improves coping resources: a randomized, controlled study among women with breast cancer. Palliat Support Care 4:57–64
10. Crowl M (1980) The basic process of art therapy as demonstrated by efforts to allay a child's fear of surgery. Am J Art Ther 19:49–51
11. Baum M (2001) Evidence-based art? J R Soc Med 94:306–307
12. Baum M (2004) The healing environment: without and within (book of the month: book review). J R Soc Med 97:145–146
13. Scher P, Senior P (2000) Research and evaluation of the Exeter Health Care Arts Project. Med Humanit 26:71–78
14. Winderbank-Scott D (2008) Paintings and images. Available at: http://www.gambyte.co.uk/
60 Administration of the Academic Department of Surgery
Carlos A. Pellegrini, Avalon R. Lance, and Haile T. Debas
Contents
60.1 Introduction ... 753
60.2 Challenges Faced by Modern Academic Departments of Surgery ... 754
60.2.1 Rapid Pace of Scientific and Technological Change ... 754
60.2.2 Increasing Tensions in the Allocation of Time ... 754
60.2.3 Financial Pressures ... 755
60.2.4 Workforce Challenges ... 755
60.3 Addressing the Challenge: Strategic Themes for Seizing Opportunities in the Twenty-First Century ... 757
60.3.1 Organizing the Department for the Twenty-First Century ... 757
60.3.2 Leadership Excellence: Qualities and Skills ... 758
60.3.3 Surgical Divisions ... 758
60.3.4 The Interdisciplinary Model ... 759
60.4 Surgical Innovation and Partnership with Industry ... 763
60.5 Philanthropy and Fundraising, an Essential Function of a Modern Department ... 763
60.6 Ethics, Professionalism, Quality ... 763
60.7 Focus on the Future Generations: Training, Development, Evaluation ... 764
60.7.1 Medical Students: Attracting the Best and the Brightest ... 764
60.7.2 Residency Training ... 764
60.7.3 Fellowship Training ... 765
60.8 Faculty ... 765
60.8.1 Recruitment and Retention ... 765
60.8.2 Development of Faculty ... 765
60.8.3 Balancing Faculty Efforts ... 766
60.8.4 Faculty Compensation ... 766
60.8.5 Role of Promotion in Incentive System ... 768
60.8.6 Metrics of Academic Productivity ... 768
60.9 Conclusions ... 768
References ... 769
C. A. Pellegrini (✉) Department of Surgery, University of Washington, Box 356410, Seattle, WA 98195-6410, USA e-mail: [email protected]
Abstract This chapter discusses some of the challenges faced by modern academic departments and some of the ways to overcome those challenges, as well as the opportunities provided by modern technology, and ways to maximize their potential. Emphasis is placed on optimizing administration and resources to create a dynamic environment that fosters inquiry and original research and to produce academic surgical leaders in a constantly changing world.
60.1 Introduction The administration of an academic department of surgery is responsible for the day-to-day management of the department: the recruitment and retention of personnel, the provision of support services for the faculty and senior staff, the management of the finances, and the administration of the educational, research and clinical enterprise. In addition, the department administration is responsible for both short-term and long-term strategic planning: setting the vision and the goals to be achieved by the department. Thus, the individuals recruited to senior administrative posts of a modern department of surgery
must not only be capable of providing academic leadership and management but also of crafting the vision that provides direction for the future. Once the direction is set, a reiterative process of review is required to ensure that progress is made toward the strategic goals. This chapter discusses some of the challenges faced by modern academic departments, some of the ways to overcome those challenges, the opportunities provided by modern technology, and how to maximize their potential. Emphasis is placed on optimizing administration and resources to create a dynamic environment that fosters inquiry and original research and produces academic surgical leaders in a constantly changing world. Although the focus of this book is on research, from the perspective of the administration of the department it is impossible to separate research from the education and clinical missions. The three legs of the academic department are so intertwined that the only way to achieve success in the research arm is to have strong and well-organized educational and clinical arms. Academic departments of surgery will always try to recruit those unusual individuals whose capability contributes to all three facets of the academic mission, the "triple threats." It is paramount, however, to realize that the typical academic department of surgery is composed predominantly of faculty who practice clinical medicine and carry heavy clinical, resident and student teaching responsibilities. Their time commitment to research is, of necessity, limited. The department as a whole needs to be a "triple threat" to be academically productive. To accomplish this, the department needs a complement of clinical faculty members with protected time for research, as well as PhD scientists.
60.2 Challenges Faced by Modern Academic Departments of Surgery Modern academic departments of surgery face a multitude of challenges. For the purposes of discussing these challenges in the context of departmental administration, we have divided them into four distinct areas.
60.2.1 Rapid Pace of Scientific and Technological Change At a time of rapid scientific and technological advances, the department needs a structure that fosters entrepreneurship, risk-taking and forward movement, relying on incentives. Challenging traditional thought and structure, rather than respecting the "status quo", must be encouraged. Constant change requires the ability to adapt rapidly to new paradigms, integrating new alliances, programs, skills and training with existing practices born of traditional thinking, and allocating resources based on these new paradigms.
60.2.2 Increasing Tensions in the Allocation of Time A modern department administration and its faculty face growing challenges from the ever-increasing demands of each of the three academic missions (research, education, clinical care). Other stressors are the demands posed by faculty absence for travel to meetings; these have increased as the world has become more interconnected and as the number of professional organizations and focused work-groups has grown. The increased faculty absence places particular stress on the financial health of the department. Patients, now participating as educated consumers of health care, place new and growing demands on faculty time. They expect to see and discuss their daily care plan with attending physicians (a function once relegated to the residents); they ask complex questions and rightfully exercise their right to know the details of their care. Both patients and payers focus on safety and demand a high level of efficiency. Protecting time for faculty research therefore requires a re-engineering of the delivery of clinical care, with emphasis on a team approach, devising effective methods that enable the transfer of accurate information among team members, and a reconfiguration of faculty duties. Recruiting these teams, organizing their work flow, safeguarding their protected research time and assuring that the provision of care meets standards without jeopardizing teaching and research are challenges for the administration of a modern department.
60.2.3 Financial Pressures 60.2.3.1 Market-Based Healthcare Traditionally, the academic department of surgery has derived a substantial portion of the revenue used to pay faculty and staff salaries from its clinical operation. Most surgical departments, as component parts of large academic medical centers, have long enjoyed exclusive ownership of the tertiary care market. This has changed with the demands of the health marketplace and increased competition from private hospitals. As a result, a substantial portion of the care once provided by academic institutions has migrated to the community. Payers prefer to avail themselves of community hospitals to avoid the higher costs that academic centers charge to support their teaching and research missions. The fierce competition for traditional clinical activities imposes a specific challenge on the administration of the department, constantly requiring it to innovate, to select fields of concentration, to develop new program areas, and not only to be more technologically advanced but also to develop strategies that provide efficient clinical care (i.e., devoid of "waste"). Academic institutions have the distinct and distinguishing advantage of being able to transfer knowledge from bench to bedside.
60.2.3.2 Increased Reliance on Hospital Funding In the U.S., payment for professional services has decreased while that for hospital services has increased. This imbalance has allowed hospitals to maintain their financial health while compromising that of clinical departments. In order to minimize the impact of ever-decreasing reimbursement for clinical services, most departments are increasingly relying on direct program support from the hospital. The responsibility for the financial health of the departments, once primarily under the control of the dean of the school of medicine, now falls on hospital administration. Increasing reliance on hospital funding has made the departments more vulnerable, as recruitment and retention of faculty is dictated to a much greater extent by the needs of the hospital, primarily related to clinical services. This jeopardizes, to a significant extent, the recruitment and retention of research faculty. This is perhaps the most significant
challenge for the leadership of the department: ensuring that financial dependence on the hospital does not jeopardize its research. The true impact that this new relationship has on the focus, scope and extent of research in departments of surgery needs to be studied.
60.2.3.3 Increased Competition for Research Awards Departments of surgery face a substantial financial challenge in the current research environment. First, as mentioned previously, surgical faculty carry a substantial clinical load; they need to operate frequently to preserve their skills and face constant technological changes that demand their attention, all of which makes them less able to compete for research funds. There is a need to support these efforts, a need that challenges the administration of the department as it requires the provision of time away from clinical duties. Secondly, even when research is funded, the remuneration of faculty will usually be higher than what federal funds provide. This creates a need to close the gap with internal resources, which have to come from cross-subsidies from clinical income and/or from philanthropy or other sources. At other times, a well-funded faculty member may experience a temporary lapse in funding. In such circumstances, the department should provide bridge funding.
60.2.4 Workforce Challenges Departments of surgery face three types of workforce challenge: (a) relatively major changes in the expectations of students, residents and new (younger) faculty, reflecting generational differences; (b) the influence of the gender and ethnic shift in the composition of the surgical workforce; and (c) the reduction of resident work hours imposed by the Accreditation Council for Graduate Medical Education (ACGME) to emphasize education as opposed to service. These changes underline the need for an adaptive departmental response.
60.2.4.1 Generational Issues In the fields of sociology, psychology and education, vast research effort has been invested into generational
characteristics and their influence on societal and personal functions. Four distinct generations make up the trainees and surgical workforce today: Traditionalists (1925–1943); Baby Boomers (1943–1960); Generation X (1961–1981); and the Millennial Generation or Generation Y (1982–2000) [1]. Each generation, influenced by the political and socioeconomic events that occurred around the time of its birth, appears to have common underlying values and beliefs. Similarly, specific differences have been described between these four generations. Both the similarities and the differences affect the way individuals relate to each other and the way clusters of individuals from one generation relate to clusters of another generation within the work environment. Because some surgeons now postpone retirement to their mid- or late seventies, all four generations persist in the workforce simultaneously, a unique social phenomenon of the beginning of the twenty-first century and one that taxes the organizational and management skills of current leaders. For example, the Millennial Generation, the generation to which current medical students and residents belong, expects active participation in policy decisions that influence their work environment from the outset, rather than "waiting for their turn when they are more senior", a characteristic of the Traditionalist and early Baby Boomer generations. A person of the Millennial Generation will respond to a mentorship style that allows for greater participation and less command and control. She/he will be motivated by becoming increasingly skilled in a complex surgical procedure, working on a complex problem with highly capable partners, and having work expectations (project timelines, scope and breadth of assignments) that allow for work/personal life balance. Working to provide balance for each individual in the four generations currently composing the workforce is a substantial challenge for the administration of a department.
60.2.4.2 Gender, Ethnicity, and Other Social Factors
Karen Borman, in an analysis of 11,000 residents taking the American Board of Surgery examination, found that in the last decade the number of women residents in surgery doubled and the number of international medical graduates tripled [2]. In 2007, women comprised slightly over 20% of graduating surgical residents; while women remain a minority in surgery, the growing proportion will create new challenges and opportunities. For example, women who bear children will do so almost exclusively during their early careers. Men and women will become increasingly willing to interrupt their professional commitments to participate in raising their children, particularly as acceptance grows within the surgical community. Policies will need to be designed to deal equitably with interrupted promotion timelines. These healthy lifestyle choices will also require a corresponding change in productivity expectations and effort commitments. In addition, such interruptions require re-training methods, re-admission to the Maintenance of Certification processes recently adopted by the Boards, and re-credentialing in the hospitals. Furthermore, it is likely that surgeons of the future will have to accept lower levels of compensation than previous generations may have enjoyed in order to fund a necessarily larger medical professional workforce. Changes in the ethnic composition of the workforce are also important to understand. In the U.S., while the proportion of physicians (and medical students) of African American origin has remained flat, Latino and Asian groups continue to increase substantially [3]. Adequate representation of all ethnic groups is important, as studies have shown that this influences access to medical care, but appropriate plans to support individuals with diverse backgrounds, values and beliefs need to be developed and constantly updated by a conscientious administration. It behooves the leaders of the academic surgical profession to become familiar with and integrate this body of knowledge to address administrative structure, policies, curricular considerations for residents, the choice of research topics and other aspects of the structure and function of the modern academic department.
60.2.4.3 Control and Regulation of Resident Work Hours
Few challenges have stressed the administration of surgical departments as much as the implementation of strict limits on the residents' "tour of duty". Several studies have now examined the volume of operations done by residents, and concerns have been raised about the limited exposure to real-time decision making
on their part. This limited access to operative opportunities has had substantial implications for departments. For example, psychomotor skills and basic operative skills must now be acquired in the laboratory via modeling and simulation. This unintended consequence of the work-hour limitation rules has benefited residents' learning and has also improved the safety and efficacy with which some simple procedures (central venous catheterization and thoracostomy tube placement) are now done. However, there is no question that these laboratories are expensive and pose substantial challenges to departments in terms of finances, personnel resources and space. Moreover, the limited availability of the residents for patient care further taxes the faculty in this arena, decreasing their ability to perform research.
60.3 Addressing the Challenge: Strategic Themes for Seizing Opportunities in the Twenty-First Century
When observing the species of the Galapagos Islands, Charles Darwin noted that it was neither the fastest nor the strongest who survived, but those who adapted to change. Surgical departments and their administrations face a similar challenge. On the other hand, there are multiple opportunities, many of them unique, that when used appropriately will help the survival and, indeed, the improvement of academic surgery. In this section, we will focus on the elements that are needed, from the point of view of the administration of a department of surgery, to best position it in its quest to achieve excellence.
60.3.1 Organizing the Department for the Twenty-First Century
The organizational structure of the late twentieth-century academic department was built on a hierarchical central administration that oversaw divisional units. The foundation for this organization was the presence of well-delineated specialties and the educational needs and oversight of the trainees. The top leadership position, the chair, was supported by vice-chairs and division chiefs. The chair was an individual with demonstrated strengths in one or more of the mission-specific areas of patient care, research and education. The vice-chair positions, organized to head education, research, clinical care, fundraising and finance, were filled by faculty chosen for their knowledge and expertise in the particular area of responsibility. Because many of the senior leadership positions were bestowed, in part, to honor professional achievements rather than as evidence of an ability to lead, chairs, vice-chairs and division chiefs had varying degrees of competence in business affairs, finance, operations, leadership and systems thinking. By design, this traditional structure created a "silo" approach to management of the department. Additionally, because of the nature of the appointments, most were filled by older faculty. As a result, the traditional department management structure had no natural mechanism for junior faculty to participate in decision making. Non-faculty administrative staff and their work were also organized to be consistent with the central chair/division model. Staff had varying degrees of expertise in grants and contracts administration, fiscal processing, personnel supervision and human resource processes, and physical plant and equipment management. As was the case with the chair, there was also varying leadership and business aptitude among administrative staff, and substantial variance among departments in divisional autonomy and centralization of administrative processes. By contrast, the expectations for a modern department administrative organization have changed substantially. First, it is now widely recognized that in order for an organization to be successful, the administration needs to be cognizant of the mission, vision, goals and objectives, and should be composed of individuals who have the skills needed for the functions assigned [4, 5]. An academic department's main "output" is the answers to questions produced by its research, the education of the next generation of surgeons and the provision of clinical care. Success, for an academic department, is its ability to achieve a substantial output in those areas. Money, in the case of an academic department, is a means to support increased output in its core mission, not an end unto itself. Unlike the business units to which they are frequently compared, accumulation of wealth or financial success is not evidence
of accomplishment of its mission, academic success, or a "well-run" department [6]. Second, as the pace of scientific discovery and technological development increases, the ability to change becomes paramount. The department needs a nimble structure that fosters entrepreneurship, risk-taking and forward movement. This philosophy requires a different set of skills among the leaders and a different administrative structure. The leader is expected to exhibit not just achievements in one or more aspects of the traditional mission but must also master the qualities of leadership, communication, adaptive schemes and business acumen [7, 8]. The support structure around the chair is composed of faculty (division leaders and others) who bring specific expertise in leadership and management, including knowledge of the organization and management of interdisciplinary ventures. It is important to have methods that allow constant input from younger faculty into the affairs of the department leadership (surveys, retreats, focused discussion groups, etc.), as this is an expectation of the new generation and will provide important input into this ever-changing clinical, educational and research enterprise [9]. The leadership must also include a much more robust non-faculty administrative staff with not only knowledge and expertise in the traditional areas but also a command of compliance and regulatory issues, and a willingness to provide the level of transparency and openness in all operations that forms the basis for appropriate relationships with other units, the hospital and the school. Recognizing the increasingly complex nature of financial and business models, many departments have evolved the vice-chair for finances into a non-faculty administrative position that partners with the chair on matters pertaining to the leadership, finance and administrative responsibilities of the department, or have hired a non-faculty administrator to partner with the vice-chair. This person plays a most important role in the day-to-day management of the department and in the definition of the department's strategy.
60.3.2 Leadership Excellence: Qualities and Skills
Today, the selection of a chair for a department of surgery is one of the most important events for the school and for the hospitals in which faculty are deployed. Traditionally, the curriculum vitae of the applicant was the most important element, as it summarized past professional accomplishments. The process is much more complex today, as the selection committee must look at a number of other characteristics of the candidates, as described above, and must also carefully re-evaluate the overall needs of the department so as to recruit not just the best person, but the best person for the particular needs of the department at a given stage of its own development. Of course, the chair will have to be an individual with demonstrated abilities in one or more of the missions of the department, as it would otherwise be impossible to relate to the members of the department and engender their trust and support. However, the chair of the modern department must also have recognized expertise in business, finance and management, and exceptional leadership acumen, with particular emphasis on communication, systems thinking and an adaptive decision-making style. Management of a department also implies playing a major role within the school at large (which benefits the department), and this requires frequent contacts with legislators in the case of public institutions, fundraising, programmatic development, the organization and introduction of new technology, knowledge about competition and market strategies, and a myriad of other qualities for which most current chairs were not specifically trained. The need to coordinate activities among disciplines, to coordinate programmatic development with the hospitals and to organize a structure that provides the transparency that is now commonplace further taxes the chair's time. Thus, there are two new qualities that chairs of modern departments must have in order to lead: well-developed business acumen and extraordinary communication skills. The latter quality will enable the chairs to advance an idea among the faculty, change the direction of a program and/or create partnerships with other sections and departments. They should gain increasing knowledge and expertise on relevant topics of leadership, particularly communication, Emotional Intelligence, Systems Thinking and motivation [10].
60.3.3 Surgical Divisions
Despite the move toward integration and interdisciplinary approaches, many departments have not found a
better way to organize than the traditional mode of specialty divisions. On the other hand, and following the line of thought expressed above, the leaders of the divisions provide important support to the chair as part of the executive leadership of the department, where their seniority, knowledge and leadership of departmental affairs are more important than the specific focus of their practice. Each division participates actively in several disease-focused centers, and thus the division chiefs or their designees become the interface between the administrative structure of the department and other departments, divisions and the hospital. Furthermore, the division leaders have fiscal responsibility, provide the general direction for educational activities related to residents, fellows and graduate students in their areas, recruit new faculty, and are responsible for assuring that those who join the department have appropriate mentorship (usually several mentors, each covering an aspect of the new member's activities). The division leader of the modern department must add to these duties the ability to partner with the chair to harness the collective strength of the department toward common goals, and to think, plan and act strategically beyond the boundaries of his/her respective division to better the whole. In essence, both the division and the department require responsibility and advocacy. Further, as academic organizations adopt increasingly transparent accountability paradigms, the division leader must join the chair in embracing collective responsibility for the success of the entire enterprise even as she/he advocates for his/her division.
60.3.4 The Interdisciplinary Model
The traditional boundaries between surgical specialties and clinical and research disciplines have become blurred. Indeed, the overlap in many areas related to clinical care is mirrored by that which exists in the research enterprise. The training of the workforce for the twenty-first century must reflect this view as well, and thus the teaching efforts have to be aligned with an interdisciplinary model. Modern departments of surgery should foster the concept that interdisciplinary interaction is essential to move forward. This model imposes substantial stresses on the traditional administrative structure of a department, which needs to be adapted accordingly. One way to overcome this difficulty is to create centers and/or institutes in both the research and the clinical enterprise, each with a broad yet specific focus of activity. Joint faculty appointments from other clinical and basic science disciplines facilitate and promote interactions and joint projects among the center/institute members. These programs are vital and introduce trainees to the concept of interdisciplinary work, preparing them better for the future. As departments of surgery face the challenges of creating these interdisciplinary centers, one might ask why not simply abolish the current (traditional) structure of departments (Medicine, Surgery, etc.) and replace it with a totally new structure based, for example, on a "disease focus." While this theory has some advocates [11], there are at least three reasons to preserve the current structure in some form. First, the creation of departments around a disease focus would necessarily run into the same "silo" problems, as diseases and practices require the involvement of many disciplines with different focused areas of expertise. Second, the current oversight of training in the United States, done by the ACGME through 23 "Residency Review Committees", is based on the more traditional structure. Lastly, the clinical enterprise, as related above, would necessitate the creation of focused areas of expertise different from those applicable to the basic sciences and research arena. There is no question, however, that hospitals would be much more efficient if they were largely organized around diseases, and many medical centers have developed either "care lines" or a formal "disease center" type of structure. In order to overcome these obstacles, we recommend a "hybrid" approach. With this approach, centers and institutes are created based on the interest and expertise of faculty. Both in the clinical arena and in research, they respond to a specific "theme." They are then administratively housed in a department (usually the department that has the larger number of individuals participating or that has invested most of the resources). The governance of this interdisciplinary group has representation from all involved departments. Resources are allocated by this group. On occasion, centers of this nature acquire such stature within the school of medicine that their management involves direct interaction with the dean's office. In a way, the expectation is that a modern department will keep its focus while developing enough communication, interconnections and
collaborative work with all others. It is a concept that evolved from having strong and large silos to establishing a network of channels that communicate across all silos. Figure 60.1 shows an example of interdisciplinary cooperation between the hospital, the Departments of Surgery and Medicine and the Dean’s office in Organ Transplantation.
60.3.4.1 Interdisciplinary Research in Basic Sciences In the basic sciences research arena, the need for more sophisticated laboratories, where investigators can have access to expensive, modern technology, can only be met by developing strong, relatively large groups of people with expertise in basic sciences who collaborate around a major theme, a phenomenon, or a line of thought. At the University of Washington, the School of Medicine, in partnership with a local developer, started a new campus devoted to research. As each new building
comes on line, the school puts out requests for proposals. Proposals must be interdisciplinary and demonstrate the existence of a critical mass of researchers with a relatively common focus and sufficient funding and maturity to assure the viability of the new center. Once a proposal is approved, usually with the assignment of a substantial amount of contiguous space, a governance group is created (following a specific template to keep uniformity) that includes representatives from all departments involved, and the new center/institute is offered a departmental base. The ability to create centers that will use 30,000–50,000 sq. ft. of space allows the university to provide the group with common work areas, independent conference rooms and basic infrastructure that fosters interdisciplinary contact and collaboration to a much greater extent, and results in the submission of grants that cross basic and clinical disciplines. Similarly, when the University of California San Francisco (UCSF) created its new Mission Bay Campus in 2002, it moved not departments but interdisciplinary programs into the new campus (the Cardiovascular Institute, the Cancer Center, Quantitative Biology,
Fig. 60.1 Solid organ care line program: cross-disciplinary decision-making structure. The Dean/CEO oversees the End Stage Organ Disease Program Oversight Executive Committee (Chair of Surgery, Chair of Medicine, Hospital CEO), which directs an All Care-line Operations Committee and four cross-disciplinary care-line teams (Kidney/Pancreas, Liver, Heart and Lung), each led by a Surgical Director and a Medical Director. The Heart care-line is matrixed to this program, with a direct report to the multidisciplinary Regional Heart Center program.
Bioengineering and Bioinformatics, the Neurosciences Institute, etc.). The concept here was to have interdisciplinary programs that bring together both basic science and clinical departments to jointly undertake fundamental research.
Active Role of the NIH in Interdisciplinary Collaborations: Clinical Translational Research Institutes
The NIH has recently financed a limited number of Clinical Translational Research Institutes. These institutes receive substantial resources from the NIH and are intended to provide an infrastructure that can foster interdisciplinary clinical and translational research. When fully developed, they provide central resources for faculty engaging in clinical trials, clinical outcomes research and a number of other bench-to-bedside activities. Additionally, these institutes are charged with the education of junior faculty. Well-organized programs offer faculty members the opportunity to learn while conducting their own projects. The institutes offer supervision of clinical trials, a platform to deal with compliance issues and regulatory bodies, data management, and the essential resources to carry out studies, such as expertise in biostatistics, data collection and management, and the writing of grants. The idea behind the creation of this pathway was to establish a "model" that would foster interdisciplinary work and provide the administrative and logistic infrastructure to develop the scientists of the future. These institutes serve as models that can be replicated elsewhere using private and/or philanthropic funding sources, given the limited availability of federal funds.
Interdisciplinary Model in Health Services Research
Faculty engaged in health services research do not usually need to acquire expensive equipment to conduct their research. Further, creating a critical mass of faculty and support staff in the field of health services provides the appropriate environment to maximally utilize resources and large databases, explore issues that are ultimately tied together, discuss ideas and create the kind of think tanks that advance the field. Extension of projects in the health science arena frequently leads to the development of projects to analyze clinical outcomes, and through these analyses a large number of clinical trials have had their origins in groups engaged in health services research and outcomes studies, which now profit from a rich cadre of staff with expertise and experience in dealing with IRBs, compliance issues and the like. This model essentially requires a section of the administrative structure of the department to attend to its needs.
Interdisciplinary Model in Clinical Delivery and Training Programs
The concept of interdisciplinary cooperation as developed above is increasingly used in the delivery of clinical services. As Porter and Teisberg [11] have clearly outlined, from the perspective of the consumer of medical services (the patient), an approach that is coordinated around clinical needs makes more sense than the traditional departmental structure of the academic centers. To that end, the creation of "centers of excellence" or "care-lines" seeks to nucleate all the resources needed to treat an illness, using providers from separate academic departments as well as other health care providers, social services, nursing, etc. From an administrative point of view, we believe the best structure for these multidisciplinary care points is one that involves the hospital and the academic departments in close cooperation. Resources come primarily from the hospitals in the form of space, equipment, provision of hospital personnel to run the administrative functions related to patient care and, depending on the profitability of the center, financial support for the faculty involved. The department should provide the faculty and a portion of the administrative oversight. Governance of these centers can be difficult, particularly when more than two or three departments are involved. We believe that a "board-like" structure responsible for the management of the particular center is best suited for this model. Its composition should reflect the resources used, but will in general involve hospital/department/school interactions. Furthermore, the model of an "oversight" committee (high-level oversight, less frequent meetings, dealing particularly with resource allocation and/or strategy) and a "management" committee for the details of the center's management and for a more granular analysis of performance, goals, etc. has been
used effectively. Finally, it is important to designate a "director" of the center/group, an individual with (a) profound knowledge of the disease treated, (b) active participation in clinical delivery and (c) the leadership capacity to run the center. The department needs to keep close track of how many clinical centers are involved and how the centers' administrative activities are integrated, partly or wholly, within the department. Transplantation, cardiovascular services and oncology are programs that fit this model of organization. Each of these centers frequently develops "focused centers"; for example, the "transplant care-line" develops independent centers in kidney transplantation, liver transplantation, etc. Each center becomes more closely associated and integrated with the management of end-stage disease in nephrology and/or hepatology than with the corresponding surgical side (i.e., a kidney transplant surgeon would work in closer collaboration with the nephrologist than with the surgeon performing liver or lung transplantation). Conceptually, clinical units are contained within larger clinical units. Interdisciplinary and inter-professional cooperation has encouraged joint faculty appointments. At the University of Washington and UCSF, it is now common for clinicians with a substantial involvement in research to hold a joint appointment in the basic science department of the school that most closely relates to their research activity. By the same token, the institutions have liberally provided joint appointments for faculty from other schools. At the University of Washington, for example, faculty from the School of Engineering form the core of the research enterprise in the surgical simulation center. Joint appointments between Surgery and other clinical departments are also common.
60.3.4.2 Interdisciplinary Model in Surgical Education The use of simulation and modeling is becoming the norm and standard in the education of surgical residents, particularly in the early stages. We believe it is important to create a multidisciplinary platform for these centers as they stand to serve the needs of many departments for student and resident training [12]. Many tasks and procedures are common to several specialties. Exercises for which modeling is known to
provide the ideal learning platform, such as disclosure of errors, communications, etc., are also common to several disciplines. Finally, team training, an essential part of modern teaching, requires the presence of all members of the team, usually from different departments of the school and from other schools such as nursing. Research in teaching methods, assessment of performance and other aspects of education is best achieved with the integration of expertise that resides in other schools, such as engineering, business and administration. With this in mind, and with the concept of the benefit derived from economies of scale, the University of Washington has created a large institute, accessible to all departments, with space allocated in each of the hospitals that are part of the large academic medical center structure. The organizational structure of this multidisciplinary training center is based on the model of an Academic Board, with representation from the chairs of departments actively involved in the center, from the leadership of the hospitals and from the dean's office. This academic board meets twice yearly to define strategy and manages the center through an "executive committee" composed of selected key individuals from the board. In turn, this executive committee delegates the day-to-day operation to an executive director and a management team. This administrative structure oversees the activities of the center in three large areas: education (including curricular development, assessment of needs, etc.); research (primarily related to the development of devices for use in simulation and/or the validation of curricula); and a Quality of Practice group. The latter represents close cooperation with the hospital (the main funding source along with the dean's office), in which specific curricula are developed that tie skill acquisition to safety and quality of practice. For example, based on the need created by the large number of complications resulting from the insertion of central lines, this group developed and implemented an elaborate curriculum, with a cognitive and a psychomotor skill portion, that every resident and attending involved in the placement of central lines must follow. The trainees are then required to reach a certain level of proficiency before being "certified" as capable of placing lines. These centers play an important role not only in training but, more importantly, in verifying that an individual trainee has achieved a certain level of proficiency before the completion of the training cycle.
60.4 Surgical Innovation and Partnership with Industry In a world of ever-shrinking federal funding, it has become increasingly important to create partnerships with industry. These partnerships must be created under clear ethical standards and with the safeguard of stringent conflict-of-interest policies. The Association of American Medical Colleges (AAMC) has recently published the basic guidelines that it expects its members to follow [13]. These guidelines call for all medical schools to develop their own in-house policies, and they provide an excellent frame of reference. However difficult the policies might be, relationships with industry are vital to advance the education of surgical trainees and to support the research enterprise. The advantage is not simply the financial support that industry might provide. More importantly, relationships with industry provide access to innovation and new devices, and the opportunity to collaborate with talented engineers and developers who work directly with these companies. The value that clinicians bring to the table cannot be overestimated. Not only do they provide a reality check for many industry ideas but, from a larger societal perspective, their input modulates the introduction of new technology in a safer and more efficient way. Physicians who have become intimately familiar with instruments and devices they have helped craft will be not only the ideal teachers but also the most valuable evaluators of the performance of those instruments and devices. If the work is done conscientiously, the academic-industry partnership is also capable of introducing these innovations safely to the larger world of clinical practice by creating training and proctoring programs and evaluation algorithms that protect the public at large.
60.5 Philanthropy and Fundraising, an Essential Function of a Modern Department Just as partnership with industry is an important requirement for a modern department, philanthropy has become an essential means of advancing its mission. Philanthropy is often associated with grateful patients but is increasingly the result of individuals who feel
passionately about advancing a certain area of knowledge or who believe firmly in the mission of the department. Philanthropy is all about stewardship and the development of personal relationships. Establishing close personal relationships allows members of the department to engage individuals or corporations and build their enthusiasm for its academic activities. The department needs a well-articulated and agreed-upon strategy for fundraising. The chair should maintain an active relationship with the university or school foundation, if one exists, and the department should have a liaison dedicated to its fundraising activities. The chair should create a department fundraising advisory council, made up of influential lay supporters of the department. This council can help the department leadership determine the fundraising goals of the department and its divisions. Coordination of these activities and priority setting are essential to avoid confusion in the minds of potential donors and to formulate a systematic approach. Donor cultivation and recognition should be done in a thoughtful way. A regular newsletter and the invitation of donors to departmental functions, where they are appropriately recognized, are effective means of building ongoing relationships.
60.6 Ethics, Professionalism, Quality
An increasingly important aspect of a modern department is its focus on ethics, professionalism and the quality of care delivery. Perhaps as a reflection of the generational changes described above, perhaps of a need of the profession to regain public trust, students, residents and young faculty put substantial emphasis on the ethics of the profession and on the exercise of professional behavior with patients and in their interactions with each other. As such, universities have seen a resurgence in the exploration of professionalism in all its aspects [14]. Members of the medical profession express an interest in the pursuit of excellence, altruism, accountability, humanism and compassion – the basis of professional behavior. They demand attention to the needs of vulnerable populations and the creation of appropriate safety nets, they believe in their duty to society, and there has been substantial interest in practice abroad and in learning about issues related to Global Health. This expression on the part of students and residents extends to younger faculty and demands that modern departments of
surgery create the appropriate environment to foster these activities and harness the enthusiasm. Interpersonal relationships, respect for each other, and celebration – rather than condemnation – of differences become important aspects of the department of surgery. The leadership of the department must focus its attention on creating the right environment to continue to attract the best and the brightest to its ranks. Specific measures of professionalism should be established and reviewed annually between the faculty and the division chiefs, and between staff and their supervisors. In this way, professional behavior is made part of the faculty member's promotion package – there is a threshold without which promotion is withheld. Appointment letters should state clearly that professional behavior is expected of every member of the department and that lack thereof may result in termination. Quality of patient care became the focus of attention of the Institute of Medicine at the turn of the century. The first major publication of a specially designated working group, "To Err is Human", addresses important elements of safety. The publication that followed this very powerful book, "Crossing the Quality Chasm", seeks to define quality in six parameters and sets quality as a guide in the provision of care [15, 16]. Interestingly, this issue meshes well with the ideas of ethics and professionalism described above and was embraced by the younger generations, which have expanded the concept of "quality of care" to quality in all aspects of their work. The administration of the department and its leadership must embrace and deliver an environment conducive to quality in its operations.
60.7 Focus on the Future Generations: Training, Development, Evaluation
60.7.1 Medical Students: Attracting the Best and the Brightest
The concepts discussed above allow a surgical department to function as a unit and, as such, to develop the appropriate strategies to recruit the best and the brightest from our student classes. Recognizing that surgery had become less attractive in recent years, the University of Washington developed specific ways to facilitate early access to students. For example, the Department of Surgery supports the "SIG"
(Surgery Interest Group), which is open during the first year of medical school to anyone with an interest in surgery. It is understood that individual interests will evolve over time, that some students may move into nonprocedural activities and that some will join procedural specialties different from those housed in the department of surgery. However, it is important to foster early in their careers this form of informal, extracurricular activity, which exposes students to surgeons in general and to specific specialties when they so request. Secondly, several interested surgeons have been designated as surgical educators, to teach courses and/or administratively run the activities of the first- and second-year medical student classes. It is through the relationships established with these surgeons that students can be inspired to explore surgery as a career option and can attend operative sessions where, through the use of multiple monitors in the operating room, they can observe in detail the procedures performed and the pertinent portions of the anatomy involved. This immersive experience allows medical students to participate in operations in real time and with a greater sense of reality than was possible in the past. In addition, students are encouraged to attend Surgical Grand Rounds and other related didactic activities. The Department assists surgeons who participate in the more traditional third- and fourth-year student clinical training to learn recruitment strategies to use with these students when they rotate through surgery.
60.7.2 Residency Training
Substantial changes have occurred in the educational paradigm of the residency in the last 10 years [17]. First, as discussed above, training in basic psychomotor skills is progressively moving from the bedside to the laboratory. The emphasis on patient safety and quality of care has, indeed, resulted in the translocation of the training of basic tasks and skills from the patient to the simulation laboratory. Modern educational methods dictate that these skills be acquired using multimedia for the basic cognitive portions and simulators for the psychomotor skills. These experiences take place under either faculty or technician supervision, using strict curricula that emphasize standardization of processes. Appropriate evaluation, assessment and follow-up of the educational process in terms of ultimate outcomes are now the rule in these centers.
Elaborate systems now track the clinical performance and the outcomes of procedures performed by residents and provide information on individual complications, which dictate future learning opportunities. Second, the independence of residents in the management of patients has been curtailed by the higher expectations placed upon the faculty by patients, payers and academic institutions. The full impact of this decrease in their ability to participate in medical decision-making has not yet been evaluated. Third, there is less time to acquire skills, knowledge and experience, given the strict limitation on working hours and the large number of restrictive rules imposed by regulatory agencies. Fourth, emphasis is now placed on the development of a "whole person", with emphasis on excellent interpersonal and communication skills, the ability to work in teams and an understanding of the needs of informed patients to actively participate in their care. Lastly, another substantial change seen in the last decade is that of cross-disciplinary training. The tools and the expertise to fully train residents and, in particular, fellows frequently reside in other departments. Organizing the smooth and appropriate functioning of training programs under this model is a much greater challenge that must be addressed by the administration of the modern department of surgery.
60.7.3 Fellowship Training
As a consequence of the slower development of residents described above, there has been a proliferation of fellowships. This phenomenon is the result of several changes described above, including shorter periods of patient contact, much more direct interaction between faculty and patients, and the increasing complexity of the tools with which surgeons are expected to become familiar. Fellowships provide post-residency training in focused areas, with a limited number of people who are experts in a relatively narrow field. Today, 70–80% of U.S. graduates seek some form of post-residency training, which provides them with the maturation and skills necessary to start academic or private practice from a position of advantage. Fellowships are not part of basic training and, therefore, the salaries of fellows are not covered by Medicare, which imposes a substantial burden on academic departments.
60.8 Faculty
60.8.1 Recruitment and Retention
Probably the single most important factor that determines the success of an academic department is its ability to attract and retain highly capable residents and faculty [4, 5]. This principle should be the cornerstone of every aspect of program design, organizational structure and administration of the department; indeed, every aspect of the department contributes to its ability to attract and retain highly capable individuals. Recruitment and retention are so intertwined that it is virtually impossible to discuss them separately. Retention of faculty who fulfill their aspirations facilitates recruitment of new faculty, and recruitment of new faculty anchors existing faculty in place. Many complex factors contribute to what actually motivates individuals to succeed and to find satisfaction. At the risk of oversimplification, a number of motivational theoretical frameworks suggest that the single greatest motivator for highly capable people is their ability to grow personally and professionally [18]. These individuals are motivated by public recognition of personal and professional achievements by individuals and groups whose impressions they value. Therefore, it stands to reason that reward systems for academic surgeons (highly capable individuals) should focus on recognition of personal and professional achievements. There are a number of methods that traditional academic departments use to recognize faculty achievement, ranging from informal methods, such as listing achievements in widely read materials such as newsletters, to formal methods, the most visible being promotion in rank. These methods are extremely valuable to the academic department and have sufficient flexibility to adapt to generational and gender differences while maintaining some consistency for programmatic integrity.
60.8.2 Development of Faculty
Developing the academic leaders of the future is an important responsibility of an academic department of surgery. Surgical departments face a particularly difficult
task in this regard because of the demand on surgeons to devote a large amount of time to their practice and because, at current incomes, it is difficult to reconcile the need to protect substantial portions of the surgeon's time with current federal financing of awards such as those funded by the NIH for the purpose of career development (K awards). Surgical societies have lately developed a creative model: through their respective foundations, they provide matching funds that double the federal career development award, making it possible for the department to bridge the gap between the pure researcher's salary as viewed by the NIH and the real compensation levels expected by surgical faculty. The department's main role is to assure appropriate mentorship for the individual receiving the award and to protect him/her from the rigorous demands of the practice [19]. A very important part of a faculty development plan is mentorship. As previously discussed, the new generation of surgeons expects an organized system of mentorship. Unlike the traditional model, in which two or more individuals established a relationship that served as the basis for mentorship, the current generation expects the system to have "available" mentors in different areas, mentors who can be tapped for their expertise when needed. Establishing a specific, well-resourced office of faculty development is a great advantage: it can systematically monitor and evaluate all mentorship relationships. Almost every new faculty member is paired with several individuals whose expertise lies in teaching, in a specific aspect of research, in grant-writing or in other areas important to the faculty member. Over time, some of these relationships solidify and become real pillars of the retention process. Similarly, many schools and some departments offer specific, usually relatively short and intensive, courses on leadership, communication, effective teaching and other skills necessary to develop the "total person" and prepare the leaders of tomorrow.
60.8.3 Balancing Faculty Efforts
The faculty of the modern department of surgery face unprecedented demands on their time. Indeed, only a decade or two ago, the development of an academic surgeon required simply that a portion of the time be protected to do research. It was assumed that most surgeons entered their careers with less knowledge of
research and that preserving some protected research time would suffice to expose them to their mentors and show them the way to conduct successful research. This approach did not appreciate the need for rigorous research training. In addition, the complexity of clinical care, the advances in technology and the continuous evolution of treatment methods, the clinical regulatory environment and the information systems that support clinical care demand a different approach today. From the start, it is important that realistic goals be determined by the leadership and the surgeon to appropriately apportion the time. In today's world of increasingly complex tools and devices, surgeons who wish to develop an area of clinical service delivery to its fullest will need "protected" time to train and to practice. At the highest levels of complexity, it is impossible to maintain expertise without almost constant exposure to practice. This has occurred in a world where research, be it in basic science, in health services, or in the area of translation and clinical trials, has become much harder for the surgeon as well. Educational tools and elaborate modeling and simulation scenarios demand not only time but also the development of the appropriate expertise to measure the results of interventions and to assess for adequate performance. If one adds to this scenario the need to attend meetings and to participate in groups with a specific focus on the faculty member's area of expertise, the time demands become a serious issue, one that requires attention and diligent management. The administration and the leadership have to provide the "total" environment to allow those goals to be reached. Growth and development in all these areas come at a substantial price as well, since, in addition to local responsibilities, faculty will acquire national and international duties. A number of tools are now available to modern departments to organize work flow in all three legs of the academic stool, and a new science of management has to be applied to assure the success of these ventures. This is especially evident in the field of vascular surgery, where many have had to master an entirely new set of procedures and tools in order to keep up with the practice of medicine.
60.8.4 Faculty Compensation
A great deal of controversy exists about the role of compensation as a motivator. Motivation theorists do not put compensation high on the spectrum of motivating
factors. Others believe compensation is important and should be considered carefully when designing evaluation and reward systems. Traditionally, academic surgical departments have designed a variable pay component into their faculty compensation systems. Two very important principles in developing faculty compensation plans are: (a) total transparency; and (b) compensation that is both performance and incentive based and that recognizes departmental strategic goals and university rank and step. It is worth mentioning that there are as many financial incentive compensation systems as there are academic departments they aim to serve, and nearly every health care consulting firm in the U.S. offers expertise in designing and redesigning faculty and private physician compensation systems. Clearly, there is no "holy grail" formula that perfectly aligns evaluation and incentives with ideal performance. Paradoxically, the majority of these systems are designed to financially reward academic faculty for their contribution to the academic mission through their personal clinical production. Although this type of compensation seems similar to private-sector physician compensation, those who favor it note that, for most departments, the major source of financial reward comes from physicians' fees. In the United States, productivity is most often measured by some component of fees, cash, or relative value units (units of work). In the past 15 years, there has been a growing trend to design variable pay systems that weight contributions to research, education, and administration and leadership, in addition to clinical production, as factors that determine some component of variable pay. The problem with most of these systems is that the monies used to sustain the variable portion of the pay are almost entirely generated by clinical work; thus, those responsible for its generation usually oppose a distribution that rewards other activities. Furthermore, metrics for these other activities are harder to come by than simple distributions based on clinical workload. One of the most difficult decisions that must be considered carefully when designing a compensation system is whether to base it on individual performance and reward or on group performance and reward. Both models pose serious unintended consequences. Individual reward systems generate competition among partners; group reward systems allow average performers to obtain the same reward as high performers. Probably the most frustrating system is one where an individual is measured on individual performance but his/her reward
is based on group performance. For example, an individual may be highly clinically productive and contribute financially and educationally to his/her division. However, the entire division may not produce sufficiently to generate funds to distribute. In this scenario, the faculty member has achieved what is required but, due to factors over which she/he has no control, she/he will not receive a reward. Care must be taken to mitigate the unintended consequences of having an incentive system that competes with the realities of academic faculty development. For example, the emerging generation of academic faculty, as part of the Millennial Generation, tends to consider their professional life a means to an end, not the end itself. They earn money to spend, and they, along with Generation X before them, are highly motivated toward a balance of personal and professional life. Therefore, effective incentive and evaluation systems will of necessity integrate these generational considerations into the program design. Additionally, evaluation systems should be considered an integral part of the incentive system; what is measured is what is desired, and therefore recognition of successful measures is a means to motivate. Finally, incentive systems need to be designed to enhance, or at least to limit negative impact on, intrinsic motivators, self-determination and personal growth. Regardless, when designing a faculty compensation system, the modern academic department should take the following into consideration:
• Obtain input from all generations and both genders of faculty. This is the best way to ensure faculty "buy-in", by giving faculty the opportunity to participate in determining the outcome of their own compensation and by creating transparency with regard to the benefits and limitations of the chosen compensation system.
• Relate compensation plan goals to the department's and institution's strategic goals.
• Establish explicit desired performance.
• Identify explicit methods of measurement.
• Build transparency and a direct cause/effect relationship between performance and reward.
• Decide on an individual model, a group model or a blend of both, and ensure that performance measurement and rewards use the same model.
• Ensure there is a meaningful and complementary relationship between the performance evaluation system, promotion criteria and the compensation plan.
• Build into the system a method to reliably report progress and anticipated reward to the faculty.
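To make the individual-versus-group design choice concrete, the following sketch (purely illustrative; the weights, scores and dollar figures are hypothetical, not a recommended plan) expresses variable pay as a weighted blend of individual and divisional performance:

```python
def variable_pay(pool_share: float,
                 individual_score: float,
                 group_score: float,
                 blend: float = 0.5) -> float:
    """Blend individual and group performance into one variable payment.

    pool_share       -- dollars available to this faculty member if all
                        targets are fully met
    individual_score -- 0..1 attainment of the member's explicit targets
                        (e.g., WRVUs vs. benchmark, teaching evaluations)
    group_score      -- 0..1 attainment of the division's targets
    blend            -- weight on individual performance; 1.0 is a pure
                        individual model, 0.0 a pure group model
    """
    combined = blend * individual_score + (1 - blend) * group_score
    return pool_share * combined

# A high individual performer in an under-producing division still
# receives part of the reward under a blended (rather than pure group) model.
print(variable_pay(20_000, individual_score=0.95, group_score=0.40, blend=0.6))
# 14600.0
```

With blend = 0.0 (a pure group model) the same high performer would receive only 8,000, reproducing the frustration described above; making the blend parameter explicit is one way of building the transparency the principles call for.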
60.8.5 Role of Promotion in the Incentive System
One of the strongest incentive systems in academic medicine is the set of criteria for appointment and promotion in rank. Junior faculty are motivated to hone their clinical and teaching skills, obtain extramural funding and publish, not only to add unique value to the pool of knowledge in their particular field and for their professional accomplishment, but also to retain their jobs. In this way, promotion criteria are a critical "carrot and stick" incentive system. Promotion criteria need to be regularly and critically evaluated to ensure alignment with the academic system's goals, and to recognize and mitigate the unintended consequences when alignment is challenged. For example, promotion criteria often include a requirement to establish recognition for contributions to a specialized field of knowledge and for participation in leadership roles in societies and associations at a regional, national and international level. Of necessity, this requires faculty to commit significant time away from their institution. Conflict arises because the faculty member must also establish his/her clinical practice and skills and contribute to medical education programs in his/her home institution.
60.8.6 Metrics of Academic Productivity
Competing priorities create conflict for the academic faculty member who develops a research program while juggling the time demands of clinical practice. It is increasingly difficult to develop an individual academician whose professional profile includes all three legs of the academic stool. Different institutions have differing ways of determining the relative value of contributions in the three components of the academic mission. The Department of Surgery at the University of Washington thoughtfully revamped the promotion criteria for surgical faculty into a single track. The criteria were designed to recognize the significant contributions of individuals whose primary strength is in advancing any one of the major segments of an academic surgeon's professional portfolio: clinical, teaching, research and administration. In fact, the criteria recognize the essential integration of these segments. Scholarship in
research is viewed by the department in its broadest sense, to include not only research in basic science but also research in health services, outcomes, communications, leadership, behavioral science, teaching and all other matters pertinent to the development and functioning of modern surgery. Scholarship is generally measured by the approbation of peers in journal publications, in grant funding from outside agencies, and in books, monographs or electronic publications. Clinical productivity is determined on the basis of Work Relative Value Units (WRVUs) and their relationship to published benchmarks. Every year, each faculty member discusses with her/his division chief the number of WRVUs produced the prior year, compares them to the benchmark and discusses them in the context of current and future hospital operations. A number of WRVUs is then set as a target for the next year, taking into consideration all other plans of the division and of the faculty member in question as well as a number of environmental issues, and the number set is monitored quarterly by the division chief and the administrator of the division and adjusted as needed. Teaching efforts and results are assessed jointly using the plans presented at the annual meeting with the division chief, the evaluations of students and residents, and the type of activities and hours spent on the effort. Faculty with a primary dedication to teaching create portfolios of activities and of evaluations from peers, students and supervisors in ours or in other departments and use those as evidence of their achievement. Much less emphasis is placed on administration, unless the faculty member occupies an important position (such as Program Director of a Residency, Student Coordinator, etc.). Similarly, substantially less importance is given to roles in national and international societies.
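The annual target-setting and quarterly monitoring cycle described above amounts to a simple projection. A minimal sketch (the numbers are invented; real plans compare against published specialty benchmarks) might look like this:

```python
# Hypothetical quarterly check of clinical productivity against the
# annual WRVU target agreed with the division chief.
annual_target = 8_000                        # WRVUs agreed for the year
produced_by_quarter = [1_750, 1_900, 2_050]  # WRVUs logged so far

elapsed_quarters = len(produced_by_quarter)
expected_to_date = annual_target * elapsed_quarters / 4
actual_to_date = sum(produced_by_quarter)

# A shortfall flags the need to adjust plans (or the target) mid-year,
# rather than discovering the gap only at the annual review.
shortfall = expected_to_date - actual_to_date
print(f"expected {expected_to_date:.0f}, actual {actual_to_date}, "
      f"shortfall {shortfall:.0f}")
# expected 6000, actual 5700, shortfall 300
```

The point of the quarterly cadence is exactly this early-warning property: deviations are surfaced while there is still time to act on them.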
60.9 Conclusions
In this chapter, we have depicted the many challenges faced by modern departments of surgery. Those challenges are related to the rapid pace of scientific and technologic advancement, to the profound sociopolitical changes of the last few years, to financial pressures and to many workforce-related issues. We have then analyzed the opportunities brought by the same changes and described the way our institutions have addressed these challenges. We have placed special
emphasis on the development of the interdisciplinary model, on the type of leadership required to run a modern department and on ways in which the faculty and staff can be motivated to respond positively to the constant change.
References
1. Reeves TC (2008) Do generational differences matter in instructional design? Available at: http://it.coe.uga.edu/itforum/Paper104/ReevesITForumJan08.pdf
2. Borman KR, Vick LR, Biester TW et al (2008) Changing demographics of residents choosing fellowships: long-term data from the American Board of Surgery. J Am Coll Surg 206:782–788; discussion 788–789
3. Association of American Medical Colleges (AAMC) (2006) Diversity in the physician workforce: facts & figures. Available at: https://services.aamc.org/Publications/index.cfm?fuseaction=Product.displayForm&prd_id=161&prv_id=191&cfid=1
4. Collins J (2001) Good to great. HarperCollins, New York
5. Collins J, Porras JI (2002) Built to last: successful habits of visionary companies. HarperBusiness, HarperCollins, New York
6. Collins J (2005) Good to great and the social sectors. HarperBusiness, HarperCollins, New York
7. Rikkers LF (2004) Presidential address: surgical leadership – lessons learned. Surgery 136:717–724
8. Souba WW (2004) The tough work of leadership. In: Ziegenfuss J, Sassani J (eds) Portable health administration. Academic Press, San Diego, pp 65–87
9. Souba WW (2003) Academic medicine's core values: what do they mean? J Surg Res 115:171–173
10. Goleman D, Boyatzis R, McKee A (2002) Primal leadership: realizing the power of emotional intelligence. Harvard Business School Press, Boston, MA
11. Porter ME, Teisberg EO (2006) Redefining health care: creating value-based competition on results. Harvard Business School Press, Boston, MA
12. Sachdeva AK, Pellegrini CA, Johnson KA (2008) Support for simulation-based surgical education through American College of Surgeons-accredited education institutes. World J Surg 32:196–207
13. Association of American Medical Colleges (AAMC) (2008) Protecting patients, preserving integrity, advancing health: accelerating the implementation of COI policies in human subjects research. Available at: https://services.aamc.org/publications/index.cfm?fuseaction=Product.displayForm&prd_id=220&cfid=1&cftoken=DAE53F39-AAFC-341F4DCBEBFFE3AC4B42
14. Fryer-Edwards K, Van Eaton E, Goldstein EA et al (2007) Overcoming institutional challenges through continuous professionalism improvement: the University of Washington experience. Acad Med 82:1073–1078
15. Institute of Medicine, Committee on Quality of Health Care in America, Kohn LT et al (2000) To err is human: building a safer health system. National Academies Press, Washington, DC
16. Institute of Medicine, Committee on Quality of Health Care in America (2001) Crossing the quality chasm: a new health system for the 21st century. National Academies Press, Washington, DC
17. Pellegrini CA (2006) Surgical education in the United States: navigating the white waters. Ann Surg 244:335–342
18. Hersey PH, Blanchard KH, Johnson DE (2008) Management of organizational behavior: leading human resources, 9th edn. Prentice Hall, Upper Saddle River, NJ
19. Souba WW (1999) Mentoring young academic surgeons, our most precious asset. J Surg Res 82:113–120
61 Information Transfer and Communication in Surgery: A Need for Improvement
Kamal Nagpal and Krishna Moorthy
Contents
61.1 Communication as a Pivotal Factor for Surgical Safety ..... 771
61.2 Models of Communication ..... 772
61.3 Communication Errors: Culprit of Major Disasters in High-Risk Industries ..... 774
61.4 Strategies to Improve Communication in High-Risk Industries: Can It Be Adapted in Surgical Care? ..... 774
61.5 Communication Failures & Medical Errors ..... 775
61.6 Assessing Communication & Identifying Communication Failures in Surgical Care ..... 776
61.6.1 Operating Theatre Communication ..... 776
61.6.2 Postoperative Handover Communication ..... 776
61.6.3 Shift Handover Communication ..... 776
61.6.4 Clinical Units Communication ..... 777
61.7 Impact of Information Transfer and Communication on Outcomes ..... 777
61.8 Interventions Used to Improve Communication ..... 777
61.8.1 Standardizing the ITC Process ..... 777
61.8.2 Changing Teams ..... 778
61.8.3 Technology Innovations ..... 778
61.9 Conclusion ..... 779
References ..... 779
K. Nagpal, Department of Biosurgery and Surgical Technology, Imperial College London, 10th Floor QEQM Building, St. Mary's Hospital, Praed Street, London W2 1NY, UK; e-mail: [email protected]
Abstract This chapter outlines the role of information transfer and communication (ITC) in surgical safety. It reviews communication practices across the entire continuum of surgical care and recommends strategies, adapted from high-risk industries, to improve communication in surgery. Finally, the chapter looks at the various interventions available to enhance ITC and thereby ensure surgical safety.
61.1 Communication as a Pivotal Factor for Surgical Safety
Traditionally, work on surgical outcomes has focused primarily on the role of patient pathophysiological risk factors and the skills of the individual surgeon. However, this approach neglects a wide range of factors that have been found to be important in achieving safe and high-quality performance in other high-risk environments. It has recently been suggested [1] that the whole operation profile, consisting of a full range of factors such as teamwork, inter-professional communication, organizational culture and the work environment, plays a role in determining surgical outcomes (Fig. 61.1). Clearly, patient factors are important, but an approach that fails to acknowledge the role of these other factors will do little to reduce morbidity and mortality. Of all the system factors, communication is perhaps the most significant, both as a skill in itself and because effective communication is integral to the success of all other factors highlighted in a systems model of safety. Moreover, the exchange of information is a core characteristic of other non-technical skills, such as decision-making, situational awareness, teamwork, leadership and stress management, which are essential for safe and effective performance. Communication is fundamental to workplace
efficiency and safety, and co-ordination between humans is clearly not possible without effective communication. Communication also plays a vital role in ensuring the successful completion of tasks: it provides knowledge, builds relationships, establishes predictable behaviour patterns, maintains attention to the task and serves as a management tool. The importance of communication for effective performance, reducing errors and improving surgical safety cannot be overemphasized.
Fig. 61.1 Systems approach to surgical outcome [1]: patient risk factors and the operation profile (surgical team, procedures, operative events, communication, technical skills, team performance, decision-making, operative environment) together determine the surgical outcome
61.2 Models of Communication
Communication can be classified in various ways:
1. One-way or two-way communication – This is the most commonly used classification.
− One-way communication – This involves three components: sender, transmission and receiver (Fig. 61.2). The information conveyed by the sender is transmitted to one or more receivers. Examples include email, voicemail or television. Its advantages are that it is simple and rapid and that the sender feels in control,
while its shortcomings are that there is no feedback, that responsibility lies with the sender and that, to be effective, the receiver has to be attentive.
− Two-way communication – In addition to the three components above, this has a further element, feedback, which closes the communication loop (Fig. 61.3). Two-way communication involves the sender transmitting information to the receiver, who then responds, in turn becoming the sender and transmitting information back. It occurs during radio conversations, telephone calls or face-to-face conversations. Although one-way communication is faster and more efficient, two-way communication is more accurate, reliable and effective; it permits checking and correction of details, and both sender and receiver carry responsibility. In contrast to one-way communication, it generally takes longer, and the receiver also has to communicate in return. The key difference between one-way and two-way communication is the role of feedback. Feedback ensures that both sender and receiver are on the same page. It also closes the communication loop and is the
simplest way of preventing misinterpretation at the receiver's end.
Fig. 61.2 One-way communication: source → transmission → receiver
Fig. 61.3 Two-way communication: source → transmission → receiver, with feedback returning from the receiver to the source and closing the loop
2. Verbal/Written
− Verbal communication is both social and functional. It is generally accompanied by non-verbal cues, which are important. Mehrabian and colleagues [2, 3] carried out a series of studies based on situations in which there was ambiguity between the spoken word and non-verbal cues such as posture, facial expression, etc. They reported that the receiver pays only 7% of attention to the words themselves, while 55 and 38% of attention is paid to non-verbal cues and tone, respectively. This implies that non-verbal signals are at least as important as tone and words.
− Written communication is the most important form used in the workplace. It is frequently electronic and is open to misunderstanding and misinterpretation, so care must be taken to ensure that it is clear, precise and informative. West [4] suggested that the richness of information transfer is determined by the medium of exchange (Fig. 61.4): information transferred during face-to-face conversation is extremely rich, as this allows both verbal and non-verbal communication to take place.
Fig. 61.4 Richness of information according to the mode of transfer [4]: written (least rich) → telephone conversation (slightly rich) → face-to-face (most rich)
3. Who/Why/What/How model – This model divides communication into four components:
− What – the type of information to be communicated
− How – the means by which the information is transferred
− Why – the reason for the information transfer
− Who – the person to whom the information is being transferred
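To make these models concrete, here is a small illustrative sketch (ours, not from the chapter's sources; the clinical phrasing is invented): a message carries explicit what/how/why/who fields, and a two-way exchange succeeds only when the receiver's read-back matches the sender's intent, which is the feedback that closes the loop.

```python
from dataclasses import dataclass

@dataclass
class Message:
    what: str   # the information to be communicated
    how: str    # the medium of transfer (verbal, written, electronic)
    why: str    # the reason for the transfer
    who: str    # the intended receiver

def closed_loop_exchange(msg: Message, read_back: str) -> bool:
    """Two-way communication: the exchange succeeds only when the
    receiver's read-back matches what the sender intended (feedback
    closing the loop); a mismatch is caught immediately."""
    return read_back.strip().lower() == msg.what.strip().lower()

order = Message(what="2 g cefazolin IV before incision",
                how="verbal",
                why="antibiotic prophylaxis",
                who="anaesthetist")
print(closed_loop_exchange(order, "2 g cefazolin IV before incision"))  # True
print(closed_loop_exchange(order, "1 g cefazolin IV before incision"))  # False: error caught
```

A read-back of this kind is exactly the feedback step that one-way media such as pagers or voicemail omit.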
61.3 Communication Errors: Culprit of Major Disasters in High-Risk Industries
Many accident analyses cite miscommunication as the primary contributory factor. There are several published investigations into accidents and incidents in which failure of communication at shift handover was held to have been a contributory causal factor. These were major accidents and incidents resulting in actual or potential loss of life, major property damage and/or environmental impact. Such highly publicized incidents form the tip of an iceberg of numerous unpublished lost-production incidents or near-misses caused by failures of communication.
• Sellafield beach incident – In November 1983, highly radioactive waste liquor was accidentally discharged into the sea as a result of a failure of communication between shifts.
• Piper Alpha disaster – The Piper Alpha platform in the North Sea, situated 110 miles northeast of Aberdeen, Scotland, suffered an explosion on its production deck on 6 July 1988. Of the 266 persons on board, only 61 survived. Among many factors, failure to transmit relevant information at the handover was one of the main contributors to the accident.
• Tenerife air crash – This famous accident was caused by miscommunication between the air traffic control tower and the pilot.
• Scandinavian Star fire – This passenger ship caught fire, and the response was hampered by a lack of communication between the rescue coordination centre and the ship.
differing mental models. In these incidents, written communication failed because the intended message was misunderstood or simply not communicated. Reason [5] suggested that organizational accidents that happen due to communication problems can be categorized as:
• System failures – in which the necessary channels of communication do not exist or are not functioning.
• Message failures – in which channels exist but the information is not transmitted.
• Reception failures – in which channels exist and the information is transmitted, but it either arrives late or is misinterpreted by the receiver.
61.4 Strategies to Improve Communication in High-Risk Industries: Can It Be Adapted in Surgical Care?
Effective communication underpins nearly all non-technical skills. Commercial, political and humanitarian pressures have compelled high-risk industries such as aviation, nuclear power and the chemical industry to raise their standards and make sustained efforts to improve and maintain safety. Although caution is advised when drawing parallels between healthcare and these industries, they are, for the most part, quite comparable to healthcare. We might therefore consider adapting specific industrial techniques to make the surgical care process safer. Drawing on research carried out in a range of high-reliability industries, it is possible to make a number of recommendations for improving communication.
• The intended communication must first be encoded and physically transmitted in the form of a signal, which may be written, spoken or gestured. The message should not be buried in irrelevant, unwanted information or "noise".
• The introduction of redundancy reduces the risk of erroneous transmission: information should be transferred via more than one medium, e.g. verbal plus one other method (written, electronic) [6].
• Feedback increases the accuracy of information. Two-way communication is suggested, with both participants
taking responsibility for achieving accurate communication [7].
• Effective communication can be aided by qualitative aspects of speech, such as assessments of comprehension, confidence, competence and fluency. Verbal face-to-face communication is desirable [8].
• Key information needs to be specified and presented, and efforts should be made to reduce ambiguity and to exclude irrelevant information; natural language is inherently ambiguous.
• A shared mental model facilitates successful communication. Miscommunications and misunderstandings are most likely to occur when the mental models held by sender and receiver differ widely [9].
• Written communication is facilitated by design that considers the information needs of the user, supports the communication task and prompts the inclusion of the relevant categories and types of information [10].
• People and organizations frequently refer to communication as unproblematic, implying that successful communication is easy and requires little effort; over-confidence and complacency are common. Organizations need to counter this complacency by emphasizing the potential for miscommunication and its possible consequences, setting standards for effective communication and developing the communication skills of their members.
61.5 Communication Failures & Medical Errors
The role of communication in clinical practice has been appreciated for a number of years. A report 25 years ago suggested that 15% of human errors were attributable to poor communication [11]. Subsequently, over the last decade, the Harvard Medical Practice Study [12], the Quality in Australian Health Care Study [13] and the Institute of Medicine report [14] all revealed that ineffective communication was an underlying factor in most medical errors. Researchers in family practice [15], emergency medicine [16–18], anaesthesia [19] and the intensive care unit (ICU) [20] all make pleas for better team communication. In one study of all root cause analyses submitted to the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), communication was identified as the most common root cause of wrong-site surgery events [21] (Fig. 61.5). Analyses of adverse events and incidents in two phases of surgical care, i.e. preoperative evaluation [22] and the recovery room [23], also revealed communication failures as one of the leading causes of these problems. In two separate studies analyzing surgical errors [24, 25], communication breakdown was found to be one of the main factors contributing to errors. These studies highlight that communication failures are the principal contributory factors to
adverse events in surgery. In fact, they have been shown to be the leading causes of inadvertent patient harm.
Fig. 61.5 Sentinel event statistics, www.jcaho.org (root causes of sentinel events, all categories, 1995–2004, as percent of 2,966 events): communication, orientation/training, patient assessment, staffing, availability of information, competency/credentialing, procedural compliance, environmental safety/security, leadership, continuum of care, care planning, organization culture
61.6 Assessing Communication & Identifying Communication Failures in Surgical Care
Although serious communication breakdowns occur across the continuum of surgical care, most studies have assessed communication in the operating room (OR).
61.6.1 Operating Theatre Communication
In a focus group interview, listening, clear and accurate speech, courteous behaviour and acknowledging requests were identified as key skills for effective communication in the OR. The study by Nestel et al. [26] identified the absence of basic interpersonal skills, and of appreciation and respect for different professional roles, as causes of compromised communication. The authors also concluded that current practice seems to be based on making assumptions, with little or no opportunity to check or clarify them, resulting in an unsatisfactory state of communication. This could be due to the differing perceptions of the various disciplines. In a qualitative study [27] on briefing in the OR, the majority of nurses believed that a briefing would contribute immensely to the safety of patients undergoing surgery, but surgeons perceived its information-provision function negatively, seeing it as a drain on their time for the purpose of "telling people what they should already know". In an exploratory study [28] to identify communication patterns, OR charge nurse communication was observed, and the authors found that most (69.2%) communication episodes occurred face to face. Coordinating equipment, followed by coordinating patient preparedness, was the most frequent purpose of communication identified. The authors suggested that automating aspects of preparing patients for surgery has the potential to reduce the need for information exchange. Lingard et al. [29], in a prospective observational study, highlighted that patterns of communication in the OR are complex and socially motivated. High-tension events
occur during every procedure and have a ripple effect, especially on trainees. The same group, in a subsequent ethnographic study [30], identified communication failures in 30% of OR team exchanges. Although most of these failures had no visible effect, a third of them resulted in situations that potentially jeopardized patient safety.
61.6.2 Postoperative Handover Communication
There is a substantial body of research on nurse-to-nurse handovers and some recent interest in handover between doctors, but little work exploring interprofessional handover. In an observational study of postoperative handovers [31], anaesthetists and nurses often had different expectations of the content and timing of information transfer. In this prospective observational qualitative study, communication during the postoperative handover in the recovery room was found to be largely informal. Moreover, the transfer of information did not automatically lead to the transfer of professional responsibility for the patient. This unstructured, variable communication process was again highlighted in a survey study [32] of handover from theatre to the post-anaesthetic care unit (PACU), in which only 32.6% of anaesthetists attained maximum scores for the quality of verbal information. Of the five required points of verbal information, 14% of anaesthetists failed to give any. Information regarding preoperative status, premedication and the surgical procedure was given in 40, 36 and 21% of cases, respectively.
61.6.3 Shift Handover Communication
There is a perceived belief that continuity of care is challenged by the new working-time directives on both sides of the Atlantic. Although the topic is important, the literature on shift handovers is sparse, and very few articles specifically deal with the signing out of inpatients within a surgical service. A few studies [33, 34] have expressed concerns regarding current practices of patient handover between shifts. In a questionnaire study [34] in which ninety general surgical trainees were asked to describe their current shift handover practice, only
47% of the trainees believed that current handover practices were adequate to ensure patient safety. Overall satisfaction with the handover process was 1.57 out of 3, and only thirteen percent reported that they had received formal training in good handover. In another trainee assessment of handover within a burns unit [33], the mean satisfaction score was 3.8 out of 5. In this telephone questionnaire survey, 86.7% of trainees judged their current handover practice to be safe; as in the previous study, only ten percent had received formal training in handovers. These studies highlight the room for improvement in handover practice, including more training and greater involvement of the integrated multidisciplinary team.
61.6.4 Clinical Units Communication
Communication within clinical units is variable, and team members have diverging perceptions of communication. In a cross-sectional survey [35] of 48 ICU doctors and 136 nurses, nurses reported lower levels of communication openness between nurses and doctors. Compared with senior doctors, trainee doctors also reported lower levels of communication openness. Furthermore, the analysis revealed that open communication was a predictor of understanding patient care goals. Mills et al. [36] used the Medical Team Training questionnaire to study the perception of communication among clinicians and nurses in clinical units and observed that nurses and anaesthesia providers perceive communication significantly differently from surgeons. In addition, surgeons rate communication and teamwork more favourably than the other two groups. Although these studies clearly show differences in the perception of communication among various healthcare professionals, none of them compared subjective perception with objective patient outcomes. These differences in perception create an opportunity for error and highlight the need for improvement and standardization of communication. Greenberg et al. [37], in a study of malpractice cases, concluded that communication breakdowns are not unique to the OR but happen across the continuum of care and are as likely to occur during pre- and postoperative care as during the intraoperative course. Most of the failures they identified were verbal and occurred between a single transmitter and a single receiver. In most cases, information was either never transmitted or was
communicated but inaccurately received. Status asymmetry and ambiguity about responsibilities were the common associated factors identified as responsible for these communication breakdowns.
61.7 Impact of Information Transfer and Communication on Outcomes
Communication lapses are known to be significant contributors to adverse events in surgical patients. A multi-institutional study conducted by Williams et al. [38] uncovered 328 communication incident reports through focus group sessions at five medical centres. They identified four main contributory factors for information transfer and communication (ITC) failures: blurred boundaries of responsibility, decreased surgeon familiarity with patients, diversion of surgeon attention and distorted communication. Furthermore, one-third of these incidents led to serious adverse events. Another multi-centre questionnaire study [39], conducted in 25 ICUs in the state of Michigan, USA, revealed that nurse-physician communication was a predictor of nurse medication errors. Various studies [40–42] discussing the improvement of ITC have demonstrated improvements in clinical processes and patient outcomes. While improving communication, these studies showed a decrease in the number of cancelled surgical procedures [43], a decrease in length of stay [42], a reduction in morbidity and mortality [41], a reduction in OR delays [44] and an increase in the number of patients who received antibiotic and DVT prophylaxis [40].
61.8 Interventions Used to Improve Communication
61.8.1 Standardizing the ITC Process
Recently, a multicentre global study [41] of the effectiveness of the WHO surgical safety checklist in the OR showed a significant improvement in the use of safety processes with the implementation of the checklist. These processes consisted of oral confirmation by surgical teams of the completion of the basic
steps for ensuring safe delivery of anaesthesia, antibiotic prophylaxis, effective teamwork and other safety practices in surgery. This improvement was subsequently translated into a significant reduction in morbidity and mortality rates. In another tertiary-centre pre-post intervention study [44], there was a 19% reduction in communication breakdowns with the use of a preoperative briefing. After testing the feasibility of a preoperative checklist, a Canadian group [45] conducted a pre-post intervention study evaluating once again the benefit of a preoperative checklist and briefing. They observed that the intervention reduced communication failures among OR team members threefold, from a mean of 3.95 to a mean of 1.31 failures per procedure. Standardization was referred to variably by different studies as checklists [41, 45], protocols, communication sheets, briefings [46], daily goals forms [42, 47] and ward round proformas [48]. Using the same principles of standardizing and structuring the ITC process, many centres [42, 47–49] have shown significant improvement in the ITC process by using daily goals forms and post-take ward round proformas. A few of these studies [41, 42, 50] also managed to translate their ITC improvement into outcomes such as length of stay.
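As a minimal illustration of what standardizing the ITC process can mean in software terms (the field names below are our invention, not those of the WHO checklist or any published proforma), a structured handover can be modelled as a fixed set of required items whose completeness is checked before responsibility is transferred:

```python
# Required items for a (hypothetical) postoperative handover proforma;
# real instruments, such as the WHO checklist, define their own item sets.
REQUIRED_ITEMS = ("patient_id", "procedure", "anaesthetic_course",
                  "analgesia_plan", "antibiotics", "dvt_prophylaxis",
                  "specific_concerns")

def missing_items(handover: dict) -> list:
    """Return the required items left blank; the handover is complete
    only when this list is empty. Making the item set explicit is what
    standardizes content that would otherwise rely on memory and habit."""
    return [item for item in REQUIRED_ITEMS if not handover.get(item)]

handover = {"patient_id": "A123",
            "procedure": "laparoscopic cholecystectomy",
            "anaesthetic_course": "uneventful GA",
            "analgesia_plan": "PCA morphine"}
print(missing_items(handover))
# ['antibiotics', 'dvt_prophylaxis', 'specific_concerns']
```

The design choice mirrors the studies above: rather than policing how clinicians speak, the checklist constrains what must be covered, which is where the reported reductions in omissions came from.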
61.8.2 Changing Teams
Transforming teams by modifying their structure, behaviours, attitudes and beliefs has been shown to improve ITC practices. Moreover, improvements in team behaviour are essential for the sustainability of change. Catchpole et al. [51], in a prospective intervention study of paediatric handovers, demonstrated a reduction in information omission after the implementation of a new handover protocol adapted from aviation and the Formula One industry. The authors concluded that the development of a handover protocol using expertise from high-risk industries improved information transfer with no deleterious effect on the duration of handover. Awad et al. [40] applied crew resource management principles from the aviation industry in the OR to demonstrate an improvement in interdisciplinary communication among OR team members. A few other studies [50, 52] have also demonstrated improved communication after changing the behaviours and structure of teams. A study in the ICU [52] enhanced the representation of
nurses in the rounds by introducing nurse presentations during the rounds and found a 26% increase in communication among the healthcare professionals. In another study [50], restructuring the patient care teams for general surgery patients admitted to the hospital improved team communication and eventually led to a significantly decreased length of stay. Dodek et al. [53] again emphasized the role of team structure in enhancing communication: they demonstrated increased communication by introducing an explicit approach to ICU rounds that included a clear sequence for reporting assessments and plans.
61.8.3 Technology Innovations
The search for new technology to improve the ITC process has been extensive. In a study [54] evaluating the efficacy of hands-free voice over internet protocol (VOIP) in the perioperative environment, OR providers responded to communication queries four times faster when using VOIP compared with alphanumeric pagers; however, the authors expressed concern about the security and confidentiality of patient data. Other examples of technological innovations for enhancing communication are patient trackers and telerounding. Patient trackers improved communication among residents, nurses and consultants, expediting the discharge process, while telerounding improved patient-physician communication. Another study [55] supporting technology found that cellular phone use by anaesthesiologists was associated with a reduction in the risk of medical error resulting from communication delay. Structuring communication via technology has also been shown to increase team efficiency in the OR. In a simulated environment, Webster et al. [56] demonstrated that teams with scripted speech or automated information display interfaces performed significantly faster than teams with no such rules. Information technology has also been utilized to improve residents' shift handovers. Van Eaton et al. [57] analyzed shift handovers and subsequently designed a user-driven computerized resident sign-out system, which proved feasible and powerful and was immediately accepted by the residents. In a further randomized controlled trial [58], they demonstrated that this system enhanced patient safety and efficiency
by decreasing the number of patients missed on resident rounds, and resident round completion times decreased by up to 3 h per week. Integrating these technologies into clinical practice may represent a window into future patient care, with increased patient and clinician satisfaction; however, it would be important to ensure secure control of patient-level medical information before their widespread adoption. Given the importance of effective interdisciplinary communication and information transfer, attention should be given to the ways in which these strategies are implemented. Principles that contribute to the successful implementation of change include securing the teams' commitment, seeking their opinions during the development process, maintaining an ongoing consultation process with clinical teams, receiving and providing continuous feedback, and eliciting the perceptions of teams both before and during an intervention, which provides critical insight into the opportunities and obstacles that are essential for tailoring the intervention to the context. This can be challenging; in-depth stakeholder work to understand and surmount any cultural barriers should therefore be undertaken early.
61.9 Conclusion

Communication is a vital component of the systems approach to surgical safety and is integral to other non-technical skills such as teamwork, situation awareness and decision making. Communication failures have been identified as a leading cause of surgical errors. High-risk industries have long appreciated the importance of effective communication and have undertaken various steps to enhance ITC. Although effective communication is likewise a key determinant of surgical safety and surgical outcome, the surgical community has been slow to acknowledge the need to improve ITC practices in surgical care. Most studies exploring, assessing or improving communication have revealed that ITC practices in surgical care are unstructured, variable and lacking in transparency. There is an urgent need to standardize and structure the ITC process. Beyond implementing change, attention should be given to the ways in which such changes can be made sustainable.
References
1. Vincent C, Moorthy K, Sarker SK et al (2004) Systems approaches to surgical quality and safety: from concept to measurement. Ann Surg 239:475–482
2. Mehrabian A, Ferris SR (1967) Inference of attitudes from nonverbal communication in two channels. J Consult Psychol 31:248–252
3. Mehrabian A, Wiener M (1967) Decoding of inconsistent communications. J Pers Soc Psychol 6:109–114
4. West MA (2004) Effective teamwork: practical lessons from organisational research, 2nd edn. BPS Blackwell, Leicester
5. Reason J (1997) Managing organisational accidents. Ashgate, Aldershot
6. Bellamy LJ (1984) Not waving but drowning: problems of human communication in the design of safe systems. Institution of Chemical Engineers Symposium No. 90, pp 167–177
7. Leavitt HJ, Mueller R (1962) Some effects of feedback on communication. In: Hare AP, Borgatta EF, Bales RF (eds) Small groups: studies in social interaction. Knopf, New York
8. Hopkins DV (1980) The measurement of the air traffic controller. Hum Factors 22:547–560
9. Reddy M (1979) The conduit metaphor – a case of frame conflict in our language about language. Cambridge University Press, Cambridge
10. Miller RB (1984) Transaction structures and format in form design. Wiley, Chichester
11. Abramson NS, Wald KS, Grenvik AN et al (1980) Adverse occurrences in intensive care units. JAMA 244:1582–1584
12. Brennan TA, Leape LL, Laird NM et al (1991) Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med 324:370–376
13. Wilson RM, Runciman WB, Gibberd RW et al (1995) The quality in Australian Health Care Study. Med J Aust 163:458–471
14. Kohn L (2000) To err is human: an interview with the Institute of Medicine's Linda Kohn. Jt Comm J Qual Improv 26:227–234
15. Bhasale AL, Miller GC, Reid SE et al (1998) Analysing potential harm in Australian general practice: an incident-monitoring study. Med J Aust 169:73–76
16. Adams JG, Bohan JS (2000) System contributions to error. Acad Emerg Med 7:1189–1193
17. Risser DT, Rice MM, Salisbury ML et al (1999) The potential for improved teamwork to reduce medical errors in the emergency department. The MedTeams Research Consortium. Ann Emerg Med 34:373–383
18. Williams KA, Rose WD, Simon R (1999) Teamwork in emergency medical services. Air Med J 18:149–153
19. Runciman WB, Sellen A, Webb RK et al (1993) The Australian Incident Monitoring Study. Errors, incidents and accidents in anaesthetic practice. Anaesth Intensive Care 21:506–519
20. Beckmann U, West LF, Groombridge GJ et al (1996) The Australian Incident Monitoring Study in intensive care: AIMS-ICU. The development and evaluation of an incident reporting system in intensive care. Anaesth Intensive Care 24:314–319
21. (1995–2004) Sentinel event analysis statistics. Available at: http://www.jointcomission.org
22. Kluger MT, Tham EJ, Coleman NA et al (2000) Inadequate pre-operative evaluation and preparation: a review of 197 reports from the Australian incident monitoring study. Anaesthesia 55:1173–1178
23. Kluger MT, Bullock MF (2002) Recovery room incidents: a review of 419 reports from the Anaesthetic Incident Monitoring Study (AIMS). Anaesthesia 57:1060–1066
24. Christian CK, Gustafson ML, Roth EM et al (2006) A prospective study of patient safety in the operating room. Surgery 139:159–173
25. Rogers SO Jr, Gawande AA, Kwaan M et al (2006) Analysis of surgical errors in closed malpractice claims at 4 liability insurers. Surgery 140:25–33
26. Nestel D, Kidd J (2006) Nurses' perceptions and experiences of communication in the operating theatre: a focus group interview. BMC Nurs 5:1
27. Lingard L, Whyte S, Espin S et al (2006) Towards safer interprofessional communication: constructing a model of "utility" from preoperative team briefings. J Interprof Care 20:471–483
28. Moss J, Xiao Y (2004) Improving operating room coordination: communication pattern assessment. J Nurs Adm 34:93–100
29. Lingard L, Reznick R, Espin S et al (2002) Team communications in the operating room: talk patterns, sites of tension, and implications for novices. Acad Med 77:232–237
30. Lingard L, Espin S, Whyte S et al (2004) Communication failures in the operating room: an observational classification of recurrent types and effects. Qual Saf Health Care 13:330–334
31. Smith AF, Pope C, Goodwin D et al (2008) Interprofessional handover and patient safety in anaesthesia: observational study of handovers in the recovery room. Br J Anaesth 101:332–337
32. Anwari JS (2002) Quality of handover to the postanaesthesia care unit nurse. Anaesthesia 57:488–493
33. Al-Benna S, Al-Ajam Y, Alzoubaidi D (2009) Burns surgery handover study: trainees' assessment of current practice in the British Isles. Burns 35:509–512
34. Kennedy R, Kelly S, Grant S et al (2009) Northern Ireland General Surgery Handover Study: surgical trainees' assessment of current practice. Surgeon 7:10–13
35. Reader TW, Flin R, Mearns K et al (2007) Interdisciplinary communication in the intensive care unit. Br J Anaesth 98:347–352
36. Mills P, Neily J, Dunn E (2008) Teamwork and communication in surgical teams: implications for patient safety. J Am Coll Surg 206:107–112
37. Greenberg CC, Regenbogen SE, Studdert DM et al (2007) Patterns of communication breakdowns resulting in injury to surgical patients. J Am Coll Surg 204:533–540
38. Williams RG, Silverman R, Schwind C et al (2007) Surgeon information transfer and communication: factors affecting quality and efficiency of inpatient care. Ann Surg 245:159–169
39. Manojlovich M, DeCicco B (2007) Healthy work environments, nurse-physician communication, and patients' outcomes. Am J Crit Care 16:536–543
40. Awad SS, Fagan SP, Bellows C et al (2005) Bridging the communication gap in the operating room with medical team training. Am J Surg 190:770–774
41. Haynes AB, Weiser TG, Berry WR et al (2009) A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med 360:491–499
42. Pronovost P, Berenholtz S, Dorman T et al (2003) Improving communication in the ICU using daily goals. J Crit Care 18:71–75
43. Maloney CG, Wolfe D, Gesteland PH et al (2007) A tool for improving patient discharge process and hospital communication practices: the "Patient Tracker". AMIA Annu Symp Proc:493–497
44. Nundy S, Mukherjee A, Sexton JB et al (2008) Impact of preoperative briefings on operating room delays: a preliminary report. Arch Surg 143:1068–1072
45. Lingard L, Regehr G, Orser B et al (2008) Evaluation of a preoperative checklist and team briefing among surgeons, nurses, and anesthesiologists to reduce failures in communication. Arch Surg 143:12–17; discussion 18
46. Narasimhan M, Eisen LA, Mahoney CD et al (2006) Improving nurse-physician communication and satisfaction in the intensive care unit with a daily goals worksheet. Am J Crit Care 15:217–222
47. Agarwal S, Frankel L, Tourner S et al (2008) Improving communication in a pediatric intensive care unit using daily patient goal sheets. J Crit Care 23:227–235
48. Thompson AG, Jacob K, Fulton J et al (2004) Do post-take ward round proformas improve communication and influence quality of patient care? Postgrad Med J 80:675–676
49. Phipps LM, Thomas NJ (2007) The use of a daily goals sheet to improve communication in the paediatric intensive care unit. Intensive Crit Care Nurs 23:264–271
50. Friedman DM, Berger DL (2004) Improving team structure and communication: a key to hospital efficiency. Arch Surg 139:1194–1198
51. Catchpole KR, de Leval MR, McEwan A et al (2007) Patient handover from surgery to intensive care: using Formula 1 pit-stop and aviation models to improve safety and quality. Paediatr Anaesth 17:470–478
52. Wright S, Bowkett J, Bray K (1996) The communication gap in the ICU – a possible solution. Nurs Crit Care 1:241–244
53. Dodek PM, Raboud J (2003) Explicit approach to rounds in an ICU improves communication and satisfaction of providers. Intensive Care Med 29:1584–1588
54. Jacques PS, France DJ, Pilla M et al (2006) Evaluation of a hands-free wireless communication device in the perioperative environment. Telemed J E Health 12:42–49
55. Soto RG, Chu LF, Goldman JM et al (2006) Communication in critical care environments: mobile telephones improve patient care. Anesth Analg 102:535–541
56. Webster JL, Cao CG (2006) Lowering communication barriers in operating room technology. Hum Factors 48:747–758
57. Van Eaton EG, Horvath KD, Lober WB et al (2004) Organizing the transfer of patient care information: the development of a computerized resident sign-out system. Surgery 136:5–13
58. Van Eaton EG, Horvath KD, Lober WB et al (2005) A randomized, controlled trial evaluating the impact of a computerized rounding and sign-out system on continuity of care and resident work hours. J Am Coll Surg 200:538–545
General Surgery: Current Trends and Recent Innovations
62
John P. Cullen and Mark A. Talamini
Contents
62.1 Introduction ............ 781
62.2 Trends in General Surgery Research ............ 782
62.2.1 The Call for Patient Safety and Medical Information Technology ............ 782
62.2.2 Requirements for Reporting ............ 783
62.2.3 Trends Across Institutional Review Boards ............ 784
62.2.4 Privacy Legislation ............ 784
62.2.5 Trends in Funding for Surgical Research ............ 784
62.2.6 Partnerships for Clinical Trials ............ 785
62.3 Recent Innovations ............ 785
62.3.1 Surgical Robotics ............ 785
62.3.2 Natural Orifice Translumenal Endoscopic Surgery (NOTES) ............ 787
62.3.3 Artificial Organs and Device Engineering ............ 788
62.3.4 Training the Surgeon of the Future ............ 789
References ............ 790
Abstract In surgery, new technology has been developed to eliminate some problems that surgeons face during more challenging laparoscopic or conventional operations. Surgical robotic systems have restored much of what was lost and altered the learning curve in the surgeon’s favor. At the same time, advanced surgical instruments and optics continue the trend of the smaller scar. Minimally invasive surgery has given birth to a new and exciting era in surgical research. The future of surgical research lies, in part, in computer and visual technology. This chapter begins by describing many of the current trends in surgical research. The second half of the chapter describes the latest innovations in general surgical research. Surgeons in numerous disciplines continue to make great strides in basic science research. However, the latest and most exciting innovations have largely developed through the integration of new innovative technology into surgical practice.
62.1 Introduction
M. A. Talamini (✉) Department of Surgery, University of California at San Diego, 200 West Arbor Drive, #8400, San Diego, CA 92103, USA e-mail: [email protected]
The overall trend in surgical practice over the last few decades has been miniaturization. Incisions and instruments have been made smaller, and the impact of major operations on patients has decreased, along with lengthy hospital stays. The surgical sciences and our approach to operative surgery were permanently altered by the introduction of laparoscopy. The goals of surgery have not changed; surgeons still resect tumors, repair hernias, and insert biomedical devices. However, the method of accomplishing these goals and the resulting impact on patients is dramatically different. The trend of adopting smaller incisions and quicker recoveries has been seen across multiple medical subspecialties.
Interventional cardiology advancements in less invasive cardiac catheterization procedures have reduced the number of coronary artery bypass grafts. Gastroenterologists have an array of therapeutic options at their disposal with modern endoscopes. Interventional radiologists can localize and embolize a bleeding vessel and spare a patient a more traumatic open operation. Even postoperative intra-abdominal infection can often be controlled using IR-guided drainage. Surgical subspecialties have also focused on minimizing the impact of an operation. Thoracoscopic procedures allow a surgeon access to the chest without a large and painful thoracotomy incision and without spreading the ribs. Urologists now commonly perform robotic-assisted or laparoscopic prostatectomies, reducing the morbidity of this operation. A skilled bariatric surgeon can treat obesity with adjustable gastric banding rather than rearranging the intestines in a gastric bypass. But while usually better for patients, minimally invasive procedures have burdened surgeons with a new set of problems. Three-dimensional visibility is lost in laparoscopy. The surgeon's vision is placed into the hands of another physician, nurse, or medical student commanding the laparoscope. The tremendous tactile capacity of the fingertip is replaced by an inflexible instrument. Minimally invasive operations are often technically challenging, and for this reason, laparoscopy is less frequently employed in complex, difficult procedures. For every new technology that emerges, physicians must spend extra time training to become proficient, and laparoscopy is no exception. But these shortfalls of laparoscopy created a need for improvements in technology. One potential solution attempting to address some of these deficits is surgical robotics. This technology has been developed to eliminate some problems surgeons face during more challenging laparoscopic operations. Surgical robotic systems have restored much of what was lost and altered the learning curve in the surgeon's favor. At the same time, advanced surgical instruments and optics continue the trend of the smaller scar. Minimally invasive surgery has given birth to a new and exciting era in surgical research. The future of surgical research lies, in part, in computer and visual technology. As high-definition televisions have made their way into the marketplace, they have also found their way into operating room suites. Modern radiological computer systems allow surgeons to view films during surgery in the operating room, replacing the hard copies
and light boxes of past days. Surgeons may permanently record an operation in digital form for teaching or medico-legal purposes. The latest endoscopy and laparoscopy trainers resemble video games in their interfaces and graphic capabilities. In the midst of this technological boom, patients have become more informed than ever before. The Internet has placed a wealth of information directly into the hands of the patient. Resourceful patients seek out clinical trials for their particular condition, or pursue the newest treatments simply in the belief that outcomes will be better. At the same time, hospitals and physicians are facing increasing scrutiny regarding patient safety. Increased regulation has created new challenges for human research projects. The relationship between the medical device industry, physicians and researchers, and the ethical reporting of data has been placed in the spotlight by several high-profile cases. This chapter will begin by describing many of the current trends in surgical research. The second half of the chapter will describe the latest innovations in general surgical research. Surgeons in numerous disciplines continue to make great strides in basic science research. However, the latest and most exciting innovations have largely developed through the integration of new technology into surgical practice. The second half of the chapter will therefore examine several of the latest fields of innovation, all virtually in their infancy: the impact of surgical robotics, the potential of natural orifice translumenal surgery, the latest in bioengineering, and the latest innovations in surgical training will be reviewed in detail.
62.2 Trends in General Surgery Research

62.2.1 The Call for Patient Safety and Medical Information Technology

In recent years, many new regulations and requirements have been placed upon health care providers in the name of patient protection and accurate reporting. Most notably, a report released by the Institute of Medicine in 2000 attributed nearly 100,000 deaths annually to medical error [1]. This number exceeds that
of breast cancer deaths, HIV-related deaths, and deaths from motor vehicle crashes. The results of this report put the entire medical community under heavy scrutiny, and numerous changes have since been made to reduce medical error. Though the methods and numbers of the study may be criticized, it is clear that room for improvement exists. Technology has facilitated the task of reducing medical error. Computerized order entry systems are one example. In most systems, physicians directly enter orders into a computer system, eliminating potential communication errors between the doctor and the nurse or ward clerk. Computerized entry has been demonstrated to reduce medication errors [2]. Computer systems can also create a log of prior orders that physicians can double-check to ensure appropriate medications have been given. Combined with automated dispensing machines and bar-code identification of patients and medications, computer-based technology can reduce human error [3]. Research has benefited from the technological advances designed to improve patient care. Along with computerized ordering systems, many health care records are now stored digitally. This has facilitated the collection of patient data, and through the power of the Internet, it is easier than ever to record and analyze large sets of patient data. Modern computers can find trends or correlations within large data sets that investigators would not necessarily detect. The web has also changed the way scientists conduct their research and publish their results. The Internet is often taken for granted in this day and age. Medical students and residents now train with a wealth of information only a few mouse clicks away. Most major medical textbooks and journals are now available online. In the United States, PubMed is the National Institutes of Health's free Internet-based search resource. It links directly to the National Library of Medicine's MEDLINE database and provides access to over 4,000 journals. In addition to facilitating literature review, electronic submission has greatly streamlined the submission and publication process for many journals and conferences. Expert reviewers now access articles and provide peer-review feedback through web-based systems. Finally, the Internet has forever altered the physician–patient relationship. Patients now have easy access to an incredible amount of medical information (or, at times, marketing or misinformation). Patients may use the web to
learn more about a specific disease, hospital or doctor. Patients may also join web-based support groups. The Internet may also alter the interactions between doctor and patient. One study of obese patients found web-based counseling sessions to be as effective as face-to-face meetings [4]. A continuously used web-based system may motivate this patient population in instances where health care providers are not available [5]. Advanced systems allow patients to maintain an online food diary and track weight changes with graphs. Patients may find that an email to a dietician or a physician is an easier method of communication than calling a busy office looking for help. Given the sensitive nature of obesity and the stigma attached to it, the Internet may be an effective weapon in fighting obesity [6]. This new medium and the information age have greatly benefited all of scientific research. The Internet allows for easier exchange of information. Larger patient databases can be compiled from multiple institutions. Live surgical cases can be transmitted into surgical research meetings. The Internet will continue to provide tools that are critical to the success of surgical research.
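For readers who wish to query the literature programmatically rather than through the browser, NCBI's public E-utilities web service exposes the same PubMed/MEDLINE search. The short Python sketch below counts and lists matching articles; the search term is an arbitrary example, and the response fields shown are those of the service's JSON output.

import json
import urllib.parse
import urllib.request

# NCBI E-utilities: the public programmatic interface to PubMed/MEDLINE.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "surgical robotics AND learning curve",  # arbitrary example query
    "retmax": 5,
    "retmode": "json",
}

with urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(params)) as resp:
    result = json.load(resp)["esearchresult"]

print("Matching articles:", result["count"])
print("First PubMed IDs:", result["idlist"])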
62.2.2 Requirements for Reporting

National clinical registries ensure adequate reporting. Currently, all clinical trials within the United States are required to be entered in the national registry, ClinicalTrials.gov [7]. Critics of the registry contend that it is difficult to keep an idea or intellectual property proprietary if it is posted on a website. As with all data collection, registering a trial requires effort in the form of time and paperwork. And while the registry does offer a form of advertising, many trial offices may not want to be contacted by potential subjects. But there are also numerous benefits to such a registry. Patients who are familiar with the clinical trials website may seek out additional information or newer treatments for a condition through a clinical trial. Such a registry ensures accurate and honest reporting of trial results on the part of researchers and sponsors. Researchers may also learn what trials are already being conducted in a particular field, and may focus their efforts accordingly. A wealth of information can also be gained from large data banks. Clinical quality measures are currently being gathered into
large data repositories such as the National Surgical Quality Improvement Program (NSQIP), allowing for data analysis not possible at a local level [8].
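The classic form of such pooled analysis is risk adjustment: each hospital's observed event rate is compared with the rate expected from its patients' preoperative risk, commonly summarized as an observed-to-expected (O/E) ratio. A minimal Python sketch with synthetic numbers follows; in practice the predicted probabilities would come from a regression model fitted on the pooled registry data, not the invented values used here.

# Observed-to-expected (O/E) ratio: a core summary statistic of
# risk-adjusted quality comparison. Values > 1 suggest more events
# (e.g. 30-day complications) than the case mix predicts.

# Synthetic data for one hospital: 1 = event occurred, 0 = no event.
observed = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

# Predicted event probability per patient from a registry-wide risk model
# (synthetic values for illustration).
expected = [0.05, 0.40, 0.10, 0.08, 0.30, 0.12, 0.05, 0.07, 0.35, 0.09]

oe_ratio = sum(observed) / sum(expected)
print(f"O/E ratio: {oe_ratio:.2f}")  # 3 / 1.61 ≈ 1.86, worse than expected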
62.2.3 Trends Across Institutional Review Boards

Institutional review boards (IRBs) operate at the level of the individual institution, and significant variability amongst IRBs has been found in multiple studies. One group reported variability amongst 16 local IRBs [9]: for the same proposed clinical trial, one IRB waived the requirement for informed consent while five IRBs permitted telephone consent, and only three of the 16 boards approved consent forms that met federal regulations regarding the basic content of informed consent. Another similar study demonstrated variability in allowing expedited review. An educational study submitted to several IRBs for expedited review was granted that process by only two-thirds of respondents [10]. One IRB was removed from the study after not responding to an application for 164 days. Even within a single board, there may be variations between the subcommittees assigned to a protocol. Some have proposed a national review system to correct this variability [11], but it is difficult to envision a centralized system being sufficiently efficient. The variability of IRB approval will continue to affect surgical research. However, a good IRB process protects both research subjects and the researcher. A successful researcher will develop a collaborative relationship with his or her local IRB.
62.2.4 Privacy Legislation

The American Health Insurance Portability and Accountability Act of 1996 (HIPAA) increased patient privacy protection, but to many hospitals, its enforcement was and is an "enormous chore" [12]. Critics contend that the privacy legislation has had a negative impact on surgical research. Since randomized controlled trials involving surgical procedures are difficult to perform, surgeons have relied more often on case-control studies or case series to draw conclusions in clinical research. HIPAA made these types of retrospective studies much more difficult to perform even though they pose only minimal risk in terms of privacy [13].
The impact of increased regulation such as HIPAA is difficult to predict. Many believe that the privacy laws are a significant burden to health research [14, 15]. The increased amount of paperwork and red tape may deter some researchers from participating in clinical research altogether [16]. Patients may be less likely to enroll in a clinical trial simply because of the inclusion of HIPAA-related verbiage in consent forms. In one study, patients were consented for a mock study in two groups: with and without associated HIPAA consent forms. The HIPAA-consented group was less likely to enroll in the study. Concern over privacy was the principal reason for refusing to enter the study, but poor understanding of the form was the second most common reason given [17]. Surgical researchers will have no choice but to adapt and allocate resources to ensure compliance with HIPAA. Medical and surgical research has provided great advances for our civilization, but the memory of past failures of the last century, such as the Tuskegee experiments [18], remains. Ultimately, HIPAA should, at least in part, provide assurances to the public regarding patient safety and protection in the process of advancing science.
62.2.5 Trends in Funding for Surgical Research

Research cannot be done for free. The NIH remains a major source of research funding for surgeons, but primarily for preclinical and basic science research. Industry and other non-governmental groups play more of a role in applied clinical research. The relationship between researchers and industry has changed recently in the USA in response to political and public relations concerns. In particular, there has been conflict regarding the control of data and the duty to report study findings even when the outcome is undesirable. Funding for surgical research from the NIH has remained steady over the last few years. However, it has been noted that funding to medical departments is increasing much faster than funding to surgical departments [19]. Surgical research has maintained a stable ratio of roughly 25% basic science to 75% clinical research over the last few years. Surgeons in the field of oncology have been particularly disappointed in the trends
for oncology funding over the last decade [20, 21]. One of the chief concerns has been a decrease in surgical research funding compared to the nonsurgical sciences, which has correlated with a decreased number of surgeons involved in the review process [21].
62.2.6 Partnerships for Clinical Trials

Numerous questions have emerged in recent years regarding best practices for clinical trials. Academic centers and industry sponsors have traditionally assisted each other in the advancement of medical science, but the two parties may not always have identical goals for this research, and clear guidelines have been lacking. Conflicts of interest can arise regarding control of data and subsequent publication rights. The agenda of a surgery department may differ from that of an industry sponsor in the development of a new device or drug. Mello et al. conducted a survey of medical-school research administrators regarding institutional policies for clinical trial agreements with industry [22]. There was considerable variability between institutions. Most notably, only half of responders would forbid an industry sponsor from inserting its own statistical analyses, and 40% of respondents would allow a sponsor to share data with third parties. Of further note, 75% of respondents reported a conflict in the previous year. Still, despite these recent difficulties, it should be noted that industry-supported studies have greatly contributed to the advancement of health care. With increased awareness of potential conflicts and a strong call for academic freedom, the variability in clinical trial agreements should decrease in the coming years [23].

62.3 Recent Innovations

Over the last few years, there have been numerous breakthroughs in the fields of trauma, plastic surgery, colorectal surgery and other surgical disciplines. Clinical outcomes data have permitted the refinement of clinical decision making. Basic science researchers have continued to make advances in the fundamental understanding of oncology, vascular endothelium and the body's response to trauma. In this review, we will focus on some recent innovations in surgical research. Tremendous gains have been made in technology in recent years. The fields of robotics and natural orifice surgery have the potential to further reduce the invasiveness of an operation on a patient. At the same time, tissue and bioengineering have created new ways of treating disease. Finally, the latest developments in training will be explored.
62.3.1 Surgical Robotics

Once relegated to the realm of science fiction films or novels, surgical robotics is now a reality. This new, emerging field is a rich and important research opportunity, as there is much to learn: outcomes, education, ergonomics, and the technical aspects of next-generation systems are all important arenas of investigation. Laparoscopy's advantages include smaller scars and shorter hospital stays. For the patient, there is little downside to a laparoscopic approach. For surgeons, however, several sacrifices had to be made. Most notably, laparoscopy reduced the surgeon's vision from three dimensions to two, and forced the surgeon to rely on an assistant to provide camera control during a procedure. The degrees of freedom provided by laparoscopic instruments are fewer than those of the hand, and the learning curve for laparoscopic operations is often steep. The use of a robotic surgical system reduces or eliminates some of these disadvantages. The robotic system also holds the promise of telesurgery, allowing surgeons potentially to perform an operation from miles away [24]. Laparoscopy forever altered the science of surgery, though the concept initially met some resistance. Eventually, laparoscopic or thoracoscopic operations became the standard of care for numerous procedures. Even with marginally higher complication rates, patients preferred the reduced morbidity of several small incisions compared with a single large one. But while laparoscopy decreased hospital stays and minimized patient morbidity, these operations put additional demands upon surgeons. In laparoscopy, surgeons work from a two-dimensional video screen to perform an operation. The laparoscope is usually manipulated by an assistant surgeon or nurse who cannot anticipate the visual needs of the surgeon in real time. The instruments used reduce the degrees of freedom a surgeon has, and even tasks that are simple in
open surgery, such as tying a knot, become much more difficult when performed laparoscopically. The use of robotic surgical systems restored much of what had been taken away by laparoscopy. At the console, the surgeon has full control over the camera and three-dimensional vision of the target organs. The wrists of the da Vinci surgical system provide a full six degrees of freedom. In addition, the da Vinci system can be linked to a teaching console, so that a student and a teacher can both have access to the surgical instruments. The use of modern communications systems and robotic systems allows surgery to be performed across great distances. The da Vinci system was designed with general surgery and cardiac surgery applications in mind. However, it is now clear that robotic-assisted laparoscopic prostatectomy (RALP) is an ideal application. The da Vinci robotic system allowed urologists to leapfrog the learning curve of standard advanced laparoscopy. It has provided a means of accomplishing a prostatectomy in a minimally invasive way with equivalent risk of impotence and incontinence, problems which are well known and feared by prostate cancer patients. Since the first RALP was performed in 2000 [25], nearly 30,000 of these procedures have been completed, representing 25% of all prostatectomies [26]. Return of continence appears to occur earlier after RALP, with a trend toward improved overall continence rates (98% for RALP vs. 95% for radical retropubic prostatectomy in the best reported series), even inclusive of learning-curve cases. Further refinements to RALP may lead to improved clinical outcomes. Menon et al. [27], for example, developed a new technique of prostatic fascial preservation that greatly improved potency rates in one small series. Such refinements in technique are only possible through clinical research utilizing advanced surgical systems such as the da Vinci. General surgeons have preliminarily applied robotics to nearly all areas of abdominal surgery. The robot has been successfully used for cholecystectomy, Heller myotomy, esophagectomy, lysis of adhesions, distal pancreatectomy, splenectomy, bowel resection, duodenal polypectomy, adrenalectomy, exploratory laparoscopy, donor nephrectomy and pyloroplasty [28]. Esophageal and bariatric operations have benefited the most from the robotic system. The system greatly reduces the learning curve for technically challenging operations. Improved outcomes may be another benefit. One multi-institutional series of 104 patients undergoing robotic-assisted
Heller myotomy reported 100% success with no mucosal perforations [29]. The benefit was found even in patients with prior Botox treatment, a known risk factor for mucosal perforation [30]. For esophagectomy, the robotic system allows good visualization and tissue manipulation in the mediastinum, where blunt, blind hand dissection is used in transhiatal esophagectomy. In this approach, the esophagus is removed using smaller incisions, sparing the patient a midline abdominal incision or a painful thoracotomy. In the gastric bypass operation, completion of the gastrojejunal anastomosis is the key step. If this anastomosis leaks, the complication can be devastating. Use of a robotic system allows for a handsewn anastomosis under three-dimensional vision [31]. This is especially important in the super-morbidly obese, in whom excess fat or a fatty liver may impair the surgeon's ability to safely connect the stomach to the jejunum. The initial learning curve of robotic gastric bypass is more favorable for surgeons than that of laparoscopy [32]. In skilled hands, robotic assistance may even reduce bariatric procedure time: in one study, the operative time for robotic Roux-en-Y gastric bypass was actually shorter than for the laparoscopic approach, 169 (robotic) vs. 208 min (laparoscopic) [33]. Further research completed by surgeons far along the robotic learning curve will better determine the future of surgical robotics. The da Vinci "S" system is the latest surgical robotic system from Intuitive Surgical. This evolution provides the surgeon with newer, longer instruments and a greater range of motion within the body. Further refinements to the controls and improvements in haptic feedback, areas currently being researched, will shape future robotic systems. Robotics also offers the potential to integrate radiologic imaging into the console image, allowing the surgeon a view of radiologic images. Advanced technology will seamlessly integrate these data into the operative image, perhaps by projecting a CT image onto the patient's actual visualized surgical anatomy. Several other systems are also in place and used successfully. In orthopedics, the Iso-C3D (Siemens, Germany) allows three-dimensional imaging to ensure proper alignment of instruments [34]. The AESOP system (Intuitive Surgical, Sunnyvale, CA) allows voice control of the endoscope to assist in thoracic and neurosurgical procedures. It also eliminates tremor and the need for a second skilled surgeon to manage the scope [35]. The EndoAssist (Armstrong Healthcare, High Wycombe) is another robotic device that performs
a similar function [36]. Further refinements to these and other emerging technologies will improve their utility, but significant decreases in cost will be needed to secure their use. Surgical robotics remains in its infancy, but holds great promise based upon the research necessary to produce effective systems. The robotic approach now equals the standard of care for mitral valve repair, Heller myotomy and radical prostatectomy [37]. Other refinements will need to be made to current systems, and decreases in costs and operative times will increase the use of robotic systems. This technology may increase the number of technically challenging major operations that can be performed with a minimally invasive approach. Research efforts will continue in this field, and the ideas of flexible robotics and endolumenal mini-robots [38] are among the most exciting technological concepts in surgery.
62.3.2 Natural Orifice Translumenal Endoscopic Surgery (NOTES)

NOTES is among the most recent and innovative surgical research endeavors. This approach makes use of a patient's natural orifices to perform operations: surgeons may access the abdominal cavity through a small incision in the stomach, bladder, rectum, vagina, or esophagus. This allows surgeons to complete operations without leaving visible evidence on the patient's body. The chief potential advantage of NOTES is the ability to accomplish a scar-less operation. Though this benefit is primarily cosmetic, a NOTES approach also eliminates the risk of hernia formation and skin infection. Patient groups with high postoperative hernia rates, such as the morbidly obese, might particularly benefit from a NOTES approach. Patients with significant abdominal wall pathology such as burns or excessive scar tissue might also benefit. There may also be advantages in terms of reduced pain and disability. This field is currently attracting significant focus in the surgical research world. As surgeons developed means to minimize the trauma created during operations (minimally invasive surgery), the entire medical system has adapted. A large number of outpatient surgical centers were born out of the development of minimally invasive surgery, and many cases such as cholecystectomy, appendectomy, hernia repair, and adjustable gastric banding can be completed as outpatient procedures. NOTES has the potential to further this trend, as patients may have decreased pain following operations. Additionally, anesthetic requirements may decrease with a NOTES approach. There are several potential disadvantages to a natural orifice approach. The primary problem with natural orifice surgery is that it opens the abdominal cavity to a separate set of microbial pathogens. While antibiotic solutions can be used to lavage the lumens of the gastrointestinal tract or the vagina, these treatments might not be as effective in reducing bacterial counts as topical skin solutions. Along with infection, there are several other risks to the natural orifice approach. Because the endoscope is not designed to resect solid organs, complication rates may be higher until appropriate tools can be developed. Endoscopic clips may not provide as secure ligation of arteries or ducts as laparoscopic clips. In the event of emergent bleeding or damage to surrounding organs, the endoscope may be incapable of adequately treating the complication. Retraction is difficult with current endoscopes, and the small-diameter instruments do not provide much strength without bending. Kalloo et al. [39] sparked the current interest in NOTES in 2004 by using an endoscope to create a gastrotomy in a porcine model and survey the abdomen. This approach was intended to advance the capabilities of endoscopy; but perforation is a feared complication of endoscopy, so the procedure did not immediately make sense to many physicians. In 2005, leaders from the two key American societies (the Society of American Gastrointestinal and Endoscopic Surgeons, SAGES, and the American Society for Gastrointestinal Endoscopy, ASGE) gathered and drafted a white paper that outlined the progress made in natural orifice surgery and was intended to guide research endeavors [40]. Soon after, numerous other groups began developing natural orifice surgery in animal models. Cholecystectomy has been performed in animal models [41, 42], and other operations including fallopian tube ligation [43] and gastro-jejunostomy [44] have also been performed by NOTES in animal models. Several patients have also undergone NOTES procedures in initial, locally approved clinical trials. The transvaginal route has been favored thus far for its simplicity and its already commonplace use by gynecologists (Fig. 62.1). Most NOTES procedures to date are performed with the assistance of transabdominal laparoscopic vision for safety. More research will be needed to determine the true clinical value of NOTES. Regardless of the
eventual fate of NOTES, the research movement has generated a tremendous amount of innovation between surgical research laboratories and industry. It also stands out as an example of the power of collaboration, as leaders from both the surgical disciplines and gastroenterology have made contributions. Among the key benefits of NOTES research is the development of new instruments and techniques that can be applied to conventional surgical operations. NOTES research has led to the development of a number of devices that will likely prove useful to minimally invasive surgeons. The ShapeLock system from USGI (San Clemente, CA) is one such device: a flexible overtube that locks into place to provide extra support to the endoscope [45]. RealHand instruments from Novare (Cupertino, CA) are flexible laparoscopic instruments that provide additional degrees of freedom, allowing surgeons to reach different angles from a single axis point (Fig. 62.2). A number of endoscopic suturing devices are in development to create options for weight loss [46].

Fig. 62.1 The abdomen of a patient following a laparoscopic-assisted transvaginal cholecystectomy demonstrates no abdominal wall scars

Fig. 62.2 Flexible operating platforms increase the capabilities of endoscopic therapy. Photo courtesy of USGI (2008)

62.3.3 Artificial Organs and Device Engineering

In the 1930s, Willem Kolff watched several patients pass away from kidney failure, and he set to work to create the first kidney dialysis machine. Using data gathered by John Abel at Johns Hopkins years earlier, he devised a plan for dialysis. Despite a lack of resources and the Nazi occupation of the Netherlands, Kolff developed the first dialysis machine and managed to wake a patient from a uremic coma in 1945. From there, he distributed his machines around the world. The story of kidney dialysis is among the best examples of perseverance and altruism in medical research [47]. It also demonstrates how artificial devices can improve patient care. Liver dialysis has the potential to save many lives. Unfortunately, the liver's function in removing blood toxins and in protein synthesis is much more complicated than that of the kidney, and usage of liver dialysis systems has been limited. Furthermore, these systems cannot perform the synthetic functions of the liver. The use of MARS (molecular adsorbent recirculating system) has been described in several small studies but has had little impact on overall mortality [48]. However, at least one larger meta-analysis found that artificial support may reduce mortality in acute-on-chronic liver failure [49]. Other systems for liver failure are emerging but require more research; the Prometheus system is one such example [50]. Despite the limited results of the first generation of liver dialysis devices, the field remains an important area of surgical research with the potential to save numerous lives. Artificial heart research has been both important and popular, in the media and in medical journals. Cardiac surgeons and cardiologists have successfully implanted artificial hearts as bridge therapy to sustain patients awaiting heart transplantation. Since only approximately 2,000 heart transplants are completed in the United States annually, the need for organs greatly exceeds the supply, and a large number of patients die while on waiting lists. CardioWest (SynCardia Systems, Tucson, AZ) and AbioCor (Abiomed, Inc, Danvers, MA) are the two FDA-approved total artificial hearts currently available. Both have prolonged survival in patients
with severe heart failure when compared with expected survival rates [51]. However, complications such as infection, mechanical failure, and most notably bleeding have limited the use of these devices. The longest survival with the CardioWest device thus far is just over 600 days [52]. The AbioCor has been implanted in a little over a dozen patients, the longest living over a year with the device. Still, research continues, and the AbioCor II (due in 2008) is expected to improve greatly upon its predecessor by providing a smaller size and a longer working life. In addition to whole organs, research continues into engineering other forms of replacement tissue. Active research continues in the development of biomaterials that are less thrombogenic, in the hope of creating bypass grafts and other devices less prone to clotting and failure [53]. Several variations of coated or drug-eluting stents are currently available in an attempt to improve upon standard stents. Lightweight hernia meshes can maintain the same strength as heavier counterparts while allowing better tissue integration [54]. The laparoscopic adjustable gastric band is another device that has gained popularity in recent years. As the United States continues to be afflicted by an epidemic of obesity, the adjustable band provides a safer operation than traditional gastric bypass or duodenal switch [55]. Though the weight lost may not equal that of the other operations, the positive metabolic benefits of weight loss are still enjoyed by the patient. The band may also be removed if complications occur. The success of adjustable gastric banding has inspired a large number of similar devices, both intra- and extra-gastric, most of which are still in preclinical studies. Gastric pacing is an innovation that may also be useful in treating morbid obesity. The pacemaker is implanted during a laparoscopic operation [56]. While its true clinical value is yet to be determined, manipulation of the gastric nervous system may be beneficial for decreasing hunger signals or as a primary treatment for gastroparesis. The popular press keeps its eye on additional potential developments that may emerge over the next decade. Artificial lungs, kidneys, glucose-detecting contact lenses and microchips that reduce the impact of Alzheimer's disease are amongst the bold ideas being pursued by researchers to improve patient care [57]. On top of this, stem cell research may provide better treatment for paralysis, Parkinson's disease, diabetes and a number of other diseases.
62.3.4 Training the Surgeon of the Future

In the United States, increasing concerns for patient safety and decreased resident work hours have forever altered surgical training. Resident surgeons currently need to master several operative disciplines, including open surgery, endoscopy, and laparoscopic surgery. Training in robotics, advanced endoscopy and NOTES may not yet be universal, but may well be of benefit to a developing surgeon. Due to these factors, a change in resident training has occurred. Several research efforts have closely analyzed how a surgeon learns, and technology has created new tools to provide that training. The days of "see one, do one, teach one" are over. Surgical residents should be familiarized with laparoscopic, endoscopic, and robotic technologies prior to entering the operating room. Surgical residents need to demonstrate adequate mastery of advanced skills such as intracorporeal knot-tying in simulation prior to performing these skills on live patients. This optimizes patient safety while still promoting the development of surgical skill. Several virtual reality trainers have emerged over the last few years. Data generally show that they are effective in training residents to perform operative tasks [58]. These resources are of varying graphical and technical capabilities. However, they are expensive, and may not provide more effective training than basic black-box, low-fidelity trainers. This, too, is an important arena of surgical research, with the potential to develop better, more realistic surgical simulators. Surgical robotics raises specific issues related to the training of surgeons. In a survey of residents at our institution, 100% of surgical residents indicated they would sign up for a course in robotic training, but only 50% had experienced such a class [unpublished data]. Surgical residents also predicted a much larger role for surgical robotics than attending surgeons across several disciplines. The American Board of Surgery is aware of these increasing demands on residents. The number of endoscopic cases required during surgical residency was recently increased. The Fundamentals of Laparoscopic Surgery (FLS) training module, developed by SAGES, is rapidly becoming an integral part of residency training. Streamlining surgical education has also been proposed. However, a successful, well-entrenched training paradigm such as the American surgical residency will likely take considerable time to change significantly. Cardiac surgeons, for instance, might be better served
by 3 years of general surgical training followed by 3 years of cardiac surgical training rather than five of the former and two of the latter. This evolution is further warranted by projected shortages in many surgical disciplines. Cost and time remain the biggest barriers to resident robotic training. The standard da Vinci system has only one seat, and a mentor may find it difficult to instruct a pupil who is seated at the console, facing the monitor. A special training system has therefore been developed that allows an instructor to assist the trainee while sitting at a separate console [59]. The use of a mentoring console for teaching or telemedicine affords robotics an advantage in training, as a dedicated, trained instructor could potentially assist and train in surgery using the robot from a remote location. A two-headed da Vinci robot allows for a collaborative effort between the robot arms, and has been demonstrated to improve performance. This type of technology will be incredibly useful for intraoperative training, as the supervising surgeon may act as a driving instructor with access to a brake pedal. Modern communication capabilities allow surgeons to connect with health care personnel over long distances. Surgeons can now connect to their counterparts across oceans or to military personnel sailing on the seas [60]. Combining this reach with robotic technology will allow expert care to be delivered without travel time. Ultimately, the future for surgeons and for surgical research is incredibly bright. While increased regulation may be viewed as a hindrance, public protection and patient safety should bolster faith in physicians and their commitment to the study of disease and its treatment. A wide range of active research fields and opportunities ensures that surgery and medicine will continue to improve over time, and that future surgical practice will be radically different from today's.
References
1. Kohn LT, Corrigan JM, Donaldson MS (2000) To err is human: building a safer health system. National Academy Press, Washington
2. Bates DW, Leape LL, Cullen DJ et al (1998) Effect of computerized physician order entry and a team intervention on prevention of serious medication errors. JAMA 280:1311–1316
3. Anon (2003) Scanning medication barcodes improves accuracy at Lehigh Valley Hospital. Perform Improv Advis 7:132–134; 129
4. Harvey-Berino J, Pintauro S, Buzzell P et al (2004) Effect of internet support on the long-term maintenance of weight loss. Obes Res 12:320–329
5. Polzien KM, Jakicic JM, Tate DF et al (2007) The efficacy of a technology-based system in a short-term behavioral weight loss intervention. Obesity (Silver Spring) 15:825–830
6. Saperstein SL, Atkinson NL, Gold RS (2007) The impact of Internet use for weight loss. Obes Rev 8:459–465
7. Irwin RS (2007) Clinical trial registration promotes patient protection and benefit, advances the trust of everyone, and is required. Chest 131:639–641
8. Khuri SF, Daley J, Henderson W et al (1998) The Department of Veterans Affairs' NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg 228:491–507
9. Silverman H, Hull SC, Sugarman J (2001) Variability among institutional review boards' decisions within the context of a multicenter trial. Crit Care Med 29:235–241
10. Dyrbye LN, Thomas MR, Mechaber AJ et al (2007) Medical education research and IRB review: an analysis and comparison of the IRB review process at six institutions. Acad Med 82:654–660
11. Stair TO, Reed CR, Radeos MS et al (2001) Variation in institutional review board responses to a standard protocol for a multicenter clinical trial. Acad Emerg Med 8:636–641
12. Kilbridge P (2003) The cost of HIPAA compliance. N Engl J Med 348:1423–1424
13. O'Herrin JK, Fost N, Kudsk KA (2004) Health Insurance Portability Accountability Act (HIPAA) regulations: effect on medical record research. Ann Surg 239:772–776; discussion 776–778
14. Kulynych J, Korn D (2002) The effect of the new federal medical-privacy rule on research. N Engl J Med 346:201–204
15. Tovino SA (2004) The use and disclosure of protected health information for research under the HIPAA privacy rule: unrealized patient autonomy and burdensome government regulation. S D Law Rev 49:447–502
16. Henke PK, Fewel M (2007) Surgical research and the new privacy laws. Bull Am Coll Surg 92:26–29
17. Dunlop AL, Graham T, Leroy Z et al (2007) The impact of HIPAA authorization on willingness to participate in clinical research. Ann Epidemiol 17:899–905
18. Harrison RW III (2001) Impact of biomedical research on African Americans. J Natl Med Assoc 93:6S–7S
19. Jackson HH, Jackson JD, Mulvihill SJ et al (2004) Trends in research support and productivity in the changing environment of academic surgery. J Surg Res 116:197–201
20. Avis FP, Ellenberg S, Friedman MA (1988) Surgical oncology research. A disappointing status report. Ann Surg 207:262–266
21. Bland KI (2007) Concerning trends and outcomes for National Institutes of Health funding of cancer research. J Surg Oncol 95:161–166
22. Mello MM, Clarridge BR, Studdert DM (2005) Academic medical centers' standards for clinical-trial agreements with industry. N Engl J Med 352:2202–2210
23. Kaiser J (2007) Conflict of interest. Stung by controversy, biomedical groups urge consistent guidelines. Science 317:441
24. Marescaux J, Leroy J, Rubino F et al (2002) Transcontinental robot-assisted remote telesurgery: feasibility and potential applications. Ann Surg 235:487–492
25. Pasticier G, Rietbergen JB, Guillonneau B et al (2001) Robotically assisted laparoscopic radical prostatectomy: feasibility study in men. Eur Urol 40:70–74
26. Patel VR, Chammas MF Jr, Shah S (2007) Robotic assisted laparoscopic radical prostatectomy: a review of the current state of affairs. Int J Clin Pract 61:309–314
27. Menon M, Kaul S, Bhandari A et al (2005) Potency following robotic radical prostatectomy: a questionnaire based analysis of outcomes after conventional nerve sparing and prostatic fascia sparing techniques. J Urol 174:2291–2296; discussion 2296
28. Talamini MA, Chapman S, Horgan S et al (2003) A prospective analysis of 211 robotic-assisted surgical procedures. Surg Endosc 17:1521–1524
29. Melvin WS, Dundon JM, Talamini M et al (2005) Computer-enhanced robotic telesurgery minimizes esophageal perforation during Heller myotomy. Surgery 138:553–558; discussion 558–559
30. Galvani C, Gorodner MV, Moser F et al (2006) Laparoscopic Heller myotomy for achalasia facilitated by robotic assistance. Surg Endosc 20:1105–1112
31. Moser F, Horgan S (2004) Robotically assisted bariatric surgery. Am J Surg 188:38S–44S
32. Yu SC, Clapp BL, Lee MJ et al (2006) Robotic assistance provides excellent outcomes during the learning curve for laparoscopic Roux-en-Y gastric bypass: results from 100 robotic-assisted gastric bypasses. Am J Surg 192:746–749
33. Mohr CJ, Nadzam GS, Curet MJ (2005) Totally robotic Roux-en-Y gastric bypass. Arch Surg 140:779–786
34. Kendoff D, Pearle A, Hufner T et al (2007) First clinical results and consequences of intraoperative three-dimensional imaging at tibial plateau fractures. J Trauma 63:239–244
35. Nathan CO, Chakradeo V, Malhotra K et al (2006) The voice-controlled robotic assist scope holder AESOP for the endoscopic approach to the sella. Skull Base 16:123–131
36. Nebot PB, Jain Y, Haylett K et al (2003) Comparison of task performance of the camera-holder robots EndoAssist and Aesop. Surg Laparosc Endosc Percutan Tech 13:334–338
37. Husted TL, Broderick TJ (2006) NASA and the emergence of new surgical technologies. J Surg Res 132:13–16
38. Rentschler ME, Dumpert J, Platt SR et al (2007) Natural orifice surgery with an endoluminal mobile robot. Surg Endosc 21:1212–1215
39. Kalloo AN, Singh VK, Jagannath SB et al (2004) Flexible transgastric peritoneoscopy: a novel approach to diagnostic and therapeutic interventions in the peritoneal cavity. Gastrointest Endosc 60:114–117
40. Rattner D, Kalloo A (2006) ASGE/SAGES Working Group on Natural Orifice Translumenal Endoscopic Surgery. October 2005. Surg Endosc 20:329–333
41. Pai RD, Fong DG, Bundga ME et al (2006) Transcolonic endoscopic cholecystectomy: a NOTES survival study in a porcine model (with video). Gastrointest Endosc 64:428–434
42. Park PO, Bergstrom M, Ikeda K et al (2005) Experimental studies of transgastric gallbladder surgery: cholecystectomy and cholecystogastric anastomosis (videos). Gastrointest Endosc 61:601–606
43. Jagannath SB, Kantsevoy SV, Vaughn CA et al (2005) Peroral transgastric endoscopic ligation of fallopian tubes with long-term survival in a porcine model. Gastrointest Endosc 61:449–453
44. Bergstrom M, Ikeda K, Swain P et al (2006) Transgastric anastomosis by using flexible endoscopy in a porcine model (with video). Gastrointest Endosc 63:307–312
45. Swain P (2007) The ShapeLock system adapted to intragastric and transgastric surgery. Endoscopy 39:466–470
46. Schweitzer M (2004) Endoscopic intraluminal suture plication of the gastric pouch and stoma in postoperative Roux-en-Y gastric bypass patients. J Laparoendosc Adv Surg Tech A 14:223–226
47. van Noordwijk J (2001) Dialysing for life: the development of the artificial kidney. Kluwer, Boston
48. Wolff B, Machill K, Schumacher D et al (2007) MARS dialysis in decompensated alcoholic liver disease: a single-center experience. Liver Transpl 13:1189–1192
49. Kjaergard LL, Liu J, Als-Nielsen B et al (2003) Artificial and bioartificial support systems for acute and acute-on-chronic liver failure: a systematic review. JAMA 289:217–222
50. Sen S, Williams R, Jalan R (2005) Emerging indications for albumin dialysis. Am J Gastroenterol 100:468–475
51. Gray NA Jr, Selzman CH (2006) Current status of the total artificial heart. Am Heart J 152:4–10
52. Leprince P, Bonnet N, Rama A et al (2003) Bridge to transplantation with the Jarvik-7 (CardioWest) total artificial heart: a single-center 15-year experience. J Heart Lung Transplant 22:1296–1303
53. Jordan SW, Chaikof EL (2007) Novel thromboresistant materials. J Vasc Surg 45(Suppl A):A104–A115
54. Cobb WS, Kercher KW, Heniford BT (2005) The argument for lightweight polypropylene mesh in hernia repair. Surg Innov 12:63–69
55. Ren CJ, Fielding GA (2003) Laparoscopic adjustable gastric banding [Lap-Band]. Curr Surg 60:30–33
56. McNatt SS, Longhi JJ, Goldman CD et al (2007) Surgery for obesity: a review of the current state of the art and future directions. J Gastrointest Surg 11:377–397
57. Steiner S, Dyer N, Everett J et al (2005) Will we merge with machines? Available at: http://www.popsci.com/scitech/article/2005–08/will-we-merge-machines
58. Ahlberg G, Heikkinen T, Iselius L et al (2002) Does training in a virtual reality simulator improve surgical performance? Surg Endosc 16:126–129
59. Hanly EJ, Miller BE, Kumar R et al (2006) Mentoring console improves collaboration and teaching in surgical robotics. J Laparoendosc Adv Surg Tech A 16:445–451
60. Cubano M, Poulose BK, Talamini MA et al (1999) Long distance telementoring. A novel tool for laparoscopy aboard the USS Abraham Lincoln. Surg Endosc 13:673–678
63 Upper Gastrointestinal Surgery: Current Trends and Recent Innovations

Danny Yakoub, Oliver Priest, Akram R. George, and George B. Hanna
Contents

63.1 Introduction .......................................................... 794
63.2 Innovation Within the Specialty .......................... 794
63.2.1 Robotics in Upper Gastro-Intestinal Surgery ........ 794
63.2.2 Natural Orifice Transluminal Endoscopic Surgery (NOTES) ........ 795
63.2.3 Endoluminal Therapy ........ 796
63.2.4 Endoscopic Mucosal Resection (EMR) ........ 797
63.2.5 Photodynamic Therapy ........ 797
63.3 Techniques Within Specialty ........ 797
63.3.1 Obesity Surgery Techniques ........ 797
63.3.2 Debates ........ 801
63.4 Recent Advances in Choice of Management Within Specialty ........ 802
63.4.1 Gastro-Esophageal Reflux Disease (GERD) and Achalasia ........ 802
63.4.2 Barrett's Esophagus and Esophageal Cancer ........ 804
63.5 Molecular and Biological Developments Within Specialty ........ 805
63.5.1 Progression of Barrett's Disease and Adenocarcinoma of the Esophagus ........ 805
63.5.2 ASPECT Study ........ 806
63.5.3 Tumor Markers in Esophageal Cancer Staging ........ 807
63.5.4 Diagnostic Laparoscopy/Thoracoscopy ........ 807
63.5.5 Detection of Micrometastases in Bone Marrow ........ 807
63.6 Imaging and Diagnostics ........ 807
63.6.1 Upper Gastrointestinal Barium Examination (UGI) ........ 807
63.6.2 Staging Modalities and Prognostic Indicators ........ 808
63.7 Training Within Specialty ........ 809
63.7.1 Surgical Skills Training ........ 809
63.7.2 Hospital Volume/Outcome Relationship ........ 809
63.8 Future Directions in Research and Management Within Specialty ........ 809
63.8.1 Future Directions in Drug Therapy ........ 809
63.8.2 Staging of Gastric and Esophageal Cancer ........ 810
63.8.3 Prediction of Survival and Guidance of Treatment ........ 810
63.8.4 Global Molecular Profiling ........ 810
References ........ 810
D. Yakoub (✉)
Department of Surgery, Staten Island University Hospital, 475 Seaview Avenue, Staten Island, NY 10305, USA
e-mail: [email protected]
Abstract Upper gastrointestinal surgery has advanced enormously in the last decade, largely through the introduction of new technologies. With the advent of NOTES and the progress of robotic surgery and various endoscopic techniques, many now-standard laparoscopic procedures are being attempted through these approaches. This appears to be the turn of a new generation of surgical techniques that may once more change the face of surgical practice, just as laparoscopy did over the last two decades. Another focus of interest is Barrett's esophagus, one of the most intensively studied upper gastrointestinal diseases owing to its recognized risk of cancer development, which makes it one of the few possible targets for cancer prevention. The recent, massive increase in the volume of bariatric surgery has made these techniques the subject of comprehensive research into technical refinement as well as metabolic and physiological consequences. For other benign and malignant esophageal and gastric conditions, studies have addressed both surgical technique and, more importantly, the biological questions of etiology, diagnosis and follow-up. In spite of the achievements in the field, it remains open to
vast future progress, whether on the basic science frontier or in clinical training and practice.

Upper gastrointestinal surgery has been an area of prolific development in recent decades. Obesity and metabolic operations have expanded dramatically in view of the worldwide obesity pandemic. Novel therapies include Natural Orifice Transluminal Endoscopic Surgery (NOTES), endoluminal therapy, endoscopic mucosal resection (EMR) and photodynamic therapy. There has been increasing clarification of the mechanisms of gastro-esophageal reflux disease, achalasia, Barrett's esophagus and esophageal cancer, including disease progression and the use of tumor markers, molecular profiling, detection of micrometastases and diagnostic laparoscopic/thoracoscopic techniques. Other developments include an increased appreciation of surgical skills training and volume–outcome relationships.
63.1 Introduction

Disorders of the esophagus and stomach span a wide spectrum of diseases including gastroesophageal reflux disease (GERD), motility disorders, and cancer. The management of these disorders varies considerably, from medical treatment in uncomplicated GERD to gastrectomy and esophagectomy in gastroesophageal cancer. The current era is a critical time for esophageal surgeons for several reasons. The incidence of esophageal carcinoma has increased dramatically over the past three decades, and the number of patients with GERD seeking alternatives to medical therapy is increasing. The introduction of new technology should not change the paradigm of who treats esophageal disorders. Surgeons must practice disease-based therapy, learning and adding to their repertoire a wide range of emerging minimally invasive and endoscopic therapies.
63.2 Innovation Within the Specialty

63.2.1 Robotics in Upper Gastro-Intestinal Surgery

Robotic surgery has generated much interest, especially since the da Vinci Surgical System (Intuitive Surgical Inc, Sunnyvale, CA) was granted United States Food and Drug Administration approval in July 2000. The feasibility of telerobotically assisted surgery has been demonstrated for a variety of procedures in upper gastrointestinal surgery during the last decade, including cardiomyotomy, fundoplication, and esophagectomy [1].

Table 63.1 Perceived advantages and disadvantages of TALC reported by a team of trainees comparing it with CLC

Disadvantage | Percentage
Lack of tactile feedback | 25
Prolonged setup time | 25
Prolonged procedure time | 17
Difficult patient access in an emergency | 17
Added cost | 8
Larger operating room to accommodate the robotic unit | 8
Cumbersome equipment | 8

Advantage | Percentage
Easier tissue dissection | 60
Enhanced dexterity | 60
Tele-education | 50
Technological integration | 40
Tele-presence | 40
Tele-mentoring | 40
Stereoscopic vision | 40
Surgical training tool | 30
Increased surgeon comfort | 30
63.2.1.1 Telerobotically Assisted Laparoscopic Cholecystectomy (TALC)

The potential advantages and disadvantages of TALC over conventional laparoscopic cholecystectomy (CLC) are reported in a recent study (Table 63.1). Although an institutional learning curve with respect to operative time was observed (Fig. 63.1), the mean operative time for the final five TALCs was not significantly different from that associated with CLC performed by participating surgeons in the study institutions. The cost associated with the instrumentation used in TALC was $16,400 per case, whereas the cost of CLC was $3,857 per case. Reusable laparoscopic instruments were used in all conventional laparoscopic cases [2].

[Fig. 63.1 Institutional learning curve: operative time (min) for successive TALC case groups (TALC 1–5, 6–10, 11–15, 16–20) vs. conventional LC. TALC telerobotic-assisted laparoscopic cholecystectomy; Conventional LC conventional laparoscopic cholecystectomy. * P < 0.05 vs. conventional LC; # P < 0.05 vs. TALC 1–5]

63.2.1.2 Robot-Assisted Laparoscopic Fundoplication (RALF)

Antireflux surgery may benefit from robotic assistance because it requires fine motions in the narrow subphrenic space. Total operative time is reported to be significantly shorter for RALF compared with conventional laparoscopic fundoplication (CLF), with a trend toward longer setup time but a significantly shorter effective operating time. RALF is suggested to be less ergonomically stressful than CLF, as indicated by decreased skin conductance and lower thumb electromyographic signals. Further investigations are needed to improve ergonomics in minimal access surgery so as to minimize the musculoskeletal impairments experienced during a surgeon's professional life [3].
63.2.1.3 Robotic Esophagectomy

A limited number of case reports and small case series have demonstrated the feasibility of robotic-assisted esophagectomy, with the first case reported in 2003. The authors concluded that the procedure was best restricted to patients with early-stage disease [4]. Questions remain about the feasibility of oncologically appropriate lymphadenectomy in minimal access surgery. Long-term follow-up data on survival rates, local control, and functional outcomes are lacking. For telerobotic surgical technology to gain widespread acceptance, a clear benefit relative to other approaches will need to be demonstrated, with particular emphasis on the cost–benefit profile.
63.2.2 Natural Orifice Transluminal Endoscopic Surgery (NOTES)

The potential advantages of NOTES over conventional flexible endoscopy and laparoscopic surgery include the lack of surgical scars, less anesthesia, and faster recovery. It is not yet clear what methods, tools, and training are needed to make this development safe and effective for patients. NOTES is not a single method but a combination of concepts, tools, and training that allows access to internal cavities via natural orifices. Transgastric abdominal exploration and liver biopsy were first reported by Kalloo et al. in 2004 [5]. A major limitation of transgastric access to the peritoneal cavity is the additional challenge of operating with the scope retroflexed when working on upper abdominal organs. In an effort to avoid this, a per-anal transcolonic approach has been tried, with successful cholecystectomy in a survival porcine model [6]. The first NOTES procedure in a human was a transgastric appendectomy performed by Rao and Reddy in India in 2005 [7]. Palanivelu et al. recently reported their experience with transvaginal endoscopic appendicectomy: two patients underwent the procedure with laparoscopic assistance through a 3 mm umbilical port [8]. Bernhardt et al. subsequently published a report of a transvaginal appendicectomy for histologically confirmed appendicitis without using a hybrid approach [9].

If natural orifice surgery is to progress to larger organ surgery, instruments need to be flexible, steerable, lockable, and torqueable, with more aggressive graspers required (see Fig. 63.2). Improvements to lighting and stable optics are also needed and, crucially, a reliable form of tissue approximation for gastrotomy/colotomy defect closure and ultimately for vascular control and anastomosis.

[Fig. 63.2 Example endoscopic tools for NOTES (Karl Storz Inc., Tuttlingen, Germany), showing the multichannel endoscope head, whose design allows increased maneuverability of the tool tips in spite of the limited space]

To expand NOTES capabilities to larger organ surgery, several trials are underway in the United States and elsewhere to investigate combined hybrid laparoscopic–NOTES procedures in humans, including cholecystectomy and pyloroplasty (see Fig. 63.3). These procedures are being developed by multidisciplinary teams often including both therapeutic gastroenterologists and minimally invasive surgeons [10]. In the clinical setting, this should allow for improved patient safety with the ready availability of laparoscopic or open surgery.

[Fig. 63.3 Transcolonic view of the liver and gall bladder using a 2-channel therapeutic gastroscope. Biopsy forceps are shown providing traction on the liver and countertraction on the gall bladder]
63.2.3 Endoluminal Therapy

Recent interest in the development of endoluminal therapy has led to new alternatives in the treatment of GERD for patients with little or no significant hiatal hernia. Radiofrequency energy delivered to the lower esophageal sphincter (LES) theoretically adds bulk to the LES and changes the sphincter's compliance.
Multiple prospective nonrandomized studies with short-term results of up to 12 months have shown promise. Triadafilopoulos et al. showed improved quality of life scores, decreased esophageal acid exposure, decreased median DeMeester scores (40.0–26.4, P < 0.009) and 70% of patients off proton pump inhibitors (PPI) at 12 months in a group of 118 patients [11, 12]. Additional long-term follow-up data, including the incidence and natural history of esophagitis in these patients, are required to establish the utility of this procedure.

A second endoscopic therapy for GERD involves the creation of a mechanical barrier at the gastroesophageal junction (GEJ). The Gatekeeper Reflux Repair System (Medtronic, Tolochenaz, Switzerland) entails the submucosal placement of a polyacrylonitrile-based hydrogel prosthesis above the GEJ. In a prospective, nonrandomized study of 68 patients with 6 months of follow-up, quality of life was improved, median LES pressure was increased (8.8–13.8 mmHg, P < 0.01) and 70.4% of the prostheses were still in place at 6 months [13, 14]. Concerns remain about the durability of the implants, leaving the long-term utility of the procedure in question.

Direct endoscopic tightening of the LES is also possible by either suturing or plication. Currently, three endoscopic suturing devices are available: the EndoCinch (BARD Endoscopic Technologies, Billerica, MA), the Endoscopic Suturing Device (ESD; Wilson-Cook Medical, Winston-Salem, NC) and the Full-Thickness Plicator (NDO Surgical Inc., Mansfield, MA). The major difference is that the NDO Plicator takes a full-thickness suture of the gastric fundus, whereas the others use a partial mucosal or submucosal stitch. Partial-thickness sutures have a greater potential to pull through the tissue or migrate over time. The NDO Plicator system appears to have the technical advantages of a single full-thickness stitch, with serosa-to-serosa approximation leading to direct tightening of the LES, possible lengthening of the LES and favorable alteration of the angle of His. Pleskow et al. showed, in a nonrandomized, prospective study of 64 patients with 12 months of follow-up, that mental and physical SF-36 scores were significantly improved. Importantly, 80% of patients had improved distal esophageal acid exposure, as evidenced by significantly decreased DeMeester scores (44–30, P < 0.001) [15–17]. Moreover, the majority of patients (68%) were off PPI at 12-month follow-up. The NDO Plicator as well
as the other endoluminal therapies need to be evaluated very carefully in randomized studies, and may serve as a "bridging therapy" between noninvasive GERD therapy and existing laparoscopic antireflux surgery [18].
63.2.4 Endoscopic Mucosal Resection (EMR)

EMR is a technique used to locally excise lesions confined to the mucosa. Its main role in therapeutic endoscopy is the treatment of advanced dysplasia and early gastrointestinal (GI) cancers. As newer techniques of EMR such as circumferential mucosectomy are developed, the potential to reduce recurrence rates exists. Several EMR techniques have been described in the literature: (1) strip-off biopsy; (2) the "inject, lift, and cut" method; (3) the "cup and suction" or EMR-cap technique; and (4) EMR with band ligation [19, 20]. Endoscopic submucosal dissection (ESD) is one of the more recently described techniques, developed to perform single en bloc resections of large mucosal lesions using an electrocautery knife.
63.2.5 Photodynamic Therapy

Photodynamic therapy (PDT) has been used alone or in combination with neodymium yttrium aluminium garnet (Nd:YAG) laser therapy, and has successfully ablated both Barrett's mucosa and foci of adenocarcinoma in some patients [21–23]. However, several problems limit the utility of PDT in this setting. First, the depth of injury with PDT is variable and imprecise, and because PDT destroys the tissue, there is no specimen and therefore no means of confirming cancer eradication. Lymph node metastases cannot be treated by PDT. Furthermore, PDT rarely eradicates all the Barrett's mucosa, and despite phenotypical downstaging, the underlying genetic abnormalities may persist, conferring a significant risk of recurrent or persistent dysplasia and adenocarcinoma. PDT can lead to the development of fibrotic esophageal strictures in >30% of patients. Finally, in a cost-analysis model, PDT was shown to be more expensive than esophagectomy. Consequently, there is little to support the routine use of PDT for patients with dysplastic Barrett's esophagus or adenocarcinoma.
63.3 Techniques Within Specialty

63.3.1 Obesity Surgery Techniques

Obesity has been dubbed a "global epidemic" by the World Health Organisation. Estimates suggest that more than 12 million adults and 1 million children in England will be obese by 2010 if no action is taken [24]. A report by the UK Government Chief Scientist estimates that by the year 2050, 60% of men and 40% of women will be clinically obese, at a cost of £45.5 billion per year [25]. Obese patients have an increased incidence of cardiovascular disorders including hypertension and Type II diabetes, together with a variety of respiratory complications including obstructive sleep apnoea (OSA), symptoms of dyspnoea, obesity hypoventilation syndrome and bronchial asthma. Obesity is associated with an increased risk of gallstones, osteoarthritis and cancers of the breast and colon [26].
63.3.1.1 Obesity and All-Cause Mortality

There is conflicting evidence regarding the indices of obesity and prediction of all-cause mortality. Price et al. published the findings of a United Kingdom cohort study of 14,833 patients aged 75 years or older, with a median follow-up of 5.9 years. No association was seen between baseline waist circumference and either all-cause or cardiovascular mortality. In nonsmokers (90% of the cohort), there was a negative association between body-mass index (BMI) and all-cause mortality risk in both men (P = 0.0001) and women (P < 0.0001), even after adjustment for potential confounders. In the same cohort, waist-hip ratio (WHR) was positively but weakly associated with mortality risk in men (P = 0.579), whereas the relation was significantly positive in women (P < 0.0001). BMI was not associated with circulatory mortality in men (P = 0.667) and was negatively associated in women (P = 0.004). WHR was positively related to circulatory mortality in both men (P = 0.001) and women (P = 0.005) [27].
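For reference, the two obesity indices used in this cohort are computed as follows; these are the standard definitions, stated here for convenience rather than taken from the study itself:

$$\mathrm{BMI}=\frac{\text{weight (kg)}}{\text{height (m)}^{2}},\qquad \mathrm{WHR}=\frac{\text{waist circumference}}{\text{hip circumference}}$$

As a worked example, a 120 kg patient who is 1.75 m tall has a BMI of $120/1.75^{2}\approx 39.2\ \mathrm{kg/m^{2}}$, just below the 40 kg/m2 threshold discussed in the next subsection; a 110 cm waist with 100 cm hips gives a WHR of 1.10.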
63.3.1.2 Surgical Treatment of Obesity

Current guidelines from the National Institute for Clinical Excellence recommend bariatric surgery as a
treatment only for adults with a BMI greater than 40 kg/m2, or a BMI between 35 and 40 kg/m2 with coexisting significant disease such as type II diabetes, hypertension and OSA. Patients must also have attempted all appropriate medical treatment of their obesity for at least 6 months and failed to achieve or maintain adequate, clinically beneficial weight loss. The consensus view is that surgery should only be offered in the context of a multidisciplinary team, including dieticians, psychologists, surgeons and allied health professionals, that can provide special expertise in treating obese patients [28]. In the United States, the estimated number of patients discharged from inpatient facilities with a diagnosis code of morbid obesity who underwent a bariatric procedure increased from approximately 5,000 per year in the early 1990s to more than 100,000 per year in 2005 [29]. Bariatric operations can be classified as restrictive, malabsorptive, or a combination of the two approaches.
Restrictive Procedures

Restrictive operations reduce the storage capacity of the stomach, creating a small reservoir with a narrow outlet to delay gastric emptying. This limits caloric intake by causing early satiety. Purely restrictive surgery does not alter the absorption of nutrients.

Vertical Banded Gastroplasty

During a vertical banded gastroplasty (VBG), originally described by Mason in 1982, the fundus of the stomach is stapled parallel to the lesser curve and the outlet of the created pouch is narrowed with a 5 cm band. A gastric reservoir of about 50 mL remains and the banding provides an exit diameter of 10–12 mm [30]. Levels of excess weight loss (EWL, defined in the note below) exceeded 50% in 74% of patients in one trial [31], although in most studies with 3- to 5-year follow-up, EWL of at least 50% has been achieved in only 40% of patients [32]. Mortality is <1% in published series, with an overall morbidity of 14% [33]. Medium- and long-term complications include stomal stenosis (20%), staple line failures (11%), severe esophagitis (7%) and band erosion or migration (1.5%) [34].

Laparoscopic Adjustable Gastric Banding

The laparoscopic adjustable gastric band (LAGB) was developed by Belachew in 1992 [35]. The technique
involves placing a silicone inflatable gastric band horizontally around the proximal part of the stomach, with anterior gastric sutures to secure the band position. By inflating the gastric band with saline injections via an implanted subcutaneous reservoir (Portacath), a 15–20 mL superior gastric pouch is created. Distension of the pouch causes a sensation of early satiety and reduces caloric intake. The band can be adjusted in the outpatient clinic without anesthesia [36]. LAGB is the safest bariatric surgical procedure, with a postoperative mortality of <0.5%, up to tenfold less than that of gastric bypass [37–39]. In the treatment of super obese patients with a BMI >50 kg/m2, banding has achieved at least 50% EWL with 90% resolution of comorbidities [40, 41]. Specific complications of the LAGB procedure include band slippage, leak, intolerance, infection and migration. Early reported slippage rates of 12–24% have improved with technique refinement to 1–2% [42, 43]. Erosion of the band is commonly caused by gastric microperforation and occurred in <2% of a large reported series [44]. Esophageal dilatation, Portacath migration and technical complications such as puncture of the band during placement of securing sutures may also occur.
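The EWL percentages quoted for VBG and LAGB above, and throughout this section, follow the conventional bariatric definition; the formula below is a standard one supplied for convenience and is not restated in the source:

$$\%\mathrm{EWL}=\frac{\text{preoperative weight}-\text{postoperative weight}}{\text{preoperative weight}-\text{ideal weight}}\times 100$$

For example, a patient who presents at 140 kg with an ideal weight of 70 kg and who weighs 105 kg at follow-up has shed 35 kg of a 70 kg excess, i.e. 50% EWL, the benchmark used in the series cited above.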
Sleeve Gastrectomy

A vertical restrictive (sleeve) gastrectomy resects most of the gastric body, leaving a narrow stomach tube with reduced capacity. The procedure has an established role as part of a two-stage approach to bariatric surgery that is particularly appropriate for patients with a BMI >50 kg/m2 ("superobese") [45]. An initial sleeve gastrectomy is performed, followed by a planned malabsorptive operation once a certain proportion of weight has been lost. Patients treated with LAGB in whom the resulting weight loss is insufficient can also have an additional sleeve gastrectomy. Several recently published small series report 30–80% EWL after sleeve resection, with a significant decrease in related comorbidities, albeit only with short-term follow-up [46, 47]. The operation can be performed quickly and safely with minimal adverse effects, even in a cohort of patients with significant concomitant disease. No intestinal bypass is required, thereby avoiding the accompanying risks of anastomotic leakage, stricture and micronutrient deficiency. Furthermore, even if after a sleeve gastrectomy patients do not achieve and maintain
beneficial weight loss, they can still go on to have a second malabsorptive procedure [48].

Endoluminal Procedures

Endoscopic suturing devices have the ability to plicate the stomach and simulate a VBG via an entirely endoluminal approach. Experiments in animal models have been reported, and initial results from pilot studies in human subjects demonstrate the feasibility of the procedure with an acceptable safety profile, although data on long-term benefits and complications are lacking [49, 50]. Placement of an intragastric balloon has recently enjoyed a revival of interest, both as a bridge procedure to reduce the operative risk of a subsequent bariatric procedure and as a permanent endoluminal treatment for morbid obesity, particularly for individuals who refuse surgery [51]. Limited data on superobese patients indicate that preoperative placement of an intragastric balloon can produce significant weight loss with a subsequent decrease in postoperative complications [52–54].
Malabsorptive Operations

Malabsorptive procedures bypass varying portions of the small intestine, shortening its functional length to decrease the absorption of nutrients. The created short-bowel syndrome results in a negative energy balance and weight loss.

Biliopancreatic Bypass with or Without Duodenal Switch Procedure

In its classic form, biliopancreatic diversion (BPD) consists of a horizontal distal gastrectomy with formation of a proximal 200 mL gastric pouch. After division of the mid-ileum, a Roux-en-Y gastroileostomy with end-to-side ileo-ileostomy is constructed, creating a long Roux limb, an alimentary limb of 150–200 cm and a common channel of 50–100 cm [55]. BPD is particularly suitable for patients with a BMI greater than 60 kg/m2 because of its dramatic weight loss outcomes. Patients can lose up to 70% of their initial excess weight through a combination of factors: restriction from the gastrectomy, changes in eating patterns due to dumping syndrome, and malabsorption from the substantial length of
ileum bypassed [56–58]. The restrictive component is often short-lived, with appetite and eating capacity returning to baseline levels within 12 months; long-term weight loss is achieved mainly through malabsorption [59]. BPD has a high incidence of postoperative stomal ulceration (8.3%), unpleasant postgastrectomy dumping syndrome and protein malnutrition. The operation can be modified by using a vertical sleeve gastrectomy with preservation of the distal antrum, pylorus and proximal duodenum [60, 61]. The Roux limb is then constructed with a longer distal common channel and is joined end-to-end with the proximal cut end of the duodenum. It is generally accepted that by preserving the pyloric sphincter and increasing the distal common channel length, BPD with duodenal switch results in fewer cases of dumping and marginal ulcers than classical BPD [62, 63]. Few data are published about limb length, but it is recommended that the common limb should measure at least 50 cm but less than 100 cm. In both forms of BPD, the length of the common limb determines the degree of malabsorption, and the aim is to find the right balance between beneficial weight loss and acceptable protein malnutrition [64]. Laparoscopic BPD has been performed since 2001, and while the operating time is significantly longer, the reported benefits include fewer wound infections, shorter inpatient stay and a lower incidence of incisional hernia [65]. Published weight loss outcomes of the BPD operation include two large series reported by Scopinaro et al., whose impressive EWL rates were 74–78% in 1,356 patients and 73–78% in 2,000 patients [66, 67]. Weight loss was sustained at 15 years of follow-up, with good resolution of related comorbidities such as type II diabetes and hypertension, results that are supported by other series [61, 63]. Failure to achieve or maintain beneficial weight loss has been reported in up to 15% of patients, necessitating revision surgery to add a greater restrictive component or modify the length of the common limb [68, 69]. The choice of BPD is largely restricted to patients in whom other interventions have failed; compared with the alternatives, it carries a longer operating time, a prolonged hospital stay and a poorer side-effect profile. There are currently no randomized trials comparing BPD with other bariatric procedures.
[Fig. 63.4 (i) Biliopancreatic diversion (BPD). (ii) Duodenal switch with sleeve gastrectomy. (iii) Roux-en-Y gastric bypass]
Roux-en-Y Gastric Bypass

Standard Roux-en-Y gastric bypass (RYGB) involves the formation of a 15–30 mL restrictive gastric pouch and creation of a gastrojejunostomy bypassing the distal stomach, duodenum and proximal jejunum, giving an alimentary limb of at least 75 cm and a biliary limb of at least 50 cm. Antecolic and antegastric Roux-limb placement is the simplest arrangement, with closure of mesenteric defects required to avoid bowel obstruction. The construction of the gastrojejunostomy is another area of debate and is largely a matter of surgeon preference; no randomized trials compare the available techniques, which include circular stapled, linear stapled and completely handsewn anastomoses. The primary mechanism of weight loss is reduced calorie intake, as the gastric pouch quickly fills during a meal, creating a sensation of early satiety. The bypass also causes some degree of malabsorption, determined by the length of the common limb. Because of its combination of persistent weight loss effectiveness and an acceptable complication profile, the RYGB is considered the gold standard against which all other bariatric procedures are compared [64]. During the late 1990s, the RYGB became the most common procedure performed in the United States for obesity, today accounting for up to 80% of bariatric procedures [70]. The mortality of gastric bypass surgery is approximately 0.5% for both the open and laparoscopic approaches. The laparoscopic approach offers a reduction in critical care requirement, fewer postoperative pulmonary and thromboembolic complications and a lower incidence of incisional hernia, at the expense of significantly longer general anesthesia. The serious complication of anastomotic leakage occurs in 2–5% of open cases, with a higher rate after laparoscopic RYGB, and studies have also shown that internal herniation and small bowel obstruction are more prevalent after laparoscopic bypass [71–73]. Open and laparoscopic gastric bypass can achieve both significant weight loss and resolution of comorbid conditions. The operation produces excellent initial weight reduction, with a mean EWL of nearly 70% at 1 year. A number of case series have shown that after 3 years, 60–70% of patients can achieve >50% weight loss [74, 75]. Long-term results are good, with an average EWL of 60% at 5 years, decreasing to around 50% at 8–10 years [76]. If RYGB fails, revisional surgery is complex and carries significant risk. A summary of the malabsorptive obesity surgery procedures is shown in Fig. 63.4.

[Fig. 63.5 Clinical management of gastric cancer: workup and laparoscopy, with stratification into locoregional (M0) disease in medically fit or unfit patients, unresectable M0 disease and stage IV (M1) disease, directing patients to surgery, chemo/XRT or salvage therapy. Adapted from National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology, 2005]

[Fig. 63.6 Lymph node groups draining the esophagus and stomach]
63.3.2 Debates

63.3.2.1 Extent of Lymphadenectomy Needed for Cancer Clearance

In cancer of the esophagus and GEJ, controversy exists over which type of operation to perform. An overview of the clinical approach to gastric cancer is presented
in Fig. 63.5. Some authors argue that the presence of lymph node involvement equals systemic disease and that, in the face of such a poor prognostic factor, survival will remain unchanged whatever the extent of resection, so that systematic removal of lymph nodes is of no benefit. Others believe that the natural course of the disease can be influenced positively by more aggressive surgery, even in patients with positive lymph nodes. Largely under Japanese influence, much attention has been paid to meticulous extensive lymphadenectomy in the management of esophageal cancer [77]. Unfortunately, in many studies the extent of esophagectomy and the type of lymphadenectomy are poorly defined; agreement on the anatomical classification of the different extents of lymphatic dissection for cancer of the esophagus has been reached only recently. More extensive surgery certainly improves the quality of the TNM staging system as a prognostic index, as it provides a better assessment of true survival in lymph node-negative patients compared with more standard resections "contaminated" by false node-negative patients. The lymph node groups draining the esophagus and stomach are displayed in Fig. 63.6. Published data indicate that R0 resection combined with extensive lymphadenectomy results in improved (loco-regional) disease-free survival as well as
improved 5-year survival, although most data come from nonrandomized studies. Multimodality treatment regimens have to be adapted to the highest quality standards and compared with the results of the best primary surgical therapy, including assessment of quality of life. Recent data analysis indicates that the number of involved lymph nodes removed at surgical resection is an independent predictor of 5-year survival after esophagectomy for cancer [78].
63.3.2.2 Minimal Access Esophagectomy

In the late 1990s, several centers began exploring the potential for minimally invasive esophagectomy. Techniques have now been developed for both a laparoscopic and a combined thoracoscopic/laparoscopic esophagectomy. Disadvantages of a completely laparoscopic approach include the inherent dangers of dissection near the pulmonary vessels high in the mediastinum and the inability to accomplish a systematic thoracic lymphadenectomy. However, the vagal-sparing procedure is ideally suited to a laparoscopic approach, because the esophagus is stripped out of the mediastinum without any dissection and no lymphadenectomy is necessary in these patients with only high-grade dysplasia (HGD) or intramucosal cancer. For patients with more advanced cancer, the combined thoracoscopic/laparoscopic approach offers the advantage of a thoracic lymphadenectomy and has proven safe and effective in a large series of patients [79, 80]. Whether minimally invasive esophagectomy offers clear advantages in hospital stay and recovery, with oncological outcomes equivalent to open procedures, is yet to be determined.
63.4 Recent Advances in Choice of Management Within Specialty

63.4.1 Gastro-Esophageal Reflux Disease (GERD) and Achalasia

The immense success of laparoscopic surgery as an effective treatment of GERD and achalasia has established minimally invasive surgery as the gold standard in the surgical treatment of these two conditions [81, 82]. Compared to open surgery, laparoscopic procedures result in lower
morbidity and mortality, shorter hospital stay, faster convalescence and less postoperative pain.
63.4.1.1 GERD

Medication vs. Operation

Antireflux surgery has been shown to be very effective at alleviating symptoms in 88–95% of patients, with excellent patient satisfaction, in both short- and long-term studies. In addition to symptomatic improvement, the long-term effectiveness of laparoscopic antireflux surgery (LARS) has been objectively confirmed with 24-h pH monitoring. Overall, LARS is safe and has an efficacy similar to open antireflux surgery and to best medical therapy with PPIs. The failure rate in some series, however, is greater than 50% at 5 years, and because of the cost incurred by the proportion of patients still taking antireflux medications, LARS cannot be recommended over best medical therapy on cost-effectiveness grounds. There is a tendency for antireflux surgery to be superior to medical therapy for cancer prevention in Barrett's esophagus, but this has not reached statistical significance [83–85]. Surgical therapy should be considered for patients with Barrett's esophagus, especially those who are young or symptomatic.
Total vs. Partial Fundoplication

Further insight into esophageal motility disorders and concerns about postoperative dysphagia have prompted debate over whether total or partial fundoplication is the more appropriate treatment for GERD. Supporters of the total wrap acknowledge that the wrap needs to be "floppy" to minimize postoperative dysphagia. Two partial fundoplications are in common practice, the Dor (anterior) and the Toupet (posterior), of which the Toupet is the more commonly performed. Fibbe et al. compared laparoscopic partial and total fundoplications in 200 patients with esophageal motility disorders and found no difference in postoperative recurrence of reflux [86]. Similarly, Laws et al. found no clear advantage of one wrap over the other in their prospective, randomized study comparing the two [87]. Moreover, in a meta-analysis of 11 prospective randomized trials including open and laparoscopic total vs. partial fundoplications in 991 patients, no statistically significant differences
were found in recurrence of reflux or long-term outcomes, although a higher incidence of postoperative dysphagia occurred after total fundoplication [88].
63.4.1.2 Achalasia

Current controversies in the treatment of achalasia include the allocation of patients to medical treatment, pneumatic dilation or surgery. All of these therapies are designed to alleviate the symptoms of achalasia by addressing the failure of the LES to relax upon swallowing; none, however, addresses the underlying aperistalsis of the esophagus.
Pneumatic Dilation

Under direct endoscopic visualization, a balloon is placed across the LES and inflated up to 300 mmHg (10–12 psi) for 1–3 min. The goal is to produce a controlled tear of the LES muscle to render it incompetent, thereby relieving the obstruction. After pneumatic dilation, good to excellent response rates can be achieved in 60% of patients at 1 year. Long-term results are less satisfactory: at 5 years only 40% of patients note symptom relief, and at 10 years 36% have relief of dysphagia. Repeat dilatations are technically possible but show a decreased success rate. The rate of esophageal perforation during pneumatic dilation is approximately 2%. Current practice regarding pneumatic dilation prior to (or instead of) surgical therapy depends upon the referring physician's opinion, the surgeon's expertise and the patient's preference.
Pharmacotherapy

Calcium channel blockers and nitrates have been widely used, with sublingual application preferred. Both medications effectively reduce baseline LES tone, but they neither improve LES relaxation nor augment esophageal peristalsis. Current recommendations limit pharmacotherapy to patients early in the disease process without significant esophageal dilatation, patients at high risk for more invasive modalities and patients who decline other treatment options.
Botulinum Toxin

Direct injection of botulinum toxin A (Botox, Allergan, Irvine, California) into the LES irreversibly inhibits acetylcholine release from cholinergic nerves. This results in decreased sphincter activity and a temporary amelioration of achalasia-related symptoms. Following Botox injection, about 60–85% of patients suffering from achalasia will note an improvement in their symptoms. While repeated injections are technically possible, reduced efficacy is seen with each subsequent injection. Repeated esophageal instrumentation with Botox treatment inevitably causes an inflammatory reaction at the GEJ, and the resulting fibrosis may make subsequent surgical treatment more difficult and increase the complication rate. There is, however, recent evidence that laparoscopic Heller myotomy after prior Botox therapy is as successful as myotomy in Botox-naïve patients.
Surgical Therapy

Heller first described the surgical division of the gastroesophageal sphincter as therapy for achalasia in 1913. His original technique used two parallel myotomies extending for at least 8 cm along the distal esophagus and proximal stomach. Since the inception of minimally invasive techniques, both the open transabdominal repair and the thoracic approach have fallen out of favor, and the laparoscopic approach is now considered superior to these surgical therapies. Laparoscopic Heller myotomy is noted to have a very low morbidity and mortality rate, less postoperative pain, improved cosmesis and a faster recovery compared with more invasive surgical therapies. Most importantly, excellent symptom relief is noted in 90% of patients. The current indications for laparoscopic surgical treatment of achalasia include patients less than 40 years of age, patients with persistent or recurrent dysphagia after unsuccessful Botox injection or pneumatic dilation, and patients at high risk of perforation from dilation, including those with esophageal diverticula or distorted lower esophageal anatomy. Two controversies concerning achalasia are currently discussed within the surgical community: the extent to which the myotomy is extended onto the stomach, and whether to add an antireflux procedure.
Table 63.2 Selected trials comparing neoadjuvant chemotherapy, with or without radiation, plus surgery vs. surgery alone in patients with localized esophageal cancer

Author (Year) | Study | Number of patients | Chemotherapy | Total radiotherapy dose (cGy) | Median survival (months) | Survival (%)
Kelsen (1998) | Surgery alone vs. pre-op chemotherapy with surgery | 440 | Cisplatin, fluorouracil | – | 16.1 vs. 14.9 | 37 vs. 35 (2-year survival)
MRC Trial (2002) | Surgery alone vs. pre-op chemotherapy with surgery | 802 | Cisplatin, fluorouracil | – | 13.3 vs. 16.8* | 34 vs. 43* (2-year survival)
Cunningham (2006) | Surgery alone vs. pre-op chemotherapy, surgery, and postoperative chemotherapy | 503 | Epirubicin, cisplatin, fluorouracil | – | NA | 23 vs. 36* (5-year survival)
Urba (2001) | Surgery alone vs. pre-op chemotherapy and radiotherapy with surgery | 100 | Cisplatin, vinblastine, fluorouracil | 4,500 | 17.6 vs. 16.9 | 16 vs. 30 (3-year survival)
Walsh (1996) | Surgery alone vs. pre-op chemotherapy and radiotherapy with surgery | 113 | Cisplatin, fluorouracil | 4,000 | 11 vs. 16 | 6* vs. 32 (3-year survival)
Burmeister (2005) | Surgery alone vs. pre-op chemotherapy and radiotherapy with surgery | 256 | Cisplatin, fluorouracil | 3,500 | 22.2 vs. 19.3 | NA

MRC Medical Research Council; NA not available; values are given as surgery alone vs. multimodality arm; * P < 0.05
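A brief unit note on the radiotherapy doses in Table 63.2 (a standard conversion, not part of the table itself): since $1\ \mathrm{Gy}=100\ \mathrm{cGy}$, the quoted doses of 3,500–4,500 cGy correspond to 35–45 Gy.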
The proximal extent of the myotomy is typically carried 5–6 cm proximal to the LES. The distal extent of the myotomy, to either 1–2 cm or to at least 3 cm beyond the LES, is debated. Oelschlager et al. found less dysphagia, and subsequently fewer interventions to treat recurrent dysphagia, in favor of the extended distal myotomy group (3 vs. 17%) [89]. The consensus opinion is that a surgical myotomy predisposes patients to postoperative GERD, and while many surgeons add an antireflux procedure to address this potential problem, others report adequate cardiomyotomy without postoperative reflux. Performing a fundoplication prolongs the operative time and may cause problems with dysphagia in the setting of an aperistaltic esophagus. Proponents of adding an antireflux procedure mostly favor a partial (anterior Dor or posterior Toupet) over a total (Nissen) fundoplication, to avoid a functional obstruction from the
fundoplication and the persistence of dysphagia. There is a lack of prospective randomized studies comparing anterior vs. posterior procedures to demonstrate any clear advantages of one partial fundoplication over the other.
63.4.2 Barrett's Esophagus and Esophageal Cancer

63.4.2.1 Combined Modality Adjuvant Treatment

Randomized studies have demonstrated that preoperative chemotherapy improves survival in esophagogastric
adenocarcinoma [90]. Various comparative studies have sought evidence-based answers on the most effective combination of therapeutic modalities in the treatment of esophageal cancer. Selected trials comparing these modalities are displayed in Table 63.2.
63.4.2.2 Palliative Management of Esophageal Cancer

Esophageal stents have been used successfully to relieve dysphagia. A recent meta-analysis showed self-expanding metal stents to be superior to plastic stents in terms of related mortality, morbidity and quality of palliation [91]. Uncovered stents are disadvantaged by a high rate of tumor in-growth; adequately designed randomized controlled trials are required to examine the outcomes and cost-effectiveness of covered vs. uncovered metal stents.
63.4.2.3 Targeted Therapy

Novel molecularly targeted anticancer agents include antiproliferative, apoptosis-inducing, antiangiogenic and anti-invasive agents. Although relatively few targets are being investigated or pursued clinically in esophageal cancer, there is a good rationale to do so based on current knowledge of the molecular abnormalities in this disease. Among the candidates for targeted therapy in esophageal cancer are oncogenes such as EGFR, c-erbB2 and cyclin D1, as well as tumor suppressor genes such as Rb, p53 and p16. An ongoing phase II study is investigating the EGFR tyrosine kinase inhibitor gefitinib as second-line therapy for esophageal cancer, with a disease control rate of 37% in the 27 patients recruited and mild adverse effects reported [92].
63.4.2.4 Gene Therapy

A gene-based approach investigated in esophageal cancer involves the agent TNFerade, a second-generation replication-defective adenoviral vector carrying the transgene encoding human tumor necrosis factor (TNF) under the control of a radiation-inducible promoter. Preclinical experiments have demonstrated that TNFerade gene therapy plus radiation can induce tumor regression in models of esophageal cancer and other solid tumors.
A completed phase I study has shown the absence of dose-limiting toxicities of TNFerade plus radiation, no drug-related serious adverse events and a remarkably high tumor response rate in patients with advanced solid tumors. The agent is currently being investigated in a phase II study in patients with advanced esophageal cancer in combination with conventional chemotherapy and radiation [93].
63.5 Molecular and Biological Developments Within Specialty

63.5.1 Progression of Barrett's Disease and Adenocarcinoma of the Esophagus

The majority of Barrett's patients will not develop cancer, so specific methods to identify high-risk groups are required. The risk of cancer development in Barrett's esophagus has been confirmed to be related to risk factors for GERD; these risk factors, however, are often asymptomatic and therefore not helpful in the individualization of surveillance intervals. Recent molecular studies have identified a selection of candidate biomarkers that need validation in prospective studies. They reflect various changes in cell behavior during neoplastic progression. Barrett's epithelium is characterized by the presence of goblet cells and by expression of intestinal markers such as MUC2, alkaline phosphatase, villin and sucrase isomaltase. Transcription factors such as CDX1 and CDX2, which play an important role in the development of intestinal epithelium in utero, may also be important in the development of metaplastic epithelium in the esophagus. The presence of CDX2 protein and mRNA has been shown in esophageal adenocarcinoma, in Barrett's metaplasia cells of the intestinal type with goblet cell-specific MUC2 mRNA, and also in the squamous epithelium of a proportion of patients with Barrett's metaplasia. Molecular changes in the metaplasia–dysplasia–adenocarcinoma sequence are driven by genomic instability and the evolution of clones of cells with accumulated genetic errors that carry a selection advantage and allow successive clonal expansion. Some chromosomal aberrations, including aneuploidy and loss of heterozygosity (LOH), genetic
mutations and epigenetic abnormalities of tumor suppressor genes have been identified [94–98]. Cell cycle regulatory genes known to be implicated in esophageal adenocarcinoma development include p16, p53 and cyclin D1. p16 lesions are very common in Barrett’s metaplasia and include LOH, mutation and methylation of the promoter. Histochemically assessed cyclin D1 overexpression has been documented in Barrett’s esophagus and adenocarcinoma. One prospective study has shown that Barrett’s metaplasia patients with cyclin D1 overexpression were at increased risk of cancer development compared to patients with normal expression. Hyperproliferation has been consistently observed in Barrett’s metaplasia by many assays including immunohistochemistry staining for division markers such as proliferating cell nuclear antigen (PCNA) and Ki67, and flow cytometry for DNA content, but there are no advanced phase studies showing that proliferation indices have any predictive value for cancer progression [99–102]. p53 lesions occur frequently in esophageal adenocarcinomas (85–95%) and almost never in normal tissue from the same patients; their prevalence increases with advancing histologic grade of dysplasia, which makes them appropriate candidates for further studies [103, 104]. Reid et al. evaluated 17p (p53) LOH in a large phase 4 study with prospective observation of 256 patients and esophageal adenocarcinoma as the primary end point. In this study, 17p (p53) LOH was a strong and significant predictor of progression to esophageal adenocarcinoma [105].
63.5.1.1 Invasion

E-cadherin

Cadherins are a family of calcium-dependent cell–cell adhesion molecules essential to the maintenance of intercellular connections, cell polarity and cellular differentiation. Germline mutation of the E-cadherin gene (CDH1) causes familial gastric cancer, and loss of E-cadherin expression is associated with many nonfamilial human cancers, including esophageal adenocarcinoma. The expression of E-cadherin is significantly lower in patients with Barrett's esophagus than in normal esophageal epithelium, and the reduction in its expression deepens as the metaplasia–dysplasia–adenocarcinoma sequence progresses. These findings suggest that
E-cadherin may serve as a tumor suppressor early in the process of carcinogenesis [106–108].
COX-2

Cyclooxygenase-2 (COX-2) is constitutively found in the kidney and brain; in other tissues its expression is inducible and rises during inflammation, wound healing and neoplastic growth in response to interleukins, cytokines, hormones, growth factors and tumor promoters. COX-2 and its derived prostaglandin E2 (PGE2) appear to be implicated in carcinogenesis because they prolong the survival of abnormal cells: they reduce apoptosis and cell adhesion, increase cell proliferation, promote angiogenesis and invasion, and make cancer cells resistant to the host immune response. COX-2 is expressed in the normal esophagus, but its expression is significantly increased in Barrett's esophagus and even more so in HGD and esophageal adenocarcinoma. Some authors suggest that COX-2 expression might be of prognostic value in esophageal adenocarcinoma: an immunoreactivity study in cancer tissues showed that patients with high COX-2 expression were more likely to develop distant metastases and local recurrence, and had significantly reduced survival rates compared with those with low expression.
63.5.1.2 Dysplasia

The diagnosis of Barrett's metaplasia warrants regular endoscopic surveillance for dysplasia, with four-quadrant biopsies at 2 cm intervals in the affected esophageal portion. HGD in Barrett's esophagus is an indication for esophagectomy or endoscopic therapy. The prognostic significance of dysplasia in Barrett's esophagus has been studied extensively for several decades and the results are inconsistent. Dysplasia alone is not an entirely reliable biomarker, and more specific predictors of progression need to be included in surveillance recommendations to increase their efficacy and cost-effectiveness [109, 110].
63.5.2 ASPECT Study

The largest prospective intervention study in patients with Barrett's metaplasia was started in April 2004 in the
UK. It is organized by the National Cancer Research Institute and sponsored by Cancer Research UK. ASPECT, a phase IIIb randomized study of aspirin and esomeprazole chemoprevention in Barrett's metaplasia, is a national, multicentre, randomized controlled trial of low- or high-dose esomeprazole, with or without low-dose aspirin, for 8 years [111]. Its primary aim is to determine whether intervention with aspirin can decrease mortality or the rate of conversion from Barrett's metaplasia to adenocarcinoma or HGD, and whether high-dose PPI treatment can decrease the cancer risk further. Biomarker studies in this project include characterization and validation of p16 and TP53 (gene expression, LOH and mutation analysis) and of aneuploidy/ploidy changes (by flow cytometry), currently the best characterized biomarkers for prediction of progression. Other changes in protein expression will also be studied, including CDX2, COX-2, protein kinase C-ε (PKCε) and minichromosome maintenance protein 2 (Mcm2) expression. The aim is to address whether these markers are reproducible and whether they are clinically informative (sensitivity and specificity) at 2 and 4 years. In addition, microarray analysis is underway, as well as identification of novel DNA SNP (single nucleotide polymorphism) signatures and target regions for expression analysis and investigation of clonality.
63.5.3 Tumor Markers in Esophageal Cancer Staging
Studies investigating tumor suppressor genes, cell adhesion molecules and apoptosis-related genes have yielded relatively disappointing results in terms of identifying useful markers for daily practice. Available data indicate that progression from BE to HGD and adenocarcinoma is a continuum rather than a distinct stepwise process and that no single marker reliably discriminates lesions that will progress.
63.5.4 Diagnostic Laparoscopy/Thoracoscopy
The increasing number of treatment options with a curative or a palliative intent has increased the importance of proper patient selection for a specific treatment. Diagnostic laparoscopy has been claimed to be superior to all noninvasive imaging modalities in the detection of liver metastases, intra-abdominal lymph node metastases and peritoneal tumor spread. Diagnostic gains in the range of 10–50% have been claimed by adding diagnostic laparoscopy to the staging of cancer of the GEJ. The potential diagnostic gain needs to be balanced against the fact that diagnostic laparoscopy is costly, time-consuming and associated with a risk of serious complications.
63.5.5 Detection of Micrometastases in Bone Marrow
Cytokeratin 18-positive micrometastases have been found in the bone marrow of 80–90% of patients undergoing curative resection of node-positive or node-negative esophageal adenocarcinoma or SCC [112, 113]. These results suggest that hematogenous dissemination occurs independently of lymphatic spread and that a node-negative status does not preclude metastatic spread. In patients receiving neoadjuvant chemoradiotherapy, the detection rate of metastatic cells in bone marrow was less than 40%, but after marrow culture, viable cytokeratin-positive cells were detectable in a further 30% of patients [114]. Despite pathological complete responses of the primary tumor in some patients after neoadjuvant therapy, micrometastases were found in the bone marrow of the same patients, implying resistance of metastatic cells to the chemotherapeutic agents used.
63.6 Imaging and Diagnostics
63.6.1 Upper Gastrointestinal Barium Examination (UGI)
Endoscopy and the UGI are the primary modalities for the detection of gastric cancer. The double-contrast examination is the single best radiological technique for the detection of early gastric cancer. A single-contrast examination alone has an overall sensitivity of only 75% in diagnosing gastric cancer. Any lesion that has a mixed pattern is not unequivocally benign and warrants biopsy.
63.6.2 Staging Modalities and Prognostic Indicators
Once the diagnosis of gastric cancer is established, further studies are directed at staging to assist with therapeutic decisions. Endoscopic ultrasound (EUS) and computed tomography (CT) are the current primary staging modalities for esophageal and gastric cancer.
63.6.2.1 Endoscopic Ultrasound
Endosonographic T staging is based on the number of visceral wall layers that are disrupted as well as the preservation or destruction of sonographic interfaces between adjacent organs and vessels. N staging is based on the presence and location of perivisceral lymph nodes that fit certain criteria (diameter >10 mm, round shape, uniform hypoechoic structure, well-circumscribed margins) or that are found to harbor malignant cells by EUS-guided transvisceral fine needle aspiration (FNA). Due to its limited depth of penetration, endosonography is less useful for M staging. However, with low-frequency options on newer echoendoscopes, much of the liver can often be surveyed and even sampled from the stomach and duodenum. Accuracy of EUS for T staging in esophageal and gastric cancer is approximately 82%, with a sensitivity and specificity of 70–100 and 87–100%, respectively. N-stage accuracy has been shown to be approximately 70%, with sensitivity and specificity ranging from 69.9 to 100% and 87.5 to 100%, respectively. Addition of FNA of suspicious nodes increases the accuracy even further, bringing specificity to 100%. In addition, EUS-guided FNA or Tru-Cut® (Baxter, Deerfield, IL) biopsy of the submucosa can provide a tissue diagnosis in the setting of linitis plastica. EUS can be particularly useful in early-stage esophageal and gastric cancer, where it can sometimes allow for EMR [115].
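By way of illustration, accuracy figures of this kind come from comparing the endosonographic stage with the histopathological stage of the resected specimen case by case. A minimal sketch with invented paired stages (not data from the studies cited) follows.

```python
from collections import Counter

# Hypothetical paired observations: (EUS T-stage, histopathological T-stage)
pairs = [("T2", "T2"), ("T3", "T3"), ("T3", "T2"), ("T1", "T1"),
         ("T3", "T3"), ("T4", "T4"), ("T2", "T2"), ("T3", "T3"),
         ("T2", "T1"), ("T3", "T3")]

accuracy = sum(eus == histo for eus, histo in pairs) / len(pairs)
errors = Counter(f"EUS {eus} / histo {histo}" for eus, histo in pairs
                 if eus != histo)
print(f"T-staging accuracy: {accuracy:.0%}")  # 80%
print(dict(errors))  # overstaging (e.g. EUS T3 / histo T2) is typical
```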
63.6.2.2 Computed Tomography
Computed tomography scanning provides critical information in treatment planning for esophageal and gastric cancer patients. Helical computed tomography is the single best noninvasive means of detecting metastatic disease. Multidetector CT (MDCT) has enabled faster scanning with simultaneous acquisition of multiple thin slices. Use of thin-slice collimation decreases partial volume artifact, enabling more accurate assessment of the wall in a curved organ. The acquisition of volumetric high-resolution data made possible by MDCT also allows multiplanar reconstructions (MPR) to be performed on a workstation, depicting most segments of the gastric wall in the optimal orthogonal plane with enhanced anatomical detail. In 2005, Shimizu and colleagues published data demonstrating an 85% accuracy of CT for T staging, using MDCT with 1.25-mm multiplanar reconstructed images [116]. CT scanning is not helpful for N staging; the cutoff for normal nodal size is 8–10 mm, yet metastases are most commonly found in lymph nodes smaller than 8 mm.
63.6.2.3 Magnetic Resonance (MR)
Currently, MR is used primarily when iodinated CT contrast is contraindicated due to a significant contrast allergy or renal failure. MR may also be used to confirm the presence of equivocal liver masses seen on CT. Further developments in MR may include the use of endoluminal coils for better definition of the gastric wall layers and improved accuracy of T staging and regional nodal staging. Dynamic contrast-enhanced (DCE) MRI with low-molecular-weight gadolinium-DTPA is currently the most widespread and accurate means of imaging angiogenesis, and new methods will continue to be compared to this standard.
63.6.2.4 Positron Emission Tomography (PET)
Potential uses of PET in esophageal and gastric cancer patients are in staging, detecting recurrence, determining prognosis and measuring therapy response. The major advantage of PET is substantially greater contrast resolution and the acquisition of functional data. PET can detect lymph node metastases before lymph nodes are enlarged on CT. The limitations of PET are lower sensitivity for small lesions and false-positive results from infectious or inflammatory processes. In addition, PET studies are relatively expensive compared with other imaging modalities. Combined PET and CT (PET/CT) scanners have been introduced recently. Although PET does not have a role in the primary detection of gastric carcinoma, the degree of uptake in a known gastric carcinoma has prognostic value. Moderately intense fluorodeoxyglucose (FDG) uptake in the gastric wall is a normal variant. Despite this, the majority (60–96%) of primary gastric neoplasms are detected by PET. A greater degree of FDG uptake is associated with greater depth of invasion, size of tumor, and lymph node metastases. The survival rate in patients with high FDG uptake is significantly lower than in patients with low FDG uptake. Although CT is more sensitive than PET for detection of lymph node metastases in N1 and N2 disease, PET is more specific. PET may be more sensitive for the detection of non-nodal sites such as liver and lung metastases, but not for bone, peritoneal and pleural metastases. PET may also have value in the prediction of response to preoperative chemotherapy in esophago-gastric cancer.
63.6.2.5 Prognostic Factors in Gastroesophageal Cancer
While many factors affect prognosis in gastric and esophageal cancer patients, the number of positive lymph nodes is the most consistent prognostic indicator. Five-year survival rates in patients with 1–6, 7–15 and greater than 15 positive nodes are 43, 21 and 13%, respectively. A multicenter observational study of 477 gastric cancer patients demonstrated that the number of positive lymph nodes was of better prognostic value than the location of the involved nodal basin. Positive peritoneal cytology is also associated with decreased survival. Recent studies suggest that the sensitivity of peritoneal cytologic evaluation is enhanced with real-time reverse transcriptase-polymerase chain reaction (RT-PCR) amplifying CEA [117]. Cytologic evaluation for malignant cells using this technique was positive in 10, 29, 66 and 81% of patients with T1, T2, T3 and T4 tumors, respectively. Positive tests were strongly correlated with eventual peritoneal carcinomatosis and survival. Detection of micrometastases of esophageal and gastric cancers may have a role in staging before neoadjuvant therapy [118].
63.7 Training Within Specialty
63.7.1 Surgical Skills Training
Upper gastrointestinal surgery has evolved enormously over the last 50 years with the introduction of techniques such as minimal access surgery, robot-assisted surgery and various extents of lymphadenectomy. The need for specialized training has largely been answered by the introduction of simulated operative environments. Various sophisticated bench models have been designed to be positioned inside box trainers, as well as simulated patient models that can even be situated in a full "virtual theatre" environment. These methods have been validated; recent studies have shown that simulation-based training translates to improved performance on real cases in terms of reduced operating time, fewer errors and decreased patient discomfort. The next generation of simulated environment is virtual reality (VR) simulation, which has the clear advantage of high-fidelity visual elements and the possibility of simulating almost any operative environment. Progress has been made in improving the tactile feedback of these simulators to mimic actual tissue resistance during operations on live patients.
63.7.2 Hospital Volume/Outcome Relationship
The role of experience in improving surgical outcomes is well established [119, 120]. The impact of hospital volume on clinical and economic outcomes for esophagectomy has been reported [121, 122]. Hospitals were stratified into low-volume hospitals performing fewer than six esophagectomies a year and high-volume hospitals performing more than six. The study comprised 1,193 patients who underwent esophagectomy during an 8-year period. High-volume hospitals were associated with a 2-day decrease in median length of stay, a 3-day reduction in median intensive care unit stay, an increased rate of home discharges and a 6.7% absolute decrease in hospital mortality (9.2 vs. 2.5%) [121].
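The stratified analysis described above amounts to grouping cases by annual hospital volume and comparing event rates. A minimal sketch with hypothetical records follows; the hospital identifiers, volumes and outcomes are invented, and the handling of a hospital performing exactly six cases a year is an assumption of the sketch.

```python
# Hypothetical registry rows: (hospital, esophagectomies per year, died)
records = [
    ("A", 3, True), ("A", 3, False), ("A", 3, False),
    ("B", 12, False), ("B", 12, False), ("B", 12, True),
    ("C", 5, True), ("C", 5, False),
    ("D", 20, False), ("D", 20, False),
]

def mortality(rows):
    return sum(1 for _, _, died in rows if died) / len(rows)

low = [r for r in records if r[1] < 6]     # low volume: <6 cases/year
high = [r for r in records if r[1] >= 6]   # high volume (cutoff at exactly
                                           # six is an assumption here)
print(f"low-volume mortality  {mortality(low):.1%}")
print(f"high-volume mortality {mortality(high):.1%}")
print(f"absolute difference   {mortality(low) - mortality(high):.1%}")
```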
63.8 Future Directions in Research and Management Within Specialty
63.8.1 Future Directions in Drug Therapy
Future research in this area is evolving along three separate lines. The first examines the role of new chemotherapeutics, particularly oxaliplatin, irinotecan, and oral 5-FU "prodrugs" such as capecitabine and S-1, which have proven valuable in other gastrointestinal malignancies. The second examines drugs designed to inhibit the function of a particular molecular target critical to cancer cell growth, such as cetuximab, an epidermal growth factor receptor inhibitor, and bevacizumab, a vascular endothelial growth factor inhibitor, both given in conjunction with chemotherapy. Finally, there is a growing emphasis on clinical and molecular predictors of chemotherapeutic response in advanced gastric cancer. Such considerations are particularly important given the relatively modest benefit/toxicity ratios of present chemotherapy regimens for gastric cancer. Low tissue expression levels of thymidylate synthase (critical to intracellular 5-FU metabolism), p53 (related to chemotherapeutic resistance) and bcl-2 (important to cellular apoptosis) have been found to correlate with superior survival in gastric cancer [123, 124].
63.8.2 Staging of Gastric and Esophageal Cancer
With the emergence of laparoscopic ultrasound, nodal staging is now possible at laparoscopy. Finch and colleagues reported that laparoscopic ultrasound is 84% accurate in TNM staging of esophageal cancers. This study compared laparoscopic ultrasound with CT and laparoscopy and showed a clear benefit for ultrasound in assessing GI cancers [125]. No single laboratory test yet exists to facilitate diagnosis and detection of recurrent gastric cancer. New techniques are emerging for the detection of individuals at increased risk of gastric cancer based on their genetic composition. These technologies include cDNA microarray, serial analysis of gene expression (SAGE), differential display and subtractive hybridization.
63.8.3 Prediction of Survival and Guidance of Treatment
Other systems have been proposed to guide treatment and predict survival of patients with gastric cancer. Maruyama has developed computer software based on the demographic and clinical features of 4,302 gastric cancer patients treated at the National Cancer Center Hospital in Tokyo. The software can be used to guide surgeons in the extent of lymphadenectomy and to predict survival for a specific patient [126]. Similarly, nomograms or scoring systems have been developed that predict survival based on the weighted importance of sex, age, tumor location, Lauren histologic type, T stage, N stage and extent of lymphadenectomy.
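As an illustration of the form such a scoring system takes, the sketch below combines weighted clinicopathological variables into an additive risk score. The variables and weights are hypothetical placeholders, not those of the Maruyama program or any published nomogram.

```python
# Additive nomogram-style score: higher values imply poorer predicted
# survival. All weights below are invented for illustration.
WEIGHTS = {"age_per_decade": 0.15, "deep_T_stage": 0.9,
           "node_positive": 1.2, "diffuse_lauren_type": 0.4}

def risk_score(age, t_stage, node_positive, lauren_diffuse):
    score = (age / 10.0) * WEIGHTS["age_per_decade"]
    if t_stage in ("T3", "T4"):
        score += WEIGHTS["deep_T_stage"]
    if node_positive:
        score += WEIGHTS["node_positive"]
    if lauren_diffuse:
        score += WEIGHTS["diffuse_lauren_type"]
    return score

print(risk_score(age=68, t_stage="T3", node_positive=True,
                 lauren_diffuse=False))  # ~3.12
```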
63.8.4 Global Molecular Profiling
Emerging methods of global gene and protein profiling using bioinformatics are increasingly employed for the molecular evaluation of Barrett's metaplasia. Results are preliminary and do not yet have clinical relevance. Mitas et al. developed a quantitative mathematical algorithm based on the expression levels of a three-gene panel comprising TSPAN (tetraspan 1), ECGF1 (endothelial cell growth factor 1) and SPARC (secreted protein, acidic, cysteine-rich) to discriminate between Barrett's esophagus and esophageal adenocarcinoma [127]. Proteomic studies using mass spectrometry, applied in combination with microdissection by Streitz and colleagues, enable direct analysis of epithelial protein expression patterns [128]. Identification of specific biomarkers is necessary to select patients who need intensified surveillance, to better characterize populations for intervention studies including chemoprevention, and to improve outcomes and reduce the costs of care. To facilitate evaluation of the appropriateness and quality of further studies and to improve the ability to compare results, the NCI-EORTC developed and published the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK), available at http://www.cancerdiagnosis.nci.nih.gov/assessment/progress/clinical.html.
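Schematically, a quantitative multi-gene discriminator of this kind reduces to a weighted combination of expression values compared against a cutoff. The coefficients, cutoff and input values below are invented for illustration and do not reproduce the published Mitas et al. algorithm.

```python
import math

# Linear discriminant over log2-transformed relative expression of the
# three-gene panel; coefficients and cutoff are hypothetical.
COEFFICIENTS = {"TSPAN": -1.1, "ECGF1": 0.8, "SPARC": 0.9}
CUTOFF = 0.0

def classify(expression):
    score = sum(coef * math.log2(expression[gene])
                for gene, coef in COEFFICIENTS.items())
    return "adenocarcinoma-like" if score > CUTOFF else "Barrett's-like"

print(classify({"TSPAN": 0.4, "ECGF1": 3.1, "SPARC": 2.5}))
```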
References
1. Ballantyne GH (2007) Telerobotic gastrointestinal surgery: phase 2 – safety and efficacy. Surg Endosc 21:1054–1062 2. Ruurda JP, Broeders IA, Simmermacher RP et al (2002) Feasibility of robot-assisted laparoscopic surgery: an evaluation of 35 robot-assisted laparoscopic cholecystectomies. Surg Laparosc Endosc Percutan Tech 12:41–45 3. Berguer R, Smith W (2006) An ergonomic comparison of robotic and laparoscopic technique: the influence of surgeon experience and task complexity. J Surg Res 134:87–92
4. Bodner JC, Zitt M, Ott H et al (2005) Robotic-assisted thoracoscopic surgery (RATS) for benign and malignant esophageal tumors. Ann Thorac Surg 80:1202–1206 5. Kalloo AN, Singh VK, Jagannath SB et al (2004) Flexible transgastric peritoneoscopy: a novel approach to diagnostic and therapeutic interventions in the peritoneal cavity. Gastrointest Endosc 60:114–117 6. Pai RD, Fong DG, Bundga ME et al (2006) Transcolonic endoscopic cholecystectomy: a NOTES survival study in a porcine model (with video). Gastrointest Endosc 64:428–434 7. Rattner D, Kalloo A (2006) ASGE/SAGES Working Group on Natural Orifice Translumenal Endoscopic Surgery. Surg Endosc 20:329–333 8. Palanivelu C, Rajan PS, Rangarajan M et al (2008) Transvaginal endoscopic appendectomy in humans: a unique approach to NOTES-world’s first report. Surg Endosc 22: 1343–1347 9. Bernhardt J, Gerber B, Schober HC et al (2008) NOTES – case report of a unidirectional flexible appendectomy. Int J Colorectal Dis 23:547–550 10. Rattner DW (2008) NOTES: Where have we been and where are we going? Surg Endosc 22:1143–1145 11. Triadafilopoulos G (2004) Changes in GERD symptom scores correlate with improvement in esophageal acid exposure after the Stretta procedure. Surg Endosc 18:1038–1044 12. Triadafilopoulos G (2007) Endotherapy and surgery for GERD. J Clin Gastroenterol 41(Suppl 2):S87–S96 13. Cicala M, Gabbrielli A, Emerenziani S et al (2005) Effect of endoscopic augmentation of the lower oesophageal sphincter (Gatekeeper reflux repair system) on intraoesophageal dynamic characteristics of acid reflux. Gut 54:183–186 14. Fockens P, Bruno MJ, Gabbrielli A et al (2004) Endoscopic augmentation of the lower esophageal sphincter for the treatment of gastroesophageal reflux disease: multicenter study of the Gatekeeper Reflux Repair System. Endoscopy 36:682–689 15. Pleskow D, Rothstein R, Lo S et al (2005) Endoscopic fullthickness plication for the treatment of GERD: 12-month follow-up for the North American open-label trial. Gastrointest Endosc 61:643–649 16. Pleskow D, Rothstein R, Kozarek R et al (2007) Endoscopic full-thickness plication for the treatment of GERD: longterm multicenter results. Surg Endosc 21:439–444 17. Pleskow D, Rothstein R, Kozarek R et al (2008) Endoscopic full-thickness plication for the treatment of GERD: five-year long-term multicenter results. Surg Endosc 22:326–332 18. Rothstein RI (2008) Endoscopic therapy of gastroesophageal reflux disease: outcomes of the randomized-controlled trials done to date. J Clin Gastroenterol 42:594–602 19. Conio M, Cameron AJ, Chak A et al (2005) Endoscopic treatment of high-grade dysplasia and early cancer in Barrett’s oesophagus. Lancet Oncol 6:311–321 20. Conio M, Repici A, Cestari R et al (2005) Endoscopic mucosal resection for high-grade dysplasia and intramucosal carcinoma in Barrett’s esophagus: an Italian experience. World J Gastroenterol 11:6650–6655 21. Gross SA, Wolfsen HC (2008) The use of photodynamic therapy for diseases of the esophagus. J Environ Pathol Toxicol Oncol 27:5–21 22. Tokar JL, Haluszka O, Weinberg DS (2007) Endoscopic therapy of dysplasia and early-stage cancers of the esophagus. Semin Radiat Oncol 17:10–21
23. Wolfsen HC (2005) Present status of photodynamic therapy for high-grade dysplasia in Barrett’s esophagus. J Clin Gastroenterol 39:189–202 24. Sproston P, Primatesta P (2003) Health survey for England 2002. The health of children and young people. The Stationery Office, London 25. Kopelman P, Jebb SA, Butland B. (2007) Executive Summary: Foresight 'Tackling Obesities: Future Choices' project. Obes Rev. Mar;8 Suppl 1:vi-ix 26. Field AE, Coakley EH, Must A et al (2001) Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161:1581–1586 27. Price GM, Uauy R, Breeze E et al (2006) Weight, shape, and mortality risk in older persons: elevated waist-hip ratio, not high body mass index, is associated with a greater risk of death. Am J Clin Nutr 84:449–460 28. Baumer JH. (2007) Obesity and overweight: its prevention, identification, assessment and management. Arch Dis Child Educ Pract Ed. Jun;92(3):ep92–6 29. Kohn GP, Galanko JA, Overby DW, Farrell TM. (2009) Recent Trends in bariatric surgery case volume in the United States. Surgery Aug:146(2):375–80 30. Schneider BE, Mun EC (2005) Surgical management of morbid obesity. Diabetes Care 28:475–480 31. Morino M, Toppino M, Bonnet G et al (2003) Laparoscopic adjustable silicone gastric banding versus vertical banded gastroplasty in morbidly obese patients: a prospective randomized controlled clinical trial. Ann Surg 238:835–841; discussion 841–842 32. Sugerman HJ, Kral JG (2005) Evidence-based medicine reports on obesity surgery: a critique. Int J Obes (Lond) 29: 735–745 33. Alper D, Ramadan E, Vishne T et al (2000) Silastic ring vertical gastroplasty- long-term results and complications. Obes Surg 10:250–254 34. Suter M, Jayet C, Jayet A (2000) Vertical banded gastroplasty: long-term results comparing three different techniques. Obes Surg 10:41–46; discussion 47 35. Belachew M, Legrand MJ, Defechereux TH et al (1994) Laparoscopic adjustable silicone gastric banding in the treatment of morbid obesity. A preliminary report. Surg Endosc 8:1354–1356 36. DeMaria EJ, Jamal MK (2005) Laparoscopic adjustable gastric banding: evolving clinical experience. Surg Clin North Am 85:773–787; vii 37. Angrisani L, Furbetta F, Doldi SB et al (2003) Lap band adjustable gastric banding system: the Italian experience with 1863 patients operated on 6 years. Surg Endosc 17: 409–412 38. DeMaria EJ, Schauer P, Patterson E et al (2005) The optimal surgical management of the super-obese patient: the debate. Presented at the annual meeting of the Society of American Gastrointestinal and Endoscopic Surgeons, Hollywood, Florida, USA, April 13–16, 2005. Surg Innov 12:107–121 39. Weiner R, Blanco-Engert R, Weiner S et al (2003) Outcome after laparoscopic adjustable gastric banding – 8 years experience. Obes Surg 13:427–434 40. Dolan K, Hatzifotis M, Newbury L et al (2004) A comparison of laparoscopic adjustable gastric banding and biliopancreatic diversion in superobesity. Obes Surg 14:165–169 41. Mognol P, Chosidow D, Marmuse JP (2005) Laparoscopic gastric bypass versus laparoscopic adjustable gastric
banding in the super-obese: a comparative study of 290 patients. Obes Surg 15:76–81 42. Chevallier JM, Zinzindohoue F, Douard R et al (2004) Complications after laparoscopic adjustable gastric banding for morbid obesity: experience with 1,000 patients over 7 years. Obes Surg 14:407–414 43. Fielding GA, Ren CJ (2005) Laparoscopic adjustable gastric band. Surg Clin North Am 85:129–140; x 44. Dargent J (2005) Esophageal dilatation after laparoscopic adjustable gastric banding: definition and strategy. Obes Surg 15:843–848 45. Shen R, Dugay G, Rajaram K et al (2004) Impact of patient follow-up on weight loss after bariatric surgery. Obes Surg 14:514–519 46. Moon Han S, Kim WW, Oh JH (2005) Results of laparoscopic sleeve gastrectomy (LSG) at 1 year in morbidly obese Korean patients. Obes Surg 15:1469–1475 47. Silecchia G, Boru C, Pecchia A et al (2006) Effectiveness of laparoscopic sleeve gastrectomy (first stage of biliopancreatic diversion with duodenal switch) on co-morbidities in super-obese high-risk patients. Obes Surg 16:1138–1144 48. Frezza EE (2007) Laparoscopic vertical sleeve gastrectomy for morbid obesity. The future procedure of choice? Surg Today 37:275–281 49. Deviere J, Ojeda Valdes G, Cuevas Herrera L et al (2008) Safety, feasibility and weight loss after transoral gastroplasty: first human multicenter study. Surg Endosc 22:589–598 50. Kantsevoy SV, Hu B, Jagannath SB et al (2007) Technical feasibility of endoscopic gastric reduction: a pilot study in a porcine model. Gastrointest Endosc 65:510–513 51. Genco A, Bruni T, Doldi SB et al (2005) BioEnterics intragastric balloon: the Italian experience with 2,515 patients. Obes Surg 15:1161–1164 52. Genco A, Cipriano M, Bacci V et al (2006) BioEnterics intragastric balloon (BIB): a short-term, double-blind, randomised, controlled, crossover study on weight reduction in morbidly obese patients. Int J Obes (Lond) 30:129–133 53. Melissas J, Mouzas J, Filis D et al (2006) The intragastric balloon – smoothing the path to bariatric surgery. Obes Surg 16:897–902 54. Spyropoulos C, Katsakoulis E, Mead N et al (2007) Intragastric balloon for high-risk super-obese patients: a prospective analysis of efficacy. Surg Obes Relat Dis 3:78–83 55. Scopinaro N, Marinari G, Camerini G et al (2005) Biliopancreatic diversion for obesity: state of the art. Surg Obes Relat Dis 1:317–328 56. Buchwald H, Avidor Y, Braunwald E et al (2004) Bariatric surgery: a systematic review and meta-analysis. JAMA 292:1724–1737 57. Scopinaro N (2006) Biliopancreatic diversion: mechanisms of action and long-term results. Obes Surg 16:683–689 58. Scopinaro N, Papadia F, Marinari G et al (2007) Long-term control of type 2 diabetes mellitus and the other major components of the metabolic syndrome after biliopancreatic diversion in patients with BMI <35 kg/m2. Obes Surg 17:185–192 59. Tataranni PA, Mingrone G, Raguso CA et al (1996) Twenty-four-hour energy and nutrient balance in weight stable postobese patients after biliopancreatic diversion. Nutrition 12:239–244 60. Hess DS, Hess DW (1998) Biliopancreatic diversion with a duodenal switch. Obes Surg 8:267–282
61. Marceau P, Hould FS, Simard S et al (1998) Biliopancreatic diversion with duodenal switch. World J Surg 22:947–954 62. Feng JJ, Gagner M (2002) Laparoscopic biliopancreatic diversion with duodenal switch. Semin Laparosc Surg 9:125–129 63. Hess DS, Hess DW, Oakley RS (2005) The biliopancreatic diversion with the duodenal switch: results beyond 10 years. Obes Surg 15:408–416 64. Buchwald H (2005) Consensus conference statement bariatric surgery for morbid obesity: health implications for patients, health professionals, and third-party payers. Surg Obes Relat Dis 1:371–381 65. Scopinaro N, Marinari GM, Camerini G (2002) Laparoscopic standard biliopancreatic diversion: technique and preliminary results. Obes Surg 12:362–365 66. Scopinaro N, Gianetta E, Adami GF et al (1996) Biliopancreatic diversion for obesity at eighteen years. Surgery 119:261–268 67. Scopinaro N, Adami GF, Marinari GM et al (1998) Biliopancreatic diversion. World J Surg 22:936–946 68. Anthone GJ (2005) The duodenal switch operation for morbid obesity. Surg Clin North Am 85:819–833; viii 69. Slater GH, Fielding GA (2004) Combining laparoscopic adjustable gastric banding and biliopancreatic diversion after failed bariatric surgery. Obes Surg 14:677–682 70. Santry HP, Gillen DL, Lauderdale DS (2005) Trends in bariatric surgical procedures. JAMA 294:1909–1917 71. DeMaria EJ (2004) Is gastric bypass superior for the surgical treatment of obesity compared with malabsorptive procedures? J Gastrointest Surg 8:401–403 72. Marema RT, Perez M, Buffington CK (2005) Comparison of the benefits and complications between laparoscopic and open Roux-en-Y gastric bypass surgeries. Surg Endosc 19:525–530 73. Podnos YD, Jimenez JC, Wilson SE et al (2003) Complications after laparoscopic gastric bypass: a review of 3464 cases. Arch Surg 138:957–961 74. Olbers T, Fagevik-Olsen M, Maleckas A et al (2005) Randomized clinical trial of laparoscopic Roux-en-Y gastric bypass versus laparoscopic vertical banded gastroplasty for obesity. Br J Surg 92:557–562 75. Raftopoulos I, Ercole J, Udekwu AO et al (2005) Outcomes of Roux-en-Y gastric bypass stratified by a body mass index of 70 kg/m2: a comparative analysis of 825 procedures. J Gastrointest Surg 9:44–52; discussion 52–53 76. Maggard MA, Shugarman LR, Suttorp M et al (2005) Meta-analysis: surgical treatment of obesity. Ann Intern Med 142:547–559 77. Stein HJ, Feith M (2005) Surgical strategies for early esophageal adenocarcinoma. Best Pract Res Clin Gastroenterol 19:927–940 78. Peyre CG, Hagen JA, DeMeester SR et al (2008) The number of lymph nodes removed predicts survival in esophageal cancer: an international study on the impact of extent of surgical resection. Ann Surg 248:549–556 79. Kent MS, Schuchert M, Fernando H et al (2006) Minimally invasive esophagectomy: state of the art. Dis Esophagus 19:137–145 80. Pennathur A, Luketich JD (2008) Minimally invasive surgical treatment of esophageal carcinoma. Gastrointest Cancer Res 2:295 81. Luketich JD, Fernando HC, Christie NA et al (2001) Outcomes after minimally invasive esophagomyotomy. Ann Thorac Surg 72:1909–1912; discussion 1912–1913
82. Spechler SJ, Lee E, Ahnen D et al (2001) Long-term outcome of medical and surgical therapies for gastroesophageal reflux disease: follow-up of a randomized controlled trial. JAMA 285:2331–2338 83. Garcia-Gallont R (2008) Laparoscopic fundoplication for GERD: are we there yet? Dig Dis 26:304–308 84. Gatenby PA, Ramus JR, Caygill CP et al (2009) Treatment modality and risk of development of dysplasia and adenocarcinoma in columnar-lined esophagus. Dis Esophagus 22: 133–142 85. Smith CD (2008) Antireflux surgery. Surg Clin North Am 88:943–958; v 86. Fibbe C, Layer P, Keller J et al (2001) Esophageal motility in reflux disease before and after fundoplication: a prospective, randomized, clinical, and manometric study. Gastroenterology 121:5–14 87. Laws HL, Clements RH, Swillie CM (1997) A randomized, prospective comparison of the Nissen fundoplication versus the Toupet fundoplication for gastroesophageal reflux disease. Ann Surg 225:647–653; discussion 654 88. Varin O, Velstra B, De Sutter S et al (2009) Total vs partial fundoplication in the treatment of gastroesophageal reflux disease: a meta-analysis. Arch Surg 144:273–278 89. Oelschlager BK, Chang L, Pellegrini CA (2003) Improved outcome after extended gastric myotomy for achalasia. Arch Surg 138:490–495; discussion 495–497 90. Neuner G, Patel A, Suntharalingam M (2009) Chemoradiotherapy for esophageal cancer. Gastrointest Cancer Res 3:57–65 91. Yakoub D, Fahmy R, Athanasiou T et al (2008) Evidencebased choice of esophageal stent for the palliative management of malignant dysphagia. World J Surg 32:1996–2009 92. Ferry DR, Anderson M, Beddard K et al (2007) A phase II study of gefitinib monotherapy in advanced esophageal adenocarcinoma: evidence of gene expression, cellular, and clinical response. Clin Cancer Res 13:5869–5875 93. Weichselbaum RR, Kufe D (2009) Translation of the radioand chemo-inducible TNFerade vector to the treatment of human cancers. Cancer Gene Ther 16:609–619 94. Atherfold PA, Jankowski JA (2006) Molecular biology of Barrett’s cancer. Best Pract Res Clin Gastroenterol 20: 813–827 95. Brock MV, Gou M, Akiyama Y et al (2003) Prognostic importance of promoter hypermethylation of multiple genes in esophageal adenocarcinoma. Clin Cancer Res 9: 2912–2919 96. Hao Y, Triadafilopoulos G, Sahbaie P et al (2006) Gene expression profiling reveals stromal genes expressed in common between Barrett’s esophagus and adenocarcinoma. Gastroenterology 131:925–933 97. Helm J, Enkemann SA, Coppola D et al (2005) Dedifferentiation precedes invasion in the progression from Barrett’s metaplasia to esophageal adenocarcinoma. Clin Cancer Res 11:2478–2485 98. Jankowski JA, Wright NA, Meltzer SJ et al (1999) Molecular evolution of the metaplasia-dysplasia-adenocarcinoma sequence in the esophagus. Am J Pathol 154:965–973 99. Binato M, Gurski RR, Fagundes RB et al (2009) P53 and Ki-67 overexpression in gastroesophageal reflux disease – Barrett’s esophagus and adenocarcinoma sequence. Dis Esophagus. 22(7):588-95. Epub 2009 Mar 6 100. Jin Z, Cheng Y, Gu W et al (2009) A multicenter, double-blinded validation study of methylation biomarkers for progression prediction in Barrett’s esophagus. Cancer Res 69:4112–4115
101. Tischoff I, Tannapfel A (2008) Barrett's esophagus: can biomarkers predict progression to malignancy? Expert Rev Gastroenterol Hepatol 2:653–663 102. Tomizawa Y, Wang KK (2009) Changes in screening, prognosis and therapy for esophageal adenocarcinoma in Barrett's esophagus. Curr Opin Gastroenterol 25:358–365 103. Kaye PV, Haider SA, Ilyas M et al (2009) Barrett's dysplasia and the Vienna classification: reproducibility, prediction of progression and impact of consensus reporting and p53 immunohistochemistry. Histopathology 54:699–712 104. Keswani RN, Noffsinger A, Waxman I et al (2006) Clinical use of p53 in Barrett's esophagus. Cancer Epidemiol Biomarkers Prev 15:1243–1249 105. Reid BJ, Blount PL, Rabinovitch PS (2003) Biomarkers in Barrett's esophagus. Gastrointest Endosc Clin N Am 13:369–397 106. Falkenback D, Nilbert M, Oberg S et al (2008) Prognostic value of cell adhesion in esophageal adenocarcinomas. Dis Esophagus 21:97–102 107. Feith M, Stein HJ, Mueller J et al (2004) Malignant degeneration of Barrett's esophagus: the role of the Ki-67 proliferation fraction, expression of E-cadherin and p53. Dis Esophagus 17:322–327 108. Zagorowicz E, Jankowski J (2007) Molecular changes in the progression of Barrett's oesophagus. Postgrad Med J 83:529–535 109. Schuchert MJ, McGrath K, Buenaventura PO (2005) Barrett's esophagus: diagnostic approaches and surveillance. Semin Thorac Cardiovasc Surg 17:301–312 110. Schuchert MJ, Luketich JD (2007) Barrett's esophagus – emerging concepts and controversies. J Surg Oncol 95:185–189 111. Das D, Chilton AP, Jankowski JA (2009) Chemoprevention of oesophageal cancer and the AspECT trial. Recent Results Cancer Res 181:161–169 112. Inoue H, Kajiyama Y, Tsurumaru M (2004) Clinical significance of bone marrow micrometastases in esophageal cancer. Dis Esophagus 17:328–332 113. Spence GM, Graham AN, Mulholland K et al (2004) Bone marrow micrometastases and markers of angiogenesis in esophageal cancer. Ann Thorac Surg 78:1944–1949; discussion 1950 114. Ryan P, McCarthy S, Kelly J et al (2004) Prevalence of bone marrow micrometastases in esophagogastric cancer patients with and without neoadjuvant chemoradiotherapy. J Surg Res 117:121–126 115. May A, Gunter E, Roth F et al (2004) Accuracy of staging in early oesophageal cancer using high resolution endoscopy and high resolution endosonography: a comparative, prospective, and blinded trial. Gut 53:634–640 116. Shimizu K, Ito K, Matsunaga N et al (2005) Diagnosis of gastric cancer with MDCT using the water-filling method and multiplanar reconstruction: CT-histologic correlation. AJR Am J Roentgenol 185:1152–1158 117. Dalal KM, Woo Y, Kelly K et al (2008) Detection of micrometastases in peritoneal washings of gastric cancer patients by the reverse transcriptase polymerase chain reaction. Gastric Cancer 11:206–213 118. Liakakos T, Polychronidis A, Bistarakis D et al (2009) Laparoscopic peritoneal cytology: can it affect decision-making for neoadjuvant treatment of gastric cancer? Ann Surg Oncol 16:1072–1073; author reply 1076
119. Birkmeyer JD, Sun Y, Wong SL et al (2007) Hospital volume and late survival after cancer surgery. Ann Surg 245:777–783 120. Casson AG, van Lanschot JJ (2005) Improving outcomes after esophagectomy: the impact of operative volume. J Surg Oncol 92:262–266 121. Kuo EY, Chang Y, Wright CD (2001) Impact of hospital volume on clinical and economic outcomes for esophagectomy. Ann Thorac Surg 72:1118–1124 122. Reavis KM, Smith BR, Hinojosa MW et al (2008) Outcomes of esophagectomy at academic centers: an association between volume and outcome. Am Surg 74:939–943 123. Homs MY, v d Gaast A, Siersema PD et al (2006) Chemotherapy for metastatic carcinoma of the esophagus and gastro-esophageal junction. Cochrane Database Syst Rev:CD004063 124. Homs MY, Voest EE, Siersema PD (2009) Emerging drugs for esophageal cancer. Expert Opin Emerg Drugs 14:329–339 125. Finch MD, John TG, Garden OJ et al (1997) Laparoscopic ultrasonography for staging gastroesophageal cancer. Surgery 121:10–17 126. Bollschweiler EH, Monig SP, Hensler K et al (2004) Artificial neural network for prediction of lymph node metastases in gastric cancer: a phase II diagnostic study. Ann Surg Oncol 11:506–511 127. Mitas M, Almeida JS, Mikhitarian K et al (2005) Accurate discrimination of Barrett's esophagus and esophageal adenocarcinoma using a quantitative three-tiered algorithm and multimarker real-time reverse transcription-PCR. Clin Cancer Res 11:2205–2214 128. Streitz JM Jr, Madden MT, Marimanikkuppam SS et al (2005) Analysis of protein expression patterns in Barrett's esophagus using MALDI mass spectrometry, in search of malignancy biomarkers. Dis Esophagus 18:170–176 129. Kelsen DP, Ginsberg R, Pajak TF et al (1998) Chemotherapy followed by surgery compared with surgery alone for localized esophageal cancer. N Engl J Med 339:1979–1984 130. Medical Research Council Oesophageal Cancer Working Group (2002) Surgical resection with or without preoperative chemotherapy in oesophageal cancer: a randomised controlled trial. Lancet 359:1727–1733 131. Cunningham D, Allum WH, Stenning SP et al, MAGIC Trial Participants (2006) Perioperative chemotherapy versus surgery alone for resectable gastroesophageal cancer. N Engl J Med 355:11–20 132. Urba SG, Orringer MB, Turrisi A et al (2001) Randomized trial of preoperative chemoradiation versus surgery alone in patients with locoregional esophageal carcinoma. J Clin Oncol 19:305–313
133. Walsh TN, Noonan N, Hollywood D et al (1996) A comparison of multimodal therapy and surgery for esophageal adenocarcinoma. N Engl J Med 335:462–467; erratum in N Engl J Med (1999) 341:384 134. Burmeister BH, Smithers BM, Gebski V et al, Trans-Tasman Radiation Oncology Group and Australasian Gastro-Intestinal Trials Group (2005) Surgery alone versus chemoradiotherapy followed by surgery for resectable cancer of the oesophagus: a randomised controlled phase III trial. Lancet Oncol 6:659–668 135. Burmeister BH, Walpole ET, Burmeister EA et al (2005) Feasibility of chemoradiation therapy with protracted infusion of 5-fluorouracil for esophageal cancer patients not suitable for cisplatin. Int J Clin Oncol 10:256–261
64 Colorectal Cancer Surgery: Current Trends and Recent Innovations
Oliver Priest, Paul Ziprin, and Peter W. Marcello
Contents
64.1 Introduction
64.2 Colorectal Imaging Techniques for Diagnosis and Staging
64.2.1 Colon Cancer Diagnosis
64.2.2 Colon Cancer Staging
64.2.3 Rectal Cancer Staging
64.2.4 Endorectal Ultrasound Staging in Rectal Cancer
64.2.5 ERUS Restaging of Rectal Cancer After Chemoradiation
64.2.6 EUS for Detection of Recurrent Rectal Cancer
64.2.7 MR Imaging in Rectal Cancer
64.2.8 Determining Tumour Characteristics and Outcomes from Imaging
64.2.9 Sentinel Lymph Node Mapping in Colorectal Cancer
64.2.10 The Role of Chemotherapy in Colorectal Cancer
64.3 Rectal Cancer Surgery
64.3.1 Total Mesorectal Excision
64.3.2 The Role of Neoadjuvant Therapy in Rectal Cancer Surgery
64.3.3 Management of Early Rectal Cancer
64.3.4 Surgeon as a Source of Variability in Outcomes
64.3.5 Hospital and Variability in Outcome
64.4 Laparoscopic Surgery
64.5 Enhanced Recovery
64.6 Robotic Colorectal Surgery
64.7 Future Perspectives
64.7.1 Telementoring and Remote Telepresence Surgery
64.7.2 New Devices for Robotic Surgery
64.7.3 Real-Time Intra-Operative Anatomy and Histology
64.7.4 Natural Orifice Transluminal Endoscopic Surgery
64.7.5 Individualisation of Treatment
References

O. Priest () Department of Biosurgery & Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract This chapter focuses on the recent and possible future developments in the management of adenocarcinoma of the colon and rectum. Imaging techniques for accurate diagnosis and staging are discussed, together with the impact of neoadjuvant therapy and high-quality surgery on clinical outcomes. Topical issues including the role of super-specialisation and the importance of multidisciplinary teams are highlighted. An overview of laparoscopic and robotically assisted colorectal surgery is presented, with a progress report on emerging technologies such as image-enhanced surgery, natural orifice transluminal endoscopic surgery and biological targeting of chemotherapy regimens.
64.1 Introduction
In the United Kingdom, there are approximately 35,000 new cases of colorectal cancer diagnosed every year. It is the third most common cancer after breast and lung and the second leading cause of cancer-related deaths. Cancer statistics from the United States estimate 148,810 new diagnoses in 2008 and 49,960 deaths from the disease during the same year [1]. Survival is dependent on disease stage at diagnosis (Table 64.1).
The management of the disease has changed dramatically over the last decade with the implementation of national guidelines and clinical practice recommendations, accompanied by a substantial improvement in 5-year survival from 22% in the early 1970s to around 50% in 2001 (Fig. 64.1) [2]. These improvements, particularly in rectal cancer, have been achieved through advances in surgical technique, the introduction of neoadjuvant and adjuvant therapies guided by imaging and, although more debatable, surgical and hospital specialisation and a more multidisciplinary approach.

Table 64.1 Approximate frequency and 5-year relative survival by Dukes' stage in colorectal cancer
Dukes' stage (modified)   Approximate frequency at diagnosis (%)   Approximate 5-year survival (%)
A                         11                                        83
B                         35                                        64
C                         26                                        38
D                         29                                        3
[Fig. 64.1 Age standardised mortality rates for colorectal cancer (rate per million population), males and females, colon and rectum, 1971–2006. Source data: Mortality statistics – deaths registered in 2006, Review of the Registrar General on deaths in England and Wales (2006), Office for National Statistics, Newport]

64.2 Colorectal Imaging Techniques for Diagnosis and Staging

64.2.1 Colon Cancer Diagnosis
Colonoscopy is considered the gold standard investigation for the diagnosis of colorectal cancer, particularly for polyp screening. Studies have shown colonoscopy to be more accurate than double-contrast barium enema for tumour detection [3]. A less invasive alternative is computed tomography (CT) colonography or "virtual colonoscopy". The reported accuracy of this investigation varies, with a wide range of sensitivities and specificities for polyp detection and a documented variation in accuracy with polyp size. A published meta-analysis of 14 studies with over 1,000 patients reported high overall sensitivity and specificity, particularly for lesions larger than 10 mm, but much lower accuracy for smaller lesions [4]. A recent paper analysing the summary receiver operating characteristic curves of tests in pooled studies demonstrated that CT colonography had a reasonable sensitivity and specificity for detecting polyps larger than 10 mm. However, it was less accurate
than colonoscopy across all polyp size thresholds, reaching statistical significance for polyps between 5 and 10 mm [5]. A further drawback of CT colonography is the radiation dose involved. One available alternative method that has also been evaluated is magnetic resonance colonography (MRC). This has fewer potentially serious complications, a good patient acceptability and the additional practical advantage of eliminating the need for formal bowel preparation when used in conjunction with faecal tagging techniques. A meta-analysis by Purkayastha and colleagues showed that the overall diagnostic accuracy of MRC was reasonable, with a sensitivity of 75% (95% CI: 47–91%) and a specificity of 96% (95% CI: 86–98%). The accuracy was improved for malignant lesions, with a sensitivity of 91% (95% CI: 79–97%) and specificity of 98% (95% CI: 96–99%) [6]. It was not feasible to analyse the diagnostic accuracy of MRC based on polyp size as these data were not available. The findings were also limited by the heterogeneity of studies analysed, as they incorporated a wide range of fundamentally different radiological techniques, including prepared and unprepared bowel, faecal tagged, not tagged and bright and dark-lumen methods. Further evaluation of the MRC tool with a detailed comparison of different methods employed is warranted.
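Interval estimates of the kind quoted for sensitivity and specificity can be reproduced from raw counts with a standard score interval. A minimal sketch using the Wilson method follows; the counts are invented, since the pooled 2x2 tables of the meta-analyses are not given here.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion such as sensitivity."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Illustrative: 45 of 60 malignant lesions detected -> sensitivity 0.75
low, high = wilson_ci(45, 60)
print(f"sensitivity 0.75 (95% CI {low:.2f}-{high:.2f})")
```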
64.2.2 Colon Cancer Staging
Contrast-enhanced CT scan of the thorax, abdomen and pelvis is the standard practice for assessing the local extent of disease and the presence of lung or liver metastases. Although its predictive value for T stage and nodal status is not high (a recent paper reported 60 and 62%, respectively), its prognostic ability was found in the same publication to be similar to that of histopathological criteria. This has implications for identifying patients who may be selected for studies investigating the benefits of neoadjuvant chemotherapy in high-risk groups. Such a study, FOxTROT, which investigates the role of preoperative chemotherapy and antibody therapy in high-risk operable colon cancer, is about to start recruiting patients. The major drawback of both staging CT and CT colonography is that no functional tumour data are available. [18F]-Fluoro-2-deoxy-d-glucose (FDG) positron emission tomography (PET) scanning is a well-established method utilising the metabolic characteristics of tumours that is especially useful for detecting small, active metastatic deposits. A combination of CT and PET provides both functional and anatomical data. Published studies have shown improved accuracy of a combined CT and PET approach in the detection and characterisation of colorectal cancer lesions when compared with CT or PET alone. Recent reports of a dedicated PET/CT protocol defined for colorectal cancer and including colonography include experience with over 50 patients, with encouraging results [7]. PET/CT-colonography showed a statistically significant improvement in overall TNM staging accuracy, with 37/50 patients (74%) correctly classified compared with 22/50 patients (44%) with CT staging alone (P < 0.05). T-stage was correct in 42/50 patients (84%) and N-stage was correctly classified in 41/50 patients (82%) [7]. Advantages of PET/CT may derive from the detection of small colonic wall tumours, metastatic lymph nodes and early infiltration of surrounding organs. This technique may also be valuable for evaluation of local cancer recurrence, especially when diagnosis by conventional optical colonoscopy is impaired due to scar stenosis or tumour growth outside of the anastomosis. With regard to patient management, in one study the therapy regimen changed in 9% of patients who underwent PET/CT-colonography compared with CT staging alone [8].
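Because the two staging reads come from the same 50 patients, a paired comparison such as McNemar's test on the discordant pairs is the natural analysis. The discordant counts below are invented (they cannot be recovered from the summary percentages) and are chosen only to be consistent with the 37 vs. 22 correct classifications.

```python
# McNemar's test on paired staging accuracy. b and c are hypothetical
# discordant-pair counts consistent with 37 vs. 22 correct of 50 (b - c = 15).
b = 17  # correct on PET/CT-colonography, incorrect on CT alone
c = 2   # incorrect on PET/CT-colonography, correct on CT alone
chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected statistic
print(f"McNemar chi-square = {chi2:.2f}")  # > 3.84 implies P < 0.05 (1 df)
```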
64.2.3 Rectal Cancer Staging
Rectal cancer is defined as a tumour with its lower edge within 15 cm of the anal verge; such tumours represent about 35–40% of colorectal tumours, and about 50% of these lesions are low-lying. Local recurrence (LR) occurs frequently in patients with transmural or node-positive rectal cancers despite radical surgery, and its incidence is directly related to tumour stage. Radiotherapy, alone and in combination with chemotherapy, has shown a definite impact on local control and survival in rectal cancer and is discussed further below. Accurate staging and assessment of treatment response often involves a multi-modality approach.
64.2.4 Endorectal Ultrasound Staging in Rectal Cancer
Indications for endorectal ultrasound (ERUS) in rectal cancer include determination of the suitability of a large polyp or small rectal cancer (T1 stage) for endoscopic mucosal resection or transanal excision. It can also determine whether preoperative chemotherapy and radiation are needed in larger rectal tumours (T3–4 or N1 stage), and it has a role in surveillance after surgery for rectal cancer. The accuracy of ERUS for assessing the local depth of invasion of rectal carcinoma (T stage) ranges from 80 to 95%. This compares with 75–85% for magnetic resonance imaging (MRI) and 65–75% for CT alone. The major limitation of ERUS in assessing T stage is overstaging of T2 tumours, as ultrasound cannot distinguish inflammation around the tumour from actual malignant tissue. ERUS is less accurate for lymph node staging, as there is overlap between the echo features of benign, inflammatory and malignant lymph nodes. Typical reported accuracy rates range from 70 to 75%, again comparing favourably with CT (55–65%) and MRI (60–65%) (Table 64.2) [9]. A recent study compared the ability of ERUS and two MRI coils to stage rectal carcinoma locally [10]. ERUS and either body coil MRI or phased-array coil MRI were employed preoperatively in 49 patients and the imaging findings compared with histological examination of the resected specimen. The accuracy of ERUS was 70% for local T-staging, compared with 43% for body coil MRI and 71% for phased-array coil MRI. For N stage, the accuracy of ERUS, body coil MRI and phased-array coil MRI was 63, 64 and 76%,
respectively. ERUS had the best sensitivity for T staging of 80% and the same specificity (67%) as phased-array coil MRI. For N stage, phased-array coil MRI had the best, albeit unsatisfactory, sensitivity of 63% and the same specificity (80%) as the other methods [10]. Three-dimensional ERUS image reconstruction may improve the accuracy of ERUS and thereby decrease errors in staging. Kim et al. recently published a study comparing the efficacy of three-dimensional endoscopic ultrasound (EUS) with that of two-dimensional ERUS and CT for rectal cancer staging. Preoperative evaluation was completed in 86 patients. The accuracy for T-staging was 78% for three-dimensional ERUS, 69% for two-dimensional ERUS and 57% for CT, with a poor recorded accuracy for lymph node metastases of 65, 56 and 53%, respectively [11].

Table 64.2 T and N staging accuracy of ERUS
Paper                    Year of study   No. of patients   T staging accuracy (%)   N staging accuracy (%)
Badger et al.            2007            95                72                       69
Zammit et al.            2005            78                80                       77
Nesbakken et al.         2003            81                74                       65
Garcia-Aguilar et al.    2002            545               69                       64
64.2.5 ERUS Restaging of Rectal Cancer After Chemoradiation
Post-radiation oedema, inflammation, necrosis and fibrosis all reduce the accuracy of ERUS for staging rectal cancer after radiation therapy. Vanagunas et al. [12] aimed to verify the accuracy of ERUS in staging rectal cancer after neoadjuvant chemoradiation in a large cohort of patients, performing ERUS before and after concurrent 5-fluorouracil (5-FU) and radiotherapy in 82 patients with recently diagnosed locally advanced rectal cancer. All patients underwent subsequent surgical resection and complete pathologic staging. After chemoradiation, 16 patients (20%) had no residual disease at pathologic staging (T0N0) and were excluded from the study. The overall accuracy of ERUS after chemoradiation for pathologic T-stage was only 48%, as it was unable to accurately distinguish post-radiation changes from residual tumour. The T-staging was correct prior to surgery in 23 of the 56 responders (41%) and in 16 of the 24 nonresponders (67%) [12]. A subsequent study comparing the accuracy of EUS staging for rectal cancer before and after chemoradiation recorded 86% (57/66) accuracy prior to treatment, falling to only 50% after chemoradiation [13]. EUS overstaging of T3 tumours accounted for most of this inaccuracy.
64.2.6 EUS for Detection of Recurrent Rectal Cancer
Patients undergoing low anterior resection of rectal cancer generally have higher rates of local cancer recurrence than those with colon cancer, and the risk of recurrence is greatest in the first 2 years. LR is estimated to occur in 8–50% of patients. A number of studies have shown EUS to be very accurate in detecting recurrent rectal cancer at or near the anastomotic site, with EUS-guided fine needle aspiration being able to provide tissue confirmation [14]. A recent United States consensus guideline recommends performing EUS or flexible sigmoidoscopy at 3–6-monthly intervals for the first 2 years after rectal cancer resection for the purpose of detecting a surgically curable recurrence [15].
64.2.7 MR Imaging in Rectal Cancer
Magnetic resonance imaging (MRI) has been shown to be the investigation of choice in rectal cancer surgery, allowing detailed assessment of the tumour in relation to the mesorectal fascia. High-resolution MRI can depict the level of tumour invasion, depth of extramural spread, lymph node involvement, vascular invasion, involvement of the mesorectal fascia and perforation of the peritoneal reflection by tumour, all of which are crucial in determining management options. It permits proper assessment of higher rectal lesions beyond the range of a rigid endoscopic rectal ultrasound probe. Although the overall accuracy of staging with MRI may be similar to that obtained by ERUS, Brown showed that MRI accurately predicts curative resection in rectal surgery by identifying important surgical and pathological risk factors, and so helps identify patients with more advanced primary tumours who will benefit from neoadjuvant treatment [16]. This is where it is superior to ERUS. In particular, it can identify patients in whom the circumferential margin will be positive or threatened (tumour extending to or within 1 mm of the mesorectal fascia is considered an involved circumferential resection margin (CRM) and so at high risk of LR), and it can also identify patients with extramural vascular invasion. MRI has problems determining T stage in early rectal tumours, as reported by Videhult et al. [17]. Difficulties arise with deep T2 tumours or minimal T3 tumours that just breach the muscularis propria, and in delineating T1 from T2 tumours. The MERCURY study is a prospective European, multicentre, multidisciplinary study to assess the diagnostic accuracy, feasibility and reproducibility of MRI in predicting the final histopathological staging of tumour within 1 mm of the CRM. In 408 consecutive patients presenting with all stages of rectal cancer, the specificity for prediction of a clear margin by MRI was 92% (327/354; 95% CI 90–95%). Magnetic resonance imaging gave more accurate information than digital rectal examination in 245 patients assessed by both methods undergoing primary surgery or short-course radiotherapy followed by immediate surgery. The study also showed that, following the intervention, the use of preoperative chemoradiation in the treatment of advanced rectal cancer increased, with a resultant significant reduction in rates of positive circumferential margins [18].
64.2.8 Determining Tumour Characteristics and Outcomes from Imaging
Clinical staging and histology are not always effective prognostic indicators, and efforts to look at individual tumour biology, particularly with equivalent stage and grade, may optimise clinical outcomes. Therapy outcome is influenced by the presence of hypoxic areas inside the tumour and by the uptake and retention of chemotherapeutic agents within the tumour tissue. Tumour hypoxia is well recognised to potentiate resistance to chemotherapy and radiotherapy, and so, if it can be identified, modulators may be used to overcome this, thereby tailoring treatment to individual needs. More recently, dynamic contrast-enhanced magnetic resonance imaging (DCE MRI) and colour Doppler ultrasound have been evaluated. These indirect methods have the advantage that they are non-invasive, can be performed in situ rather than on ex vivo specimens when using histopathological markers and may be used to monitor response to treatment. Ogura et al. performed transanal colour Doppler ultrasound in 46 patients with Dukes B and C rectal carcinoma [19]. They showed that quantification of a vascular point index by colour Doppler was a valuable indicator for detecting angiogenesis. Surrogate markers of tumour oxygenation such
820
as perfusion indices and diffusion coefficients obtained by dynamic or diffusion-weighted magnetic resonance imaging have been shown to be of predictive value for therapy outcome in patients undergoing neoadjuvant chemoradiation for primary rectal carcinoma. Therefore, imaging modalities may help tailor therapy for individual needs.
64.2.9 Sentinel Lymph Node Mapping in Colorectal Cancer

Accurate lymph node staging in CRC is important not only for prognosis but also for selecting patients for adjuvant therapy. The pTNM classification of colorectal cancer is based on the histopathological work-up of at least 12 lymph nodes. Studies have shown that stage II patients with fewer than ten lymph nodes sampled have a poorer prognosis, presumably due to understaging from either incomplete harvest of lymph nodes or inadequate histopathological evaluation. Several authors recommend techniques to improve lymph node detection in surgical specimens, such as fat clearing. Modern diagnostic methods such as immunohistochemistry and reverse transcriptase polymerase chain reaction (RT-PCR) can increase the detection of micrometastatic disease, but these techniques are labour-intensive and expensive and cannot routinely be applied to all lymph nodes found. Identifying suspicious lymph nodes intraoperatively based upon size is not useful, since 69% of metastatic nodes are smaller than 5 mm. Sentinel lymph node (SLN) biopsy using patent blue dye (Fig. 64.2) and/or a radioisotope has become common practice in breast cancer surgery and has
been investigated in colorectal cancer. The principle of SLN mapping is that it seeks to identify the first draining lymph node from the primary tumour, as the status of this lymph node will determine the nodal stage of the patient. In breast cancer, SLN mapping reliably predicts axillary node status in 98% of all patients and in 95% of those who are node-positive. In breast cancer surgery, the SLN status will determine whether the patient undergoes therapy to the axilla (radiotherapy or surgery). In colorectal surgery, the primary aim is to optimise nodal staging rather than to reduce surgical therapy or guide the extent of lymphadenectomy. The use of detailed immunohistochemical techniques in the examination of the targeted sentinel node may upstage tumours from stage II to stage III, where the evidence for benefit of adjuvant chemotherapy is most definite, thereby avoiding under-treatment of these patients. Published studies on SLN biopsy in colorectal cancer show considerable variation in the practical definition of the sentinel node, in the detection techniques (either in vivo or ex vivo), in the time interval chosen between dye injection and SLN detection, in the histopathological techniques applied and in the characteristics of the patient groups studied. As a result, the reported detection rate, sensitivity and rate of false-negative lymph nodes vary considerably. Successful identification of SLNs has been reported in 96–100% of cases in large studies, with the SLN accurately reflecting the tumour status of the nodal basin in 92–96% of cases. Most studies report that 10–30% of patients with negative nodes staged by conventional histopathology are upstaged due to the detection of micrometastases or isolated tumour cells with immunohistochemistry of the SLN [20]. The prognostic significance of micrometastases has yet to be determined, with some studies showing outcomes similar to node-negative patients. Prospective randomised trials are needed to address SLN upstaging, the biology of micrometastatic disease and the benefit of adjuvant therapy in patients with micrometastatic-only node-positive disease.
Fig. 64.2 Sentinel lymph node in a right hemicolectomy specimen identified ex vivo using patent blue dye, with the lymphatic drainage from the tumour to the sentinel node arrowed

64.2.10 The Role of Chemotherapy in Colorectal Cancer
The randomised trial involving the North Central Cancer Treatment Group, the Southwest Oncology Group and the Eastern Cooperative Oncology Group in 1990
demonstrated that 5-FU plus levamisole reduced recurrence rates by 40% and death rates by 33% in patients who underwent curative resection of stage III colon cancer. For patients with advanced disease, 5-FU in combination with folinic acid has been shown to improve objective response rates when compared with 5-FU alone (10–23%), although median survival was not affected (11.5 vs. 11 months). With the introduction of chemotherapeutic agents such as oxaliplatin and irinotecan in the last decade, the median overall survival of patients with advanced CRC has improved from 12 to about 18–21 months. These strategies have also improved outcomes in the adjuvant setting and have increased the number of patients who are suitable for liver resection through significant downstaging. In recent years, Phase III trials have shown that targeted agents such as bevacizumab [a monoclonal antibody targeting vascular endothelial growth factor (VEGF)] and cetuximab [a monoclonal antibody targeting the epidermal growth factor receptor (EGFR)], when combined with chemotherapy, improve survival in patients with advanced CRC. In patients with metastatic colon cancer who were refractory to irinotecan, the addition of cetuximab improved response rates to over 20%, while the addition of bevacizumab to irinotecan and 5-FU also increased response rates and progression-free survival in patients with previously untreated metastatic disease. Studies are ongoing to assess the role of bevacizumab and cetuximab in the adjuvant setting in the UK and USA (the QUASAR II and NCI-sponsored NCCTG-N0147 trials, respectively).
64.3 Rectal Cancer Surgery

64.3.1 Total Mesorectal Excision

Despite advances in adjuvant therapy, surgery remains the only curative treatment for colorectal cancer. The CRM describes the soft tissue margin closest to the deepest tumour invasion, either retroperitoneal tissue or adventitial peritoneum. Multivariate analyses have demonstrated that tumour involvement of the CRM is the most important factor in predicting LR in rectal cancer, with a similar relationship emerging for colon cancer. Published
Fig. 64.3 Total mesorectal excision based on the description of Heald (1982); the line of excision includes the mesorectum
clinical trial data suggest that the risk of LR is strongly increased if tumour encroaches to within 1 mm of the CRM, and the current recommendation is that clearance of more than 2 mm is necessary for a negative CRM. This has led to the development of the concept of total mesorectal excision (TME), based on the principle of excising the tumour and mesorectum en bloc, complete with an intact mesorectal fascia (Fig. 64.3). This principle rests on the findings of Moynihan in 1908 regarding potential pathways for lymphatic spread and also on the descriptions of Heald. Adherence to this standard reduces positive circumferential margin rates for resectable rectal cancers, with resultant low LR rates of 4–7% published by various authors including Heald and the Mayo Clinic [21]. Adequate training is required to attain such results. In Sweden, the Stockholm Colorectal Cancer Study Group investigated the use of preoperative radiotherapy and reported high LR rates, as they did not use TME techniques. The feasibility and benefits of a live operating training programme and a structured workshop in training multiple surgeons in the principles of TME surgery were demonstrated by a reduction of crude 2-year LR rates to 6% following its introduction, compared with LR rates of 28% (surgery alone) and 14% (preoperative radiotherapy and surgery) in the published Stockholm I trial [22].
64.3.2 The Role of Neoadjuvant Therapy in Rectal Cancer Surgery

For more advanced rectal cancers, especially where the circumferential margin may be involved despite TME, LR rates are high. Postoperative radiotherapy was shown by Fisher in 1988 to reduce LR rates in rectal cancer surgery, and subsequently combined chemoradiotherapy was shown to reduce LR rates further. More recently, neoadjuvant (or preoperative) radiotherapy was shown by Marks et al. not only to reduce LR rates but also to improve sphincter-sparing surgery [23]. Neoadjuvant chemoradiotherapy was then shown by Sauer et al. in a randomised controlled trial in 2004 to be associated with reduced LR rates and toxicity compared with a postoperative treatment regimen, and so is now the treatment of choice in more advanced rectal cancer where the circumferential margin is threatened [24]. This regimen uses radiotherapy doses ranging from 45 to 50 Gy in 25 daily fractions over 5 weeks with the addition of 5-FU-based chemotherapy, followed by surgery 6–12 weeks after completion of treatment. Significant tumour downstaging, reducing positive CRM rates, as well as nodal downstaging is often observed. More recently, pathological complete response (pCR) rates of more than 25% with neoadjuvant chemoradiotherapy have been reported. On this basis, Habr-Gama in Brazil has reported safe avoidance of surgery in patients with pCR who undergo close surveillance only. In patients who had a biopsy-proven clinical complete response after chemoradiotherapy and did not have surgery, 5-year overall survival was 93% and 5-year disease-free survival was 85% [25]. All five isolated recurrences were salvaged. However, these results have to be interpreted with caution, as they refer only to 99 patients with sustained clinical complete response at least 12 months after treatment, whereas 122 patients were deemed to have clinical complete response after the first assessment of tumour response; the outcomes of the remaining 23 patients have not been reported. Based on the Habr-Gama results, a trial of non-operative treatment of complete responders after neoadjuvant chemoradiotherapy for rectal cancer has been proposed by the Royal Marsden using serial clinical, endoscopic and MRI follow-up. The role of radiotherapy in resectable rectal cancer remains controversial. The Swedish rectal cancer study showed that short-course preoperative radiotherapy
(SCPRT – a total of 25 Gy given over 5 days, followed by surgery within 10 days) reduced LR and improved survival [22]. However, TME was not used as part of the study protocol and LR rates were extremely high in both groups, whereas surgery alone with TME has been shown to have lower recurrence rates. The Dutch RCT compared SCPRT followed by TME surgery with TME surgery alone and confirmed a reduction in LR rates in the radiotherapy arm [26]. Interim data from the UK MRC CR07 trial were recently published by Sebag-Montefiore et al. and confirm that the addition of SCPRT to TME reduces LR from 11.1 to 4.7% at 3 years and improves disease-free survival from 74.9 to 79.5% (P = 0.031) [27]. However, there is increasing evidence that this approach is associated with significant toxicity and increased perioperative complications. Accurate preoperative staging of rectal cancer is therefore crucial in determining the most appropriate treatment strategy in rectal cancer management.
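Trial results such as these are often easier to interpret when converted into an absolute risk reduction and a number needed to treat. A minimal sketch, using only the CR07 3-year local recurrence rates quoted above (and not any trial source data):

```python
# Derive absolute risk reduction (ARR), relative risk reduction (RRR)
# and number needed to treat (NNT) from the CR07 3-year local
# recurrence rates quoted in the text.
control_lr = 0.111  # selective postoperative chemoradiotherapy arm
scprt_lr = 0.047    # short-course preoperative radiotherapy (SCPRT) arm

arr = control_lr - scprt_lr  # 0.064, i.e. 6.4 percentage points
rrr = arr / control_lr       # ~0.58, i.e. a ~58% relative reduction
nnt = 1 / arr                # ~16 patients treated per LR prevented

print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

On these figures, roughly 16 patients would need to receive SCPRT to prevent one local recurrence at 3 years, which helps frame the toxicity trade-off mentioned above.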
64.3.3 Management of Early Rectal Cancer

Radical surgery in the form of anterior resection or abdominoperineal resection with TME remains the surgical treatment of choice for potentially curative treatment of rectal carcinoma.
64.3.3.1 Local Transanal Excision of Rectal Cancer

Local excision for the management of early rectal cancer is safe and feasible. The mortality rate is 0–2% and morbidity (usually minor) is 0–22%, but restrictions due to the limited view can result in positive resection margins in 12–60% of cases. LR rates of up to 39% have been reported, higher than after radical resection and therefore not oncologically acceptable for patients suitable for radical surgery.
64.3.3.2 Transanal Endoscopic Microsurgery

Transanal endoscopic microsurgery (TEMS) is a technique first described by Professor Gerhard Buess in 1984 for the management of benign rectal tumours [28]. TEMS employs a minimally invasive approach
Fig. 64.4 Forty millimetre diameter rectoscope for TEMS
with direct magnified stereoscopic vision using a rectoscope (Fig. 64.4). It offers several advantages over traditional transanal excision by providing improved visualisation and exposure, enabling precise resection of tumours located 2–22 cm from the anal verge, with minimal morbidity and mortality and low incomplete-excision and LR rates. It has been reported that in carefully selected patients with early rectal cancer, local excision using TEMS can produce LR and disease-free survival rates equivalent to those achieved after radical excision, particularly in well-differentiated T1 tumours, in which synchronous lymph node involvement is low. A prospective, randomised trial comparing TEMS local excision and anterior resection in the treatment of T1 rectal adenocarcinoma showed the superiority of the local approach in terms of length of hospital stay, operative blood loss, operation time and postoperative analgesic requirement [29]. There were no significant differences between the two procedures with respect to LR and overall 5-year survival rates. This report demonstrates that TEMS offers a safe alternative to major surgery with a curative outcome in the majority of patients with T1 disease, thereby avoiding the not insignificant morbidity and mortality of radical resection. Transanal excision may also be considered in a highly selected group of patients who show complete pathological response to neoadjuvant therapy, but as discussed previously, some oncologists are advocating a close surveillance policy without surgery in these patients. The use of TEMS for T2 and poorly differentiated T1 tumours is more controversial, as unacceptably high LR rates have been shown in these groups of patients compared with those undergoing radical surgery. This is reflected by the increased incidence of concomitant
lymph node disease in these groups of patients. Unfortunately, MRI and EUS are not accurate enough at predicting lymph node status to identify patients with more advanced T stage who are suitable for local treatment. The use of chemoradiation may reduce LR rates to more acceptable levels. Stipa et al. recently reported a total of 21 patients who received chemoradiotherapy either preoperatively or postoperatively based on surgeon preference, with a LR rate of 9.5% for T2 cancers, which is not dissimilar to radical surgery [30]. More recently, contact radiotherapy, which allows the use of increased doses without increasing exposure to surrounding structures, has been described and may improve local control better than standard chemoradiation alone. To try to define the role of local therapy in the treatment of early rectal cancer, the TEMS Users Group in the UK is planning a feasibility study to determine the best strategy using short-course radiotherapy followed by local excision, prior to the design of a trial comparing best local therapy with radical surgery. Without good level 1a evidence, the decision to offer local treatment for rectal cancer must involve all members of the multidisciplinary team (MDT) together with a full discussion with the patient. Currently, the proportion of patients presenting with early rectal tumours is relatively small (8%), but when colorectal screening is introduced, many more patients may be diagnosed. There is considerable variability observed between surgeons regarding colorectal cancer outcomes, including perioperative morbidity and mortality, use of sphincter-sparing surgery, local control and overall survival. Improved outcomes in colorectal surgery have been documented in specialist units. A number of studies have investigated the influence of both surgeon and hospital characteristics.
64.3.4 Surgeon as a Source of Variability in Outcomes

The effect of surgeon case volume on treatment and outcomes has not been proven. Although some studies have shown no association between case volume and different outcomes, including mortality and achieving an anastomosis, Harmon et al. reported in a study of 9,739 resections for colorectal cancer that high-volume surgeons had lower in-hospital mortality than low-volume surgeons, after adjustment for patients' comorbidity, age, sex, tumour stage and other clinical characteristics,
although the absolute magnitude of the difference was small (1.9%) [31]. Porter et al. examined the treatment and outcomes of 683 patients undergoing potentially curative resection for rectal cancer and showed that patients were more likely to have a sphincter-sparing anterior resection when they were treated by a surgeon with a high case volume (61.1 vs. 50.8% for low-volume surgeons) [32]. Studies have generally found better outcomes for patients treated by surgeons with more expertise in terms of training or time in practice rather than case volume. Holm et al. found that patients of surgeons who were certified for at least 10 years were less likely to have a local relapse or death from rectal cancer than those operated on by less experienced surgeons, and this was independent of surgeon case volume [33]. Porter et al. found that rectal cancer patients operated on by non-specialty-trained surgeons had worse 5-year local control and cancer-specific survival. Also, specific rectal cancer surgical training in Stockholm reduced permanent stoma rates, and improved LR and cancer-specific survival rates [34].
64.3.5 Hospital and Variability in Outcome

Several studies have found an association between hospital volume and outcomes. The American College of Surgeons' Commission on Cancer analysed survey data regarding the management and outcome of 39,502 cases of colorectal cancer and found sphincter-sparing surgery more likely to be performed in hospitals with large case loads. Schrag et al. found lower 30-day mortality and increased overall survival for colon cancer patients undergoing resection in high-volume facilities compared with those undergoing resection at low-volume facilities. Another study concluded that lower-volume hospitals experienced higher overall mortality rates. Tumour control may also be improved in specialist units. Holm et al. found that patients treated in university hospitals had a lower risk of LR and death from rectal cancer than patients treated in community hospitals. These studies controlled for both patient and surgeon characteristics; therefore, although surgical subspecialisation may improve colorectal cancer outcomes, hospital-related factors play an important role in patient
outcomes following treatment for CRC. This is further exemplified by the finding, on review of the U.S. National Cancer Data Base, that chemotherapy was less likely to be given in hospitals with small cancer case loads, and that institutions recognised by the National Cancer Institute were more likely to give adjuvant chemotherapy, with its proven benefit in colorectal cancer treatment. Simons et al. found that 5-year overall survival was statistically significantly better for patients undergoing surgery at hospitals with higher case volume, and more recently, Meyerhardt et al. also showed that low-volume hospitals experienced higher patient mortality, which was not attributable to higher cancer recurrence rates. This suggests that other factors, not just oncological interventions, affected patient care and outcomes. Therefore, although surgical expertise appears to affect outcome, the relation of hospital caseload to outcomes in colorectal cancer suggests that improvements not just during surgery but throughout the patient care pathway – from diagnosis and staging, through surgery and adjuvant therapies, to follow-up – will have an impact on all outcomes, including survival. This would seem to reflect the influence of the MDT as a whole, including nurses, theatre staff, oncologists and pathologists, rather than just the clinician directly responsible for patient care. The importance of the MDT approach to cancer treatment is well established. Burton et al. reported an audit of the impact of MDT meetings on surgical resection margins. Sixty-two out of 178 patients underwent surgery alone without MRI-based MDT discussion, resulting in a positive CRM in 16 cases (26%), as compared with 1 out of 116 (1%) in those patients with MDT discussion of MRI [35]. This report highlights the crucial role of discussing clinical details with members of relevant disciplines and reinforces the importance of accurate histopathological communication.
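The strength of the association reported by Burton et al. can be checked informally from the quoted counts alone. The sketch below applies Fisher's exact test to a 2 × 2 table built from those counts; the audit's own statistical analysis may well have differed, so this is illustrative only.

```python
# Association between MRI-based MDT discussion and positive CRM,
# using the counts quoted from Burton et al. [35]. Illustrative only;
# the original audit's statistical methods may have differed.
from scipy.stats import fisher_exact

#                    CRM positive  CRM clear
no_mdt_discussion = [16, 62 - 16]  # surgery without MDT discussion of MRI
mdt_discussion = [1, 116 - 1]      # surgery after MDT discussion of MRI

odds_ratio, p_value = fisher_exact([no_mdt_discussion, mdt_discussion])
print(f"Odds ratio = {odds_ratio:.0f}, p = {p_value:.1e}")  # OR = 40
```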
64.4 Laparoscopic Surgery

Laparoscopic colorectal resections offer several potential postoperative benefits, including minimal impairment of gastrointestinal and pulmonary function and reduced immunosuppression, resulting in shorter hospital stay and improved recovery. The first report of laparoscopy-assisted colectomy (LAC) in colon cancer was published in 1991. A number of studies have shown that oncological resection in
Fig. 64.5 The HandPort (Smith & Nephew, UK) used during a hand-assisted laparoscopic colectomy (a); control of the inferior mesenteric artery pedicle before division with an EndoGIA stapler is facilitated using the HandPort (b)
terms of resection margins, including the CRM for rectal cancer, and extent of lymphadenectomy is comparable between LAC and open surgery (OS). However, early experiences with laparoscopic colectomy were disappointing, with higher rates of tumour recurrence at the port and incision sites compared with historical studies in OS. Several randomised controlled trials have been published comparing the oncological outcomes of LAC and OS in colorectal cancer. The Clinical Outcomes of Surgical Therapy (COST) Study Group [36] recently published their findings from 48 institutions throughout North America. A total of 872 patients were randomly assigned to undergo open (n = 428) or laparoscopically assisted (n = 435) colectomy for adenocarcinoma of the colon. Conversion of laparoscopic-assisted to OS occurred in 90 patients (conversion rate 21%). Operating times were significantly longer in the laparoscopic group, but postoperative recovery was faster, as reflected by briefer use of parenteral and oral analgesics. There were no significant differences reported between the groups in the rates of 30-day postoperative mortality and complications. Oncological outcomes, including recurrence rates and disease-free survival, were also similar. The CLASICC study in the UK, comparing laparoscopic surgery with OS in both colon and rectal cancer, also reported similar oncological outcomes in both groups [37]. Surprisingly, little or no advantage in various quality-of-life measures is observed for laparoscopic surgery, as was also seen in the COST study. Of interest is the single-centre RCT reported by Delgado and Lacy from Barcelona, which is the only study demonstrating a significantly higher cancer-related survival in the LAC group than in the OS group. LAC was independently associated with reduced risk of cancer recurrence and death from a
cancer-related cause compared with OS. The superiority of LAC over OS in these outcomes was only due to differences in patients with stage III tumours [38]. Laparoscopic surgery is associated with a lack of the tactile feedback afforded by OS and with poor depth perception on a two-dimensional display. To overcome these problems, hand-assisted laparoscopic surgery (HALS) has been developed. With this procedure, the surgeon inserts a hand into the peritoneal cavity to aid the laparoscopic dissection while the pneumoperitoneum is maintained. A number of devices have been developed to facilitate this approach, including the HandPort (Smith & Nephew, UK; Fig. 64.5) and the GelPort (Applied Medical, USA). This approach utilises, from the start of surgery, the incision that would otherwise be necessary for retrieval of the specimen later in the operation. Marcello et al. have recently shown that its use may reduce operating time without compromising oncological outcome compared with conventional laparoscopic surgery for left-sided and total colonic resections. Overall, LAC has been shown to be at least equivalent to OS in terms of short-term oncological outcomes; long-term outcomes are awaited.
64.5 Enhanced Recovery

Originally developed by Kehlet's group in Copenhagen, a multimodal enhanced recovery programme (ERP) for elective large bowel OS was designed to improve postoperative recovery and avoid common barriers to early hospital discharge, such as the need for parenteral analgesics, intravenous fluids, slow patient mobilisation and inadequate home care. The main elements of so-called
“fast-track” programmes in colonic surgery encompass preoperative measures such as extensive preoperative counselling, no bowel preparation and carbohydrate-loaded liquids until 2 h before surgery; anaesthetic strategies including the use of thoracic epidural catheters and short-acting anaesthetics; and early, enhanced postoperative feeding and mobilisation. Kehlet was able to reduce median length of stay to 2 days with a low readmission rate [39]. Authors have shown that the use of an ERP abolishes the previously observed differences in recovery between LAC and OS [40]. The EnROL study, a randomised controlled trial in the UK, will compare LAC with OS in colorectal cancer surgery with both groups managed within an ERP.
64.6 Robotic Colorectal Surgery

The potential advantages of surgical robots such as the da Vinci system (Intuitive, USA; Fig. 64.6) include: a stable camera platform that eliminates hand-shake from a camera holder; hand-like motions of the instruments, permitting a variety of tasks not possible with traditional straight laparoscopic instruments and so facilitating bowel dissection and suturing; a three-dimensional virtual operative field, with improved spatial awareness as compared with standard two-dimensional imaging systems; and an ergonomically comfortable position in which to sit at the remote master unit. However, there are a variety of disadvantages that have limited its use in colorectal surgery, unlike prostate cancer surgery where its
use has increased exponentially in the USA. These include the excursion arcs of the robotic arms and the length of the surgical instruments. Some difficulties are reported with the robotic instruments reaching both up to the splenic flexure and down into the pelvis. The robotic arms are unable to self-adjust around the bed to allow the surgeon access to more than one quadrant of the abdominal cavity. A further disadvantage is the lack of haptic feedback, so that the surgeon must rely on visual cues to estimate the tension exerted on tissue. Weber et al. reported the first robotic-assisted laparoscopic colectomy in 2002 [41]. Since then, a number of series have been published from specialised centres in the United States. Outcomes in laparoscopic colonic resection are difficult to compare from one series to another due to the wide range of reported operations. Surgeons have focused primarily on feasibility and duration of surgery. Typically, dissection has been accomplished with the da Vinci system with a stapled anastomosis. The design characteristics of the da Vinci favour a medial-to-lateral dissection [42]. Standard laparoscopic and stapling instruments are often used for splenic flexure mobilisation and for division of the mesentery and bowel anastomosis. Several studies have shown that robotic colectomies are safe and feasible, with comparable morbidity and mortality. However, longer operating times are invariably reported, usually related to longer set-up times, and there is a marked additional cost for robotic surgery (Table 64.3). Few data on oncological outcomes have been reported, but at least one series has demonstrated satisfactory outcomes in terms of distal and circumferential margins and lymph node harvest for rectal cancer [43]. More studies are required to determine whether there is a role for robotic systems in colorectal surgery.
Fig. 64.6 The da Vinci slave unit during robot-assisted rectal dissection with the master unit in the background

64.7 Future Perspectives

64.7.1 Telementoring and Remote Telepresence Surgery
Telementoring enables real-time advice and instruction to be given by a watching mentor. It relies upon two-way video conferencing to transmit live laparoscopic images from a distant site to the mentor, an expert surgeon based at the teaching centre. Several studies have
Table 64.3 Published series for robotic-assisted colectomy (RALC robot-assisted laparoscopic colectomy; TRLC totally robotic laparoscopic colectomy; LOS length of stay; NR not reported)

Author | Year | Institution | n | Indications | System | Operative time, min, mean ± SD (range) | Conversion (%) | Mortality | Complications | LOS, days (mean ± SD)
Anvari | 2006 | McMaster University | 16 | All benign disease | Zeus RALC | 249 (180–330) | 6 | 0 | One reoperation for bleeding; two prolonged ileus | 4.5 (range 2–10)
DeNoto | 2006 | North Shore University Hospital, NY | 11 | Sigmoid cancer and sigmoid diverticulosis | da Vinci RALC | 197 (145–345) | 9 | 0 | 0 | 3.4 ± 0.5
Braumann | 2005 | Humboldt University, Berlin | 5 | Colon cancer n = 2; rectal cancer n = 1; diverticulosis n = 2 | da Vinci TRLC | 201 (80–300) | 40 | 0 | One enterovesical fistula | 13.6 ± 4.7
D'Annibale | 2004 | Camposampiero, Padova | 53 | Colon cancer n = 14; rectal cancer n = 8; benign disease n = 31 | da Vinci RALC | 240 ± 61 | 9 | 0 | Two surgical (needed reoperation); two general | 10 ± 4
Ballantyne | 2004 | Hackensack University, NJ | 10 | Colon cancer n = 4; rectal cancer n = 2; benign n = 4 | da Vinci TRLC | 155 ± 14 | 0 | 0 | 0 | 5.3 ± 0.95
Hanly | 2004 | Johns Hopkins University | 35 | Diverticular disease and benign polyps | da Vinci RALC | 177 | 14 | NR | NR | NR
Giulianotti | 2003 | Misericordia Hospital, Grosseto, Italy | 16 | Colon cancer n = 6; rectal cancer n = 6; anal melanoma n = 2; caecal lipoma n = 2 | da Vinci RALC | 211 (90–360) | 0 | 0 | 0 | NR
validated the effectiveness of telementoring and teleconsultation for teaching during live surgery. Robotic-assisted telepresence surgery (ARTS) aims to improve on the telementoring experience by providing the mentor with a telepresence in the remote operating room, from which he or she can assist or perform the surgery telerobotically as required. Anvari et al. recently reported the McMaster experience in telepresence surgery, which included three laparoscopic right hemicolectomies, one laparoscopic anterior resection and three laparoscopic sigmoid resections [44]. Time delays in telecommunication of 150–200 ms were experienced during most procedures, and the surgeons found that when the delay exceeded 250 ms, performing tasks telerobotically became slow and less accurate.
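The 250 ms threshold reported by Anvari et al. can be put in context with a rough estimate of network round-trip time. The sketch below assumes fibre propagation at roughly two-thirds the speed of light plus a fixed per-hop overhead; both figures are assumptions for illustration, not measured values from the McMaster programme.

```python
# Rough feasibility check for telepresence surgery: estimate round-trip
# latency over a fibre link and compare it with the ~250 ms threshold
# above which Anvari et al. found telerobotic tasks became slow and
# less accurate. Propagation speed and per-hop overhead are assumptions.

SPEED_OF_LIGHT_KM_S = 300_000
FIBRE_FACTOR = 0.66        # light travels at roughly 2/3 c in optical fibre
PER_HOP_OVERHEAD_MS = 5    # assumed switching/encoding delay per hop

def round_trip_ms(distance_km: float, hops: int = 10) -> float:
    one_way = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR) * 1000
    return 2 * (one_way + hops * PER_HOP_OVERHEAD_MS)

for distance in (400, 4_000, 20_000):  # regional, transatlantic, antipodal
    rtt = round_trip_ms(distance)
    verdict = "workable" if rtt < 250 else "too slow"
    print(f"{distance:>6} km: ~{rtt:.0f} ms round trip -> {verdict}")
```

On these assumptions, a regional link of a few hundred kilometres sits comfortably within the threshold, consistent with the 150–200 ms delays observed in practice, while an antipodal-scale terrestrial route would not.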
64.7.2 New Devices for Robotic Surgery

The limitations of the da Vinci robot are discussed above. A solution to these problems, while maintaining enhanced vision and tissue manipulation, is to miniaturise surgical robots for use as intra-abdominal devices. Engineers have an increasing capability to produce micro- and nanoscale components and machinery, and a group at the University of Nebraska has developed a prototype intra-abdominal mobile robot with the potential to enhance the safety of minimal access surgery [45]. Using an animal model and under endoscopic control, a gastrotomy was created and the miniature robot was deployed into the abdominal cavity under remote control. The robot is 12 mm in diameter and 75 mm in length. This prototype endoluminal mobile robot was connected to a power cable during the porcine surgery, but a wireless in vivo mobile robot has also been developed. The gastrotomy incision was successfully closed endoscopically using two endoclips and one endoloop, and the robot was then retracted back through the oesophagus. Robotic systems can also be interfaced with CT, magnetic resonance imaging and ultrasound information, creating a virtual surgical environment. This virtual scenario enables simulated surgery to be performed based on a patient's individual imaging information, facilitating the preparation of operative strategies for challenging and complex cases.

64.7.3 Real-Time Intra-Operative Anatomy and Histology

Terahertz pulsed imaging and Fourier transform infrared spectroscopy are emerging technologies in medical imaging. They identify differences in tissue density using electromagnetic waves in or near the infrared region (the terahertz region). These techniques have shown tremendous experimental potential in defining different types of tissue, including distinguishing benign from malignant tissue. The feasibility of terahertz probes within the abdominal cavity has been demonstrated in in vivo studies during OS, and the ability to perform real-time histological analysis of tissues within the operative field may compensate for the lack of tactile feedback inherent in minimal access techniques.
64.7.4 Natural Orifice Transluminal Endoscopic Surgery

The essential concept of Natural Orifice Transluminal Endoscopic Surgery (NOTES) was first described by Wilk [46]. The endoscopic approach offers not only the cosmetic benefit of scar-less surgery but also the potential to improve short-term postoperative outcomes. Surgery could be performed without the complications of pain from abdominal incisions, wound infections, hernias, adhesions and impaired immune function from the surgical stress response. Patients in high-risk groups for conventional surgery could derive the greatest benefit from a transluminal approach. Since researchers at Johns Hopkins University published the first report of NOTES transgastric peritoneal exploration for liver biopsy in a pig model, there have been several reports of various surgical procedures performed via transluminal endoscopic access in animal experiments, including transgastric gastrojejunostomy, cholecystectomy, hepatic wedge resection, appendicectomy, hernia repair, partial hysterectomy and splenectomy. Alternative modes of entry into the peritoneal cavity have been recommended, including transvaginal, transvesical and transcolonic approaches [47–49]. Marescaux et al. performed the first transvaginal cholecystectomy in a human, using a double-channel
flexible gastroscope and standard endoscopic instruments, having first obtained a pneumoperitoneum using a 2 mm transumbilical needle-port [50]. A reported series demonstrated transvaginal cholecystectomy to be clinically safe and feasible, and surgeons in Hyderabad, India, have presented the first case of transgastric appendicectomy in humans. The NOSCAR (Natural Orifice Surgery Consortium for Assessment and Research) working group was formed in 2005. Fundamental problems for NOTES identified by the group include access to the peritoneal cavity, safe and effective gastric or intestinal closure, maintaining spatial orientation and managing iatrogenic intraperitoneal complications. Laparoscopic surgery for CRC has been proven to be as effective as OS and is associated with minimal morbidity and mortality. The idea that natural orifice transluminal surgery can offer advantages beyond those of established minimal access or OS techniques remains theoretical. Large, well-designed long-term follow-up studies are needed to demonstrate potential advantages of scar-less access to the peritoneal cavity, such as prevention of hernias, adhesions and wound pain. Specific indications for transluminal surgery need to be defined, with identification of the patient groups and disease conditions best suited to this approach. Guidelines for the safe introduction of NOTES have yet to be agreed. Critical areas for research are the development of safe devices for access and closure and of more advanced therapeutic endoscopes. Future NOTES applications include bowel resection with anastomosis and cancer staging using peritoneal and lymph node biopsies; reliable instruments and tissue approximation techniques are required before these can be translated into clinical practice.
64.7.5 Individualisation of Treatment

There is great heterogeneity in colorectal cancers at both the gene and molecular level, which is reflected in the variability of response to chemotherapy and radiotherapy. As discussed previously, it may be possible to predict the response of rectal cancers to neoadjuvant chemoradiotherapy by assessing tumour perfusion and oxygenation. Thus, if the tumour is found to be relatively poorly oxygenated, hypoxic modulators may be used to overcome hypoxia-induced chemoradiation resistance. For example, tirapazamine,
a drug that is highly toxic to hypoxic but not to aerobic cells, has already demonstrated efficacy in the selective potentiation of cisplatin in randomised Phase III trials in non-small cell lung cancer. In the future, it may also be possible to identify which patients will respond to different chemotherapy regimens. Cetuximab (a monoclonal antibody targeting EGFR) has been found to be effective even in colon cancers that do not express this receptor, and more recently, the presence of K-ras mutations has been found to predict a lack of response to cetuximab. Other markers may also be found that will help determine the optimal therapies for each patient, particularly in selecting which biological treatment to use.
References

1. Jemal A, Siegel R, Ward E et al (2008) Cancer statistics, 2008. CA Cancer J Clin 58:71–96
2. Coleman MP, Rachet B, Woods LM et al (2004) Trends and socioeconomic inequalities in cancer survival in England and Wales up to 2001. Br J Cancer 90:1367–1373
3. Smith GA, O'Dwyer PJ (2001) Sensitivity of double contrast barium enema and colonoscopy for the detection of colorectal neoplasms. Surg Endosc 15:649–652
4. Mulhall BP, Veerappan GR, Jackson JL (2005) Meta-analysis: computed tomographic colonography. Ann Intern Med 142:635–650
5. Rosman AS, Korsten MA (2007) Meta-analysis comparing CT colonography, air contrast barium enema, and colonoscopy. Am J Med 120:203–210.e204
6. Purkayastha S, Tekkis PP, Athanasiou T et al (2005) Magnetic resonance colonography versus colonoscopy as a diagnostic investigation for colorectal cancer: a meta-analysis. Clin Radiol 60:980–989
7. Kinner S, Antoch G, Bockisch A et al (2007) Whole-body PET/CT-colonography: a possible new concept for colorectal cancer staging. Abdom Imaging 32:606–612
8. Veit-Haibach P, Kuehle CA, Beyer T et al (2006) Diagnostic accuracy of colorectal cancer staging with whole-body PET/CT colonography. JAMA 296:2590–2600
9. Romagnuolo J, Parent J, Vuong T et al (2004) Predicting residual rectal adenocarcinoma in the surgical specimen after preoperative brachytherapy with endoscopic ultrasound. Can J Gastroenterol 18:435–440
10. Bianchi PP, Ceriani C, Rottoli M et al (2005) Endoscopic ultrasonography and magnetic resonance in preoperative staging of rectal cancer: comparison with histologic findings. J Gastrointest Surg 9:1222–1227; discussion 1227–1228
11. Kim JC, Kim HC, Yu CS et al (2006) Efficacy of 3-dimensional endorectal ultrasonography compared with conventional ultrasonography and computed tomography in preoperative rectal cancer staging. Am J Surg 192:89–97
12. Vanagunas A, Lin DE, Stryker SJ (2004) Accuracy of endoscopic ultrasound for restaging rectal cancer following neoadjuvant chemoradiation therapy. Am J Gastroenterol 99:109–112
13. Maor Y, Nadler M, Barshack I et al (2006) Endoscopic ultrasound staging of rectal cancer: diagnostic value before and following chemoradiation. J Gastroenterol Hepatol 21:454–458
14. Sasaki Y, Niwa Y, Hirooka Y et al (2005) The use of endoscopic ultrasound-guided fine-needle aspiration for investigation of submucosal and extrinsic masses of the colon and rectum. Endoscopy 37:154–160
15. Rex DK, Kahi CJ, Levin B et al (2006) Guidelines for colonoscopy surveillance after cancer resection: a consensus update by the American Cancer Society and US Multi-Society Task Force on Colorectal Cancer. CA Cancer J Clin 56:160–167; quiz 185–186
16. Brown G, Radcliffe AG, Newcombe RG et al (2003) Preoperative assessment of prognostic factors in rectal cancer using high-resolution magnetic resonance imaging. Br J Surg 90:355–364
17. Videhult P, Smedh K, Lundin P et al (2007) Magnetic resonance imaging for preoperative staging of rectal cancer in clinical practice: high accuracy in predicting circumferential margin with clinical benefit. Colorectal Dis 9:412–419
18. MERCURY (2006) Diagnostic accuracy of preoperative magnetic resonance imaging in predicting curative resection of rectal cancer: prospective observational study. BMJ 333:779
19. Ogura O, Takebayashi Y, Sameshima T et al (2001) Preoperative assessment of vascularity by color Doppler ultrasonography in human rectal carcinoma. Dis Colon Rectum 44:538–546; discussion 546–548
20. Des Guetz G, Uzzan B, Nicolas P et al (2007) Is sentinel lymph node mapping in colorectal cancer a future prognostic factor? A meta-analysis. World J Surg 31:1304–1312
21. Heald RJ, Daniels I (2005) Rectal cancer management: Europe is ahead. Recent Results Cancer Res 165:75–81
22. Folkesson J, Birgisson H, Pahlman L et al (2005) Swedish Rectal Cancer Trial: long lasting benefits from radiotherapy on survival and local recurrence rate. J Clin Oncol 23:5644–5650
23. Marks GJ, Marks JH, Mohiuddin M et al (1998) Radical sphincter preservation surgery with coloanal anastomosis following high-dose external irradiation for the very low lying rectal cancer. Recent Results Cancer Res 146:161–174
24. Sauer R, Becker H, Hohenberger W et al (2004) Preoperative versus postoperative chemoradiotherapy for rectal cancer. N Engl J Med 351:1731–1740
25. Habr-Gama A, Perez RO, Proscurshim I et al (2008) Absence of lymph nodes in the resected specimen after radical surgery for distal rectal cancer and neoadjuvant chemoradiation therapy: what does it mean? Dis Colon Rectum 51:277–283
26. Peeters KC, Kapiteijn E, van de Velde CJ (2003) Managing rectal cancer: the Dutch experience. Colorectal Dis 5:423–426
27. Sebag-Montefiore D, Steele R, Quirke P et al (2006) Routine short course pre-op radiotherapy or selected post-op chemoradiotherapy for resectable rectal cancer: preliminary results of the MRC CR07 randomised trial. J Clin Oncol Ann Soc Clin Oncol Abstracts 1:18S
28. Buess G, Hutterer F, Theiss J et al (1984) [A system for a transanal endoscopic rectum operation]. Chirurg 55:677–680
29. Winde G, Nottberg H, Keller R et al (1996) Surgical cure for early rectal carcinomas (T1). Transanal endoscopic microsurgery vs. anterior resection. Dis Colon Rectum 39:969–976
30. Stipa F, Burza A, Lucandri G et al (2006) Outcomes for early rectal cancer managed with transanal endoscopic microsurgery: a 5-year follow-up study. Surg Endosc 20:541–545
31. Harmon JW, Tang DG, Gordon TA et al (1999) Hospital volume can serve as a surrogate for surgeon volume for achieving excellent outcomes in colorectal resection. Ann Surg 230:404–411; discussion 411–413
32. Porter GA, Soskolne CL, Yakimets WW et al (1998) Surgeon-related factors and outcome in rectal cancer. Ann Surg 227:157–167
33. Holm T, Johansson H, Cedermark B et al (1997) Influence of hospital- and surgeon-related factors on outcome after treatment of rectal cancer with or without preoperative radiotherapy. Br J Surg 84:657–663
34. Martling A, Holm T, Rutqvist LE et al (2005) Impact of a surgical training programme on rectal cancer outcomes in Stockholm. Br J Surg 92:225–229
35. Burton S, Brown G, Daniels IR et al (2006) MRI directed multidisciplinary team preoperative treatment strategy: the way to eliminate positive circumferential margins? Br J Cancer 94:351–357
36. COST (2004) A comparison of laparoscopically assisted and open colectomy for colon cancer. N Engl J Med 350:2050–2059
37. Jayne DG, Guillou PJ, Thorpe H et al (2007) Randomized trial of laparoscopic-assisted resection of colorectal carcinoma: 3-year results of the UK MRC CLASICC Trial Group. J Clin Oncol 25:3061–3068
38. Lacy AM, Garcia-Valdecasas JC, Delgado S et al (2002) Laparoscopy-assisted colectomy versus open colectomy for treatment of non-metastatic colon cancer: a randomised trial. Lancet 359:2224–2229
39. Jakobsen DH, Sonne E, Andreasen J et al (2006) Convalescence after colonic surgery with fast-track vs conventional care. Colorectal Dis 8:683–687
40. MacKay G, Ihedioha U, McConnachie A et al (2007) Laparoscopic colonic resection in fast-track patients does not enhance short-term recovery after elective surgery. Colorectal Dis 9:368–372
41. Weber PA, Merola S, Wasielewski A et al (2002) Telerobotic-assisted laparoscopic right and sigmoid colectomies for benign disease. Dis Colon Rectum 45:1689–1694; discussion 1695–1696
42. Ballantyne GH, Ewing D, Pigazzi A et al (2006) Telerobotic-assisted laparoscopic right hemicolectomy: lateral to medial or medial to lateral dissection? Surg Laparosc Endosc Percutan Tech 16:406–410
43. Hellan M, Anderson C, Ellenhorn JD et al (2007) Short-term outcomes after robotic-assisted total mesorectal excision for rectal cancer. Ann Surg Oncol 14:3168–3173
44. Anvari M (2007) Remote telepresence surgery: the Canadian experience. Surg Endosc 21:537–541
45. Rentschler ME, Dumpert J, Platt SR et al (2007) Natural orifice surgery with an endoluminal mobile robot. Surg Endosc 21:1212–1215
46. Wilk P (1994) Method for use in intra-abdominal surgery. US Patent 5297536
47. Lima E, Rolanda C, Correia-Pinto J (2008) Transvesical endoscopic peritoneoscopy: intra-abdominal scarless surgery for urologic applications. Curr Urol Rep 9:50–54
48. Pai RD, Fong DG, Bundga ME et al (2006) Transcolonic endoscopic cholecystectomy: a NOTES survival study in a porcine model (with video). Gastrointest Endosc 64:428–434
49. Rattner D, Kalloo A (2006) ASGE/SAGES Working Group on Natural Orifice Translumenal Endoscopic Surgery. October 2005. Surg Endosc 20:329–333
50. Marescaux J, Dallemagne B, Perretta S et al (2007) Surgery without scars: report of transluminal cholecystectomy in a human being. Arch Surg 142:823–826; discussion 826–827
65 Urology: Current Trends and Recent Innovations

Erik Mayer and Justin Vale
Contents
65.1 Innovation in Urology
65.2 New Surgical Techniques
65.2.1 Renal Surgery
65.3 Prostate Cancer Surgery
65.3.1 Radical Prostatectomy
65.4 Bladder Cancer Surgery
65.4.1 Novel Endoscopic Diagnostic Techniques
65.4.2 Surgical Approaches
65.5 Penile Cancer Surgery
65.6 Molecular and Biological Developments
65.6.1 Biomarkers
65.6.2 Metabonomics
65.6.3 Tissue Engineering
65.7 Diagnostic Imaging
65.7.1 PET & PET-CT
65.7.2 Magnetic Resonance Spectroscopic Imaging
65.7.3 USPIO
65.7.4 Ultrasound
65.8 Screening and Chemoprevention in Prostate Cancer
References
Abstract This chapter reflects on hot topics in urological research, a speciality which has often been at the forefront of surgical innovation. Relatively recently, we have seen widespread uptake of robotic platforms, which has enhanced the minimally invasive approach for radical treatment of bladder, prostate and renal cancer. Focally delivered energy sources, such as cryotherapy and high-intensity focused ultrasound, are an exciting prospect even as “less-invasive” treatments, but they need to be combined with improved real-time imaging modalities. The combination of diagnostic-enhancing technology with endoscopic techniques, such as hexaminolevulinate (HAL) cystoscopy and endocytoscopy, looks set to improve the management of superficial bladder cancer. The diagnosis, staging and prognostic stratification of urological malignancy will be improved through advances in both imaging modalities and molecular and biological markers of disease. These and other areas are discussed in the light of recent research, and areas of future innovation are highlighted.
E. Mayer (✉)
Department of Biosurgery & Surgical Technology, Imperial College London, 10th Floor, QEQM Building, St Mary's Hospital Campus, Praed Street, London W2 1NY, UK
e-mail: [email protected]
65.1 Innovation in Urology

Since it separated from general surgery as a speciality in the 1920s, urology has been one of the most innovative medical disciplines. The development of the rod-lens system by Harold Hopkins in the 1950s paved the way for minimal access surgery and, for many years, urologists led the way. By the late 1960s, transurethral resection of the prostate (TURP) had already replaced open retropubic prostatectomy as the gold standard treatment of benign prostatic enlargement.
Another technological innovation for urology was the birth of extracorporeal lithotripsy with the Dornier HM3 lithotriptor in the 1980s. This powerful lithotriptor was able to disintegrate stones in a totally non-invasive manner using shock-waves generated outside the body. Stones up to 1.5 cm in diameter could be fragmented in one session, thus avoiding open pyelolithotomy and a 10 cm flank incision. Although lithotripsy has no widespread use outside the urinary tract, it was still a landmark development in the field of medicine in that it represents highly targeted physical treatment without the need for percutaneous access. Interestingly, while there have been many other technological advances in urology, such as the introduction of miniaturised steerable flexible scopes and lasers allowing therapeutic manoeuvres in the most inaccessible parts of the urinary tract, the next wave of innovation in urology was pharmacological. The 1990s saw the popularisation of alpha-adrenergic antagonist drugs in symptomatic benign prostatic enlargement, and the Holy Grail of an orally effective drug for the treatment of impotence. In less than a decade, alpha-adrenergic antagonists converted a traditionally surgical problem into a medical one, thus avoiding surgery in two-thirds of men. The phosphodiesterase type 5 (PDE5) inhibitors revolutionised the treatment of up to 80% of men with mild-to-moderate erectile dysfunction, with minimal side-effects. There have also been great advances in the field of diagnostics. Prostate specific antigen (PSA) was discovered in the late 1970s and is a sensitive tumour marker, resulting in an “epidemic” of prostate cancer diagnosis and screening programmes in many parts of the world. Unfortunately, while highly organ specific, it is not disease specific, and for every cancer diagnosed in a screening programme, two patients undergo negative biopsies (see the worked example below). It is likely that the future of prostate cancer diagnostics will be gene products in the urine – PCA3 shows some early promise, although it is likely that ultimately tests will look at multiple gene products to increase specificity. This, along with promising gene products in bladder cancer, will be discussed further in this chapter. Beyond the realms of treatment and diagnostics, the field of urology itself may expand to encompass areas of men's health more generally. Some urologists regard themselves as the custodians of male health and are working ever more closely with metabolic physicians and cardiologists to look after all aspects of the ageing male.
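As the worked example promised above, the "two negative biopsies for every cancer diagnosed" figure can be recast as a positive predictive value; the ratio is as quoted in the text, and the arithmetic is purely illustrative.

```python
# Convert "two negative biopsies per cancer diagnosed" into the implied
# positive predictive value (PPV) of a biopsy-triggering screening result.
cancers_found = 1
negative_biopsies = 2

ppv = cancers_found / (cancers_found + negative_biopsies)
print(f"Implied PPV: {ppv:.0%}")  # ~33%: two in three biopsies are negative
```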
65.2 New Surgical Techniques

There is little doubt that surgery will become increasingly non-invasive. The laparoscopic revolution will continue to change urological practice such that all primary surgery is likely to be laparoscopic, possibly with the enhanced vision and precision of robotic platforms.
65.2.1 Renal Surgery

65.2.1.1 Local Disease

Laparoscopic nephrectomy has established itself as the gold standard in renal cancer for tumours up to 10 cm in diameter. However, laparoscopic nephron-sparing surgery has been less readily adopted. The reasons for this are multiple: first, haemostasis is challenging even in the open technique; second, the loss of three-dimensional vision makes judgement of resection margins difficult; and finally, suturing of the collecting system and oversewing of the parenchymal vessels require great skill, as they need to be precise and rapid to avoid exceeding the established norm for warm ischaemia time [1]. The haemostasis issue may be resolved by advances in laser technology. The argon beam laser is commonly used in open surgery but as yet does not have a laparoscopic applicator. There have been recent reports of using the green-light laser in laparoscopic partial nephrectomy [2], and the application of contact haemostatic agents (e.g. Surgicel and FloSeal™) is now commonplace and does reduce blood loss [3]. Ultimately, it is inevitable that the technology will exist to cut into renal parenchyma in a precise, bloodless fashion, probably without the use of ischaemia and without the need for subsequent oversewing of parenchymal vessels. As far as judging adequate resection and precise collecting system closure is concerned, the da Vinci™ robot may offer a real advance. The three-dimensional vision makes judgement of depth easier, and the EndoWrist™ technology permits precise suturing in a comparatively short time as compared with conventional laparoscopic suturing. This may facilitate water-tight closure of the collecting system (and indeed haemostasis) within an acceptable time-frame, and shorten the learning curve for what is a challenging laparoscopic procedure [4]. There are case series of robotic partial nephrectomy [5],
but no hard scientific evidence to support this relatively expensive approach to the problem. Some would argue that nephron-sparing surgery itself will become unnecessary in the future with advances in targeted energy sources. Cryotherapy is gaining increasing acceptance, both when administered laparoscopically – perhaps for an anterior tumour – and percutaneously for a posterior tumour [6]. There is good radiological/MRI evidence that this is a reasonable treatment, with shrinkage of the tumour and loss of enhancement [7], but there is limited histological correlation and no adequate long-term survival data. Many series have been restricted to more elderly patients with potentially limited life expectancy, on the basis that the treatment is experimental. This is an area in which more research is needed, perhaps in the form of randomisation between partial nephrectomy and cryotherapy, looking at complications, quality-of-life issues and surrogate measures of oncological efficacy. Until such data are available, we are dependent on observational studies in which renal resection is performed following ablation of tumour tissue, but to date such studies have been plagued by difficulty in confidently predicting the viability or otherwise of ablated tissue. There are, of course, other means of ablating a renal tumour, and radiofrequency ablation (RFA) is well established in some centres [8]. However, once again, data on efficacy are largely derived from shrinkage and lack of enhancement on follow-up imaging, and there are minimal histological or survival data. The ideal treatment would be complete tumour ablation without the need for percutaneous access, much as extracorporeal shock-wave lithotripsy has revolutionised stone disease. High intensity focused ultrasound (HIFU) is under evaluation, but the application of the energy has been plagued by problems relating to acoustic shadows beyond the ribs [9]. However, this is an active area of research and it is likely that this issue will be overcome.
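Returning to the randomised comparison of partial nephrectomy and cryotherapy suggested above, a first feasibility question is how many patients such a trial would need. A minimal sketch using the standard two-proportion sample-size formula follows; the event rates are assumptions for illustration only, not published figures.

```python
# Per-arm sample size for comparing two proportions (two-sided alpha,
# normal approximation) - e.g. an oncological efficacy endpoint in a
# trial of partial nephrectomy vs. cryotherapy. The rates below are
# assumptions for illustration, not published data.
from math import ceil
from statistics import NormalDist

def n_per_arm(p1: float, p2: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assumed 2-year local tumour control: 95% vs. 85%
print(n_per_arm(0.95, 0.85))  # ~138 patients per arm
```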
65.2.1.2 Advanced Renal Cancer

The extent of lymphadenectomy at resection of renal cell carcinoma continues to provoke debate [10]. Some authors have suggested that more extended lymph node dissection carries a survival advantage, while the majority view remains that the dissection is a staging procedure of no direct therapeutic significance. This is an area that needs to be revisited: most of these studies
antedate the introduction of tyrosine kinase inhibitors, and arguably, for the first time, we have an effective and well-tolerated therapy for adjuvant/salvage use [11]. Thus, the finding of microscopic lymph node disease at extended lymphadenectomy may be more likely to trigger a change in approach now than 10 years ago.

There is also an important political/organisational issue here for the UK healthcare system. Extended lymphadenectomy adds to the complexity of the surgery (particularly when performed laparoscopically) and is not routine practice in many departments. In the UK, most radical nephrectomies remain within the realms of non-oncological urologists (Improving Outcomes Guidance [12]), and if extended lymphadenectomy were to become the norm, there would have to be a shift in practice towards centralisation of this surgery.
65.3 Prostate Cancer Surgery

65.3.1 Radical Prostatectomy

Prostate cancer is the most common cancer in men in the UK. Data from the USA indicate that PSA testing for prostate cancer screening has increased dramatically since the mid-1990s, and although prostate cancer incidence increased following the introduction of PSA testing, disease-specific mortality rates decreased with greater use of early interventions [13]. A comparison of the effectiveness and adverse effects of treatments for clinically localised prostate cancer [13] suggested better long-term cancer outcome for radical prostatectomy over non-surgical treatment. However, radical prostatectomy also carried higher risks of incontinence and impotence as compared to radiation therapy and androgen deprivation. There is therefore a definite place for surgical intervention to maintain improved oncological outcome, but a need to further minimise morbidity.

Radical prostatectomy has traditionally been performed as an open procedure. However, the successful application of a minimally invasive approach to radical prostatectomy by Guillonneau and Vallancien in 1998 established this as a feasible surgical option [14]. Since that time, the mid-term outcomes of laparoscopic radical prostatectomy have appeared promising with regard to complications and oncologic and functional results, achieving equivalence to open surgery [14]. The shift from open to
laparoscopic surgery exposed the surgeon to completely new technical challenges: reduction in the range of motion, two-dimensional vision, impaired eye-hand coordination, lack of depth perception and reduced haptic feedback. Further technological innovation saw the application of robotic technology to prostatectomy. This was introduced primarily in an attempt to further decrease perioperative morbidity [15], but also to reduce the difficulty of performing complex laparoscopic urologic procedures [16]. The robotic interface provides three-dimensional visualisation, tenfold magnification, seven degrees of freedom of movement, motion scaling, tremor elimination and a more ergonomic working position for the surgeon. Indeed, surgeons inexperienced in laparoscopy have been able to perform robot-assisted prostatectomy without the time-intensive training necessary to gain the skills for laparoscopic radical prostatectomy [14]. Currently, the choice of surgical approach for radical prostatectomy is determined by a combination of local availability and the personal bias and experience of the surgeon, rather than by evidence-based practice.

The advantages of a minimally invasive approach to radical prostatectomy are primarily associated with reduced surgical trauma and have been reported as less blood loss, reduced blood transfusion requirements, less postoperative pain, a lower rate of immediate complications, shorter convalescence and better cosmesis [14]. The potential for the robot-assisted approach to further decrease short-term morbidity and enhance recovery is widely quoted; however, such benefits remain anecdotal without appropriate and necessary supportive data [15].

The primary goal of radical prostatectomy is to attain a surgical cure while minimising the impact on quality of life. Success of prostatectomy is governed by complete cancer resection, theoretically eliminating the chance of recurrence [17]. With respect to cancer control, long-term data have confirmed the efficacy of open radical prostatectomy, with disease control rates of 60–75% at 10 years and cancer-specific survival rates approaching 97% at 10 years [18]. Similar long-term survival data for laparoscopic techniques are not currently available. The National Institute for Clinical Excellence (NICE – UK) published its full guidance on conventional laparoscopic prostatectomy in February 2007, concluding that, on current evidence, there is no difference between conventional laparoscopic and open radical prostatectomy in terms of positive surgical margin rates. Other studies report similar findings [19–21], although
data limitations, including single-surgeon studies, non-randomised comparative studies, lack of appropriate controls, short follow-up time and differences in follow-up between the open and laparoscopic procedures, limit the confidence with which these surgical techniques can be compared [22]. Menon et al. [23] and Ahlering et al. [24] compared the outcomes of robot-assisted radical prostatectomy with the open approach, reporting comparable surgical margin rates for both techniques. In a systematic review of the literature, Ficarra et al. [16] concluded that positive surgical margin rates following the robotic procedure could reach percentages similar to those of conventional laparoscopic and open series. However, Menon et al. [25] reported superior oncological outcomes following a robot-assisted approach compared with open and conventional laparoscopic prostatectomy. Likewise, more recently, Smith et al. [26] have reported a lower positive margin rate for patients undergoing robot-assisted radical prostatectomy compared with the open procedure, in the hands of surgeons with significant experience in robotic procedures. Reported differences in functional outcomes such as urinary continence and sexual potency are equally difficult to interpret due to non-standard reporting methods [17]. While differences in oncologic and functional outcomes may be real, the lack of properly performed comparative analyses precludes any proclamation of superiority for any one technique [22]. As discussed by Guillonneau [27] and Berryhill et al. [17], there has been no high-quality comparison of the techniques, and published data are open to criticism, with the methods used to collect and interpret these data being equivocal and difficult to reproduce. The need for a randomised clinical trial is widely acknowledged [16, 19, 27, 28]; a randomised trial would allow an appropriate comparison with the gold standard open procedure [16].

Prostate cancer is a multifocal disease and conventional treatment has assumed the need to treat the entire prostate. The fact that radical prostatectomy specimens usually show more extensive disease, with upgrading in approximately 30% of cases, would support this approach. On the other hand, low-grade, small-volume disease can be actively monitored, and as a result some researchers suggest a “middle road” of targeted therapy with a lower morbidity risk [29]. Some of the newer treatment modalities – such as HIFU – may permit this non-radical approach. There are no outcome data yet to support this contention, but retrospective
histological review of radical prostatectomy specimens suggests that in perhaps three-quarters of radical prostatectomy specimens, the non-index tumours (the smaller tumours away from the main focus of cancer) would have been classified as “clinically insignificant” and highly suitable for active monitoring [30]. This is a highly contentious assumption and would be hard to predict pre-operatively, but with advances in imaging such as HistoScanning™ [31], targeted treatment may become an option in the future, thus avoiding the side effects of radical treatment.
65.4 Bladder Cancer Surgery

65.4.1 Novel Endoscopic Diagnostic Techniques

Rigid and flexible cystoscopy are the major diagnostic modalities for the detection of primary and recurrent carcinoma of the bladder. Sensitivity is approximately 98% when combined with biopsy of any suspicious lesions. However, rigid cystoscopy routinely requires general anaesthesia, and biopsy can result in haemorrhage, necessitating prior discontinuation of anticoagulants in a frequently elderly and unfit population. Early work using endocytoscopy technology suggests that it may provide accurate histological diagnosis in situ during conventional rigid cystoscopy and therefore avoid the need for biopsy [32]. The benefits of this approach might be greater for outpatient flexible cystoscopy, but miniaturisation of the endocytoscopy systems will be required.

Other technologies have tried to improve the visibility of urothelial tumours, particularly small or flat early tumours or carcinoma in situ (CIS). Narrow-band imaging flexible cystoscopy works by enhancing the contrast between urothelial carcinomas and normal urothelium to improve visibility. With narrow-band imaging, vasculature appears dark green or black against an almost white normal urothelium, as compared to red lesions on a background of pink normal urothelium with conventional white-light imaging [33]. This technology has been tested in patients with known recurrent urothelial carcinomas, and one study identified 15 additional cancers in 12 of 29 patients that were not otherwise detected using white-light imaging [33]. This small-volume disease “undetected” by white-light imaging might have been labelled as “early recurrence” on
follow-up check cystoscopies. The absolute benefit of earlier detection and treatment of low-grade disease on overall prognosis has been questioned [33]. However, the failure to identify CIS, whether or not in combination with papillary tumours, will have treatment and prognostic ramifications. Intravesical instillation of aminolevulinic acid in combination with blue-light fluorescence cystoscopy causes neoplastic tissue to emit red fluorescence. A phase III, multicentre study comparing hexaminolevulinate (HAL) cystoscopy with white-light cystoscopy for the detection of CIS lesions, alone or in the presence of papillary Ta or T1 disease, reported that 92% of CIS lesions were detected by HAL cystoscopy as compared to 68% by white-light cystoscopy [34]. There was a higher false-positive detection rate for HAL cystoscopy (39 vs. 31%) because of fluorescence of inflamed tissue. In real terms, there was no statistically significant difference between the two techniques (sensitivity 87 vs. 83% at the patient level). Time and resource issues have to be considered, but it appears that HAL cystoscopy is useful when used in conjunction with white-light cystoscopy.
65.4.2 Surgical Approaches

Just as laparoscopy and robotics have changed the landscape of radical prostatectomy, there is burgeoning interest in minimally invasive techniques in bladder cancer [35]. Laparoscopic cystectomy is the norm in some departments and can be performed with intracorporeal reconstruction or, more commonly, with extracorporeal reconstruction through a short wound, which is also used for specimen retrieval [36]. The advantage for the patient is once again shorter recovery times, but the available evidence is more limited than for laparoscopic prostatectomy, as this is a lower-volume procedure. Some surgeons elect to perform their laparoscopic cystectomy robotically, and there are a few small series in the world literature [37]. As with robotic prostatectomy, we need a randomised trial of open vs. robot-assisted laparoscopic cystectomy, and such a trial (BOLERO) has been established under the auspices of the National Cancer Research Network UK (NCRN).

However it is performed, cystectomy is associated with significant quality of life implications relating to having a urostomy (most patients), having a neobladder if reconstruction has been performed, and issues of body image and sexual function. With recent advances in
chemotherapy and radiotherapy, and better staging modalities, the dogma that the entire bladder should be removed in cases of muscle-invasive disease has been challenged. As a result, the Selective Bladder Preservation Against Radical Excision (SPARE) study has been initiated under the auspices of the NCRN. This is a randomised controlled trial of selective bladder preservation versus radical excision in muscle-invasive disease, for which recruitment commenced in 2007 [38].
65.5 Penile Cancer Surgery

The development of supraregional networks for rare urological malignancies such as penile cancer has brought about changes in surgical management similar to those seen in breast cancer. We have seen a move away from partial or radical penectomy towards glansectomy and reconstruction with a split-thickness skin graft for glans-confined squamous cell tumours. Medium-term outcomes suggest that this is a safe and effective approach for preserving a satisfactory cosmetic appearance without compromising local cancer control [39]. Similarly, evidence suggests that dynamic lymphoscintigraphy and sentinel lymph-node biopsy reliably detect occult metastatic disease, although long-term survival benefit data are lacking. It is preferable to prophylactic bilateral inguinal lymph-node dissection, with its associated high morbidity rate and no clinical benefit in the 80% of patients who have negative nodes [40].
65.6 Molecular and Biological Developments

65.6.1 Biomarkers

65.6.1.1 Prostate Cancer

Prostate and bladder cancer, being the two most common malignancies of the urinary tract, have naturally attracted a great deal of attention in the effort to develop novel markers that will aid diagnosis and possibly provide prognostic information to guide subsequent management. Biomarker investigation in prostate cancer using body fluids has the advantage that it is not dependent on the success of a tissue biopsy. The majority of research has been undertaken using blood and urine; limited work has been done using seminal fluid.
PSA is limited in its ability to discriminate between benign and malignant prostatic pathology, but it is currently the most widely used blood-based biomarker and can therefore be seen as the benchmark for novel blood biomarker development. Research has tried either to identify a single novel marker or to assemble a panel of markers, which may be more suitable given the heterogeneity of prostate cancer [41]. Of the single markers, early prostate cancer antigen (EPCA) has shown the most promise, with one study reporting sensitivity and specificity of 92 and 94%, respectively [41]. Although this study did include patients with prostatitis, it did not include benign prostatic hyperplasia or prostatic intraepithelial neoplasia. A panel of markers can be derived by fractionating proteins, which gives a proteomic signature. This approach has yielded some encouraging results, although there has been debate surrounding standardisation of the technique, including whether serum or plasma should be used [41].

Urine is a rich resource for biomarker research because it can contain either exfoliated cancer cells or secreted prostatic products. Gene-based biomarkers rely on cell exfoliation and use DNA- or RNA-based assays for detection; it is a change in either the expression or the activity of the gene in cancer cells that discriminates them from benign pathologies. To date, a number of candidates have been investigated, including the glutathione-S-transferase P1 gene (GSTP-1, low expression due to promoter hypermethylation in cancer cells), a combination of four genes (p16, ARF, MGMT, GSTP-1), the TERT gene (which encodes telomerase, the activity of which increases in cancer cells) and the DD3 (PCA3) gene. It is the latter that has shown the most potential, with a study reporting a sensitivity and specificity of 66 and 89%, respectively. The DD3 gene is unique to prostate epithelial cells, and expression increases 6- to 1,500-fold in neoplastic prostate tissue [41]. However, all of the above markers rely on the presence of exfoliated cells, and in early-stage disease, when cancers are well differentiated and most amenable to treatment, exfoliation of cancer cells may be limited because cell-to-cell adhesion is still intact. Proteome changes, on the other hand, will probably already have occurred, making this a more attractive medium for biomarkers of early malignant change. On this basis, several investigators have profiled proteomic changes using techniques such as reversed-phase adsorption followed by matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry, and capillary electrophoresis coupled online to mass spectrometry (CE-MS). Early results suggest a good ability to
discriminate between malignant and benign prostatic pathologies, although questions remain over whether to sample initial or midstream urine, and over the need for, and if so the duration of, prostatic massage [41].

Biomarkers in prostate cancer are an extremely promising area, in that modern scientific techniques have given us the potential to “map” early proteomic changes in prostate pathology. This serves not only as a diagnostic aid, but could also provide vital prognostic information about the natural history of malignant tumours and therefore function as an adjunct to management decisions. Difficulties in this regard may arise from the multifocality of prostate cancer and the presence of both “aggressive” and relatively indolent tumours. This is relevant to experimental techniques of focally ablating prostate tumours using energy sources such as HIFU.
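For readers less familiar with how the RNA-based urinary assays described above are quantified, the arithmetic usually reduces to a fold-change in marker expression relative to a reference transcript, commonly computed with the standard 2^(-ΔΔCt) method of real-time PCR. The sketch below is purely illustrative and is not drawn from the cited studies; all cycle-threshold (Ct) values are invented.

```python
# Illustrative only: relative expression by the standard 2^(-ddCt)
# real-time PCR method. All Ct values below are hypothetical.

def fold_change(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Expression of a target gene in a case sample relative to a control
    sample, each normalised to a reference transcript."""
    d_ct_case = ct_target_case - ct_ref_case    # normalise the case sample
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalise the control
    return 2.0 ** -(d_ct_case - d_ct_ctrl)

# A marker amplifying 8 cycles earlier (after normalisation) in the cancer
# sample corresponds to a 2^8 = 256-fold increase in expression, of the
# order of the fold-changes reported for DD3 (PCA3) above.
print(fold_change(24.0, 20.0, 32.0, 20.0))      # -> 256.0
```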
65.6.1.2 Bladder Cancer

Diagnosis in bladder cancer has traditionally revolved around cystoscopy, under either local or general anaesthetic. Limitations surrounding the sensitivity of this procedure have been acknowledged, and technologies
are being developed to try to overcome them, such as HAL cystoscopy and narrow-band cystoscopy [33]. Despite this, there are other situations in which cystoscopy is suboptimal, such as in patients with significant co-morbidity, and it cannot discriminate between inflammatory and malignant processes, particularly following intravesical BCG therapy. Similarly, the interpretation of urine cytology is operator-dependent and is confounded by infective and other inflammatory processes. There is clearly a need for an improved diagnostic marker, and urinary biomarkers provide an ideal opportunity, with the additional benefit over cystoscopy of being non-invasive. Existing research into potential biomarkers can be assessed by comparing the performance of the novel biomarkers against the sensitivity and specificity of urine cytology, together with the usability of each test and its costs. A recent review of the literature identified the most important biomarker developments to date [42]. Table 65.1 summarises the urinary biomarkers and their sensitivities and specificities. The authors concluded that although the current generation of urinary biomarkers is an improvement over the first generation, we have yet to discover a single biomarker that will lower the frequency
Table 65.1 Summary of sensitivities and specificities and advantages and disadvantages of important urinary biomarkers for bladder cancer

Marker | Sensitivity (%) | Specificity (%) | Advantage | Disadvantage
Urine cytology | 35 (median) | 94 (median) | – | Influenced by inflammation, poor detection of low grade tumours
FISH | 69–87 | 89–96 | Unaffected by BCG | Labour intensive and expensive
MSA | 72–97 | 80–100 | Also detection of low grade tumour | Complex and expensive
Immunocyt | 38.5–100 | 73–84.2 | – | High interobserver variability
Telomerase | 70–100 | 60–70 | Sensitivity | Influenced by inflammation and age
BTA-TRAK | 24–89 | 52–93 | – | Influenced by benign genitourinary conditions
BTA-stat | 57–79 | 48–96 | On bench test | Influenced by benign genitourinary conditions
HA-HAase | 83–94 | 77–93.4 | Also detection of low grade tumour | Needs further study
NMP22 | 49.5–65 | 40–87.3 | Unaffected by BCG and detection of low grade tumour | No clearly defined cut-off value
BLCA-4 | 89–96.4 | 96–100 | Sensitivity and specificity | Needs further study
CYFRA21-1 | 43–79.3 | 68–84 | – | Influenced by benign genitourinary conditions and instillations
Survivin | 64–94 | 93–100 | Sensitivity and specificity | Needs further study
Proteomics | 71.7–80 | 62.5–97 | Potential for generating ‘panel’ of protein/peptide markers | No clinically usable assays at present

FISH, fluorescence in situ hybridisation; BCG, Bacillus Calmette-Guerin; MSA, microsatellite analysis; HA-HAase, hyaluronic acid and hyaluronidase; NMP22, nuclear matrix protein 22; CYFRA 21-1, cytokeratin 19 fragment.
Table reprinted from Vrooman and Witjes [42]. Copyright (2008), with permission from Elsevier
of cystoscopy and assist in surveillance. As in prostate cancer, it may be optimal to define a “panel” of markers to provide us with the information we require.

Nuclear matrix protein 22 (NMP22) has been developed as a point-of-care assay, making it an attractive option for “one-stop” diagnostic urology services. This assay uses monoclonal antibodies in a lateral flow strip and therefore has a detection “cut-off” value, which determines the subsequent sensitivity and specificity. There is evidence to suggest that the diagnostic performance of NMP22 varies according to the tested population [43]; the authors concluded that “no optimal cut-offs for detecting any or aggressive bladder transitional cell carcinoma could be derived based on NMP22 values”. Furthermore, the positive predictive value of the NMP22 test improves for patients at higher risk of bladder cancer [44]. Currently, despite its advantages, NMP22 probably cannot be recommended for surveillance in daily clinical use.

BLCA-4 and Survivin are two of the more promising markers yet to be fully clinically validated. BLCA-4, a nuclear matrix protein specifically expressed in bladder cancer, has demonstrated exceptional sensitivity and specificity and is analysed by ELISA. Survivin, an apoptosis regulator, has also demonstrated good sensitivity and specificity, but with the added advantages of providing prognostic information and predicting response to intravesical or systemic therapies. There may also be potential for its manipulation as part of targeted therapy [45].
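The dependence of a point-of-care test on its cut-off, and of its positive predictive value on the pre-test risk of the tested population, can be made concrete with a short calculation. The sketch below uses invented assay distributions rather than NMP22 data; the only substantive content is Bayes’ theorem, which explains why identical sensitivity and specificity yield a much higher positive predictive value in a high-risk than in a low-risk population.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical assay readouts (arbitrary units) for benign and malignant
# groups; both distributions are invented for illustration only.
benign = rng.normal(8.0, 3.0, 10_000)
cancer = rng.normal(14.0, 4.0, 10_000)

def sens_spec(cutoff):
    sensitivity = np.mean(cancer >= cutoff)   # true positives / all cancers
    specificity = np.mean(benign < cutoff)    # true negatives / all benign
    return sensitivity, specificity

def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

for cutoff in (10.0, 12.0):                   # a lower cut-off is more sensitive
    se, sp = sens_spec(cutoff)
    print(f"cut-off {cutoff}: sensitivity {se:.2f}, specificity {sp:.2f}")
    for prevalence in (0.05, 0.30):           # low-risk vs. high-risk cohort
        print(f"  prevalence {prevalence:.0%}: PPV {ppv(se, sp, prevalence):.2f}")
```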
65.6.2 Metabonomics

Metabonomics is a rapidly evolving field of biomedical science, defined as “the quantitative measurement of time-related multiparametric metabolic responses of multicellular systems to pathophysiological stimuli or genetic modification” [46]. In practice, this involves spectroscopic metabolic profiling of various body compartments, including urine, plasma, stool and tissue [47], coupled with multivariate statistical analysis of the data, and provides a systems approach for studying in vivo metabolic profiles. High-throughput methods such as high-resolution 1H nuclear magnetic resonance (NMR) spectroscopy or mass spectrometry (MS) can be used to generate a metabolic “fingerprint” of biological fluids or tissues, reflecting the levels of
hundreds or thousands of metabolites (MW < 1,000 Da). By using chemometric tools to analyse these complex metabolic datasets, latent information of prognostic and diagnostic use can be extracted, and panels of biomarkers can be associated with specific pathologies. Metabonomics has the advantage over proteomics of capturing integrated cellular function in living systems, since it examines the dynamic metabolic status of the whole organism. By combining information from proteomics and metabonomics, the biological endpoint (the metabolites) could help to validate and confirm hypotheses built on proteomic information [48]. This approach has been used to study a human prostate cancer xenograft model in mice; multiple correlations between metabolites and proteins were found, which can generate hypotheses regarding biological relationships or pathway activity that can be further tested [48]. Although this work is currently experimental, it is likely to be the use of parallel “omics” platforms that will give us the specificity and reliability needed for disease diagnosis, patient stratification and monitoring of drug efficacy in both benign and malignant urological disease. Some current developments are relevant to clinical practice, such as the practicalities of specimen collection and storage, as these can influence the metabolic profile [49].
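In practice, the “chemometric tools” referred to above are often unsupervised projection methods such as principal component analysis (PCA), applied to each spectrum treated as a long vector of binned intensities. The toy example below uses simulated spectra, not real NMR data, purely to show the shape of the workflow: assemble a samples-by-bins matrix and look for group separation in the leading components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_bins = 500                              # spectral bins per sample (toy scale)

# Simulate 40 "control" and 40 "disease" spectra: a shared baseline plus
# noise, with one peak region altered in the disease group.
baseline = np.abs(rng.normal(1.0, 0.2, n_bins))
controls = baseline + rng.normal(0.0, 0.1, (40, n_bins))
disease = baseline + rng.normal(0.0, 0.1, (40, n_bins))
disease[:, 200:220] += 0.5                # the "discriminating metabolite"

X = np.vstack([controls, disease])        # samples x spectral bins
labels = np.array([0] * 40 + [1] * 40)

scores = PCA(n_components=2).fit_transform(X)   # mean-centring is built in

# The group difference dominates the variance, so the two classes
# separate along the first principal component.
for group, name in ((0, "control"), (1, "disease")):
    print(name, "mean PC1 score:", round(scores[labels == group, 0].mean(), 2))
```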
65.6.3 Tissue Engineering

Some of the optimism surrounding tissue engineering and the ability to “grow” artificial organs for transplant and reconstructive surgery has not been fulfilled, for a number of reasons including the difficulties of using embryonic stem cells (ESCs). Limitations of current tissue engineering research remain: “the determination of the optimum scaffold that can be seeded by cells, the best source of stem cells, and the optimal way to differentiate stem cells in urological reconstruction and regeneration” [50]. ESCs are ideal for tissue engineering due to their pluripotency; however, moral and ethical concerns have arisen as a result of their origin. A number of multipotent stem cells derived from adult human tissues have been identified and appear to have much wider differentiation potential than was previously thought. Examples include human fat-derived mesenchymal stem cells (MSCs) and MSCs derived from rat bone marrow, which have been
used to generate differentiated smooth muscle cells for urinary tract organs such as the bladder, ureter and urethra [51]. Unipotent progenitor cells, such as those derived from rodent, porcine or human skeletal muscle, have been used to restore urethral sphincter muscle and improve continence. Renal progenitor cells have also been used to generate more complex tissues, such as renal structures, towards the development of bioartificial renal devices. Pluripotent ESCs have been used to aid the recovery of spermatogenesis in mouse models, although none has produced a viable embryo from ESC-derived gametes.

The role of tissue engineering in future tissue and organ replacement surgery is an exciting prospect, and research to date has produced some encouraging results. This research, however, has been carried out only in animal models, with the exception of a myoblast-based therapy for incontinence, which has reported very good short-term results in a clinical trial setting [51]. Future tissue engineering research needs to continue within a multidisciplinary approach, which is integral to its success (Fig. 65.1). It will also need to be performed within a regulatory framework that is compliant with all legal regulations and ethical guidelines [50].
[Fig. 65.1 schematic: CELLS (somatic, adult stem, embryonic stem, engineered), SCAFFOLD (membranes, meshes, foams, gels) and SIGNALS (cytokines, growth factors, extracellular matrix, cell-to-cell interactions) combine to form a biohybrid PROSTHESIS]
Fig. 65.1 The tissue engineering triad. Tissue engineering is a multidisciplinary approach that requires material sciences to provide scaffolds and life sciences to provide living cells. To generate functional biohybrid prostheses from these substantial components, specific biologic signals provide desired phenotypes and behaviour of the cells. Increasing attention has been directed to different stem cell types. Figure reprinted from Becker and Jakse [51]. Copyright (2007), with permission from Elsevier
65.7 Diagnostic Imaging

65.7.1 PET & PET-CT

Positron emission tomography (PET) exploits compounds labelled with positron-emitting radioisotopes to detect pathologic processes, most commonly using the glucose analogue fluorine-18 (18F) fluorodeoxyglucose (FDG). PET is based on the premise that it images the metabolic activity of tumour tissue and can therefore provide more sensitive and more specific information about the extent of disease than morphologic/anatomical imaging alone. It can be applied to the staging of disease as well as to monitoring tumour response to treatment. PET is limited by its lack of anatomical reference, such that the spatial resolution is generally inadequate for accurate anatomic localisation of pathology. Computerised tomography (CT) is limited by its reliance on nodal enlargement as an indicator of possible tumour involvement; the lower limit of detection is around 1 cm. This is complicated by the fact that lymph nodes can be enlarged for reasons other than malignancy, and tumour involvement may not necessarily be reflected by changes in lymph node size. Accurately correlating (co-registering) separately acquired PET and CT sequences is made difficult by differences in patient positioning and involuntary movement of internal organs, which cannot always be resolved adequately by labour-intensive non-linear mapping. The development of PET-CT technology has enabled the acquisition of spatially registered PET and CT data during a single imaging session.

FDG-PET has several limitations when used in the investigation of urological malignancies. Its renal excretion means that a low “target-to-background” ratio for both renal and bladder cancer inhibits its usefulness for primary tumour characterisation. This does not, however, hinder its role in the detection of distant metastases. In renal cell carcinoma, FDG-PET has been shown to have moderate sensitivity and high specificity (63.6 and 100%, respectively) for the detection of distant metastases, although the mean size of the distant metastases correctly identified was relatively large at 2.2 cm [52]. Early results for ImmunoPET studies using monoclonal antibodies have demonstrated highly accurate diagnostic information for renal tumour characterisation [53].
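To make the co-registration problem concrete: alignment is usually posed as a search for the transform that maximises a similarity measure between the two volumes, with mutual information a common choice because it assumes no fixed relationship between the intensity scales of different modalities. The fragment below is a deliberately minimal two-dimensional, translation-only sketch of that idea on synthetic images; clinical registration involves three-dimensional, often non-rigid transforms and dedicated toolkits.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information between two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

rng = np.random.default_rng(2)
anatomical = rng.random((64, 64))               # stand-in "CT" image
functional = nd_shift(anatomical, (3, -2))      # misaligned "PET" image
functional += rng.normal(0.0, 0.05, functional.shape)

# Exhaustive search over integer translations: the offset that maximises
# mutual information after realignment is the estimated misregistration.
candidates = [(dy, dx) for dy in range(-5, 6) for dx in range(-5, 6)]
best = max(candidates,
           key=lambda t: mutual_information(
               nd_shift(functional, (-t[0], -t[1])), anatomical))
print("estimated offset:", best)                # expected: (3, -2)
```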
There is no role for FDG-PET in the diagnosis of primary bladder tumours. Early work with alternative tracers such as 11C-methionine and 11C-choline has been promising. 11C-choline has the added benefit of good lymph node uptake, with 11C-choline uptake having been demonstrated in a lymph node as small as 5 mm. There is a possible role for 11C-choline PET in detecting residual disease after TURBT and for characterising nodal involvement in staging prior to cystectomy [54]. The importance lies in the evidence that neoadjuvant chemotherapy improves survival and cure fraction in patients with metastasis to the locoregional lymph nodes. Furthermore, FDG-PET in combination with correlative CT imaging has been shown to yield high diagnostic and prognostic accuracy, with sensitivity, specificity and accuracy of 60, 88 and 78%, respectively, for nodal/metastatic disease [55].

In prostate cancer, the use of FDG-PET is limited both by urinary excretion and by low tumour metabolic glucose activity, and it therefore has little application in diagnosis or staging. Alternative tracers such as 18F-FCH and 11C-choline, which take advantage of the rapid biosynthesis of cell membranes, have been most promising in the staging of primary and recurrent disease [54]. 11C-choline PET/CT appears to have better accuracy for lymph node staging in intermediate-risk and high-risk prostate cancer when compared with currently used nomograms. For the detection of bone metastases, 18F-fluoride PET/CT demonstrated 100% sensitivity, specificity, positive predictive value and negative predictive value in patients with high-risk prostate cancer [56].

Future advances in PET/CT imaging in urological malignancy will include the development of novel PET radiopharmaceuticals. PET-CT, however, is limited by the significant radiation dose to the patient from CT and by relatively limited soft-tissue contrast. PET-MRI has the potential to circumvent some of these limitations and is an exciting prospect at the preclinical development stage [57].
65.7.2 Magnetic Resonance Spectroscopic Imaging

As discussed, the soft-tissue contrast achieved with MRI can allow for functional MRI when combined
with PET, or alternatively allow for magnetic resonance spectroscopic imaging (MRSI). Much of the work in this area has been applied to initial diagnosis, cancer localisation and local staging, assessment of tumour aggressiveness, the provision of a road-map for surgery and radiotherapy, and early detection of local recurrence in prostate cancer [58]. The principle behind MRSI is the provision of metabolic information about prostatic tissue by displaying the relative concentrations of chemical compounds in the imaged tissue; it commonly uses the technique of chemical shift imaging with point-resolved spectroscopy (PRESS) voxel excitation [59]. This is an advantage over standard MRI, in which signal intensity changes are not specific for a neoplastic process as compared to benign conditions. In MRSI, spectral patterns incorporating the metabolic peaks of choline, creatine, polyamines and citrate are interpreted to characterise prostate cancer, although there are differences when imaging the peripheral zone as compared to the transitional zone [59]. MRSI is not yet used for the primary diagnosis of prostate cancer, but may have a role in stratifying patients for further targeted biopsy in the presence of raised PSA levels and previous negative biopsies [59].

Some studies have looked at combining MRSI with dynamic contrast-enhanced MRI (DCE-MRI), which uses gadolinium chelate to predict cancers, which show early enhancement as well as early washout of the signal. As a sole modality, DCE-MRI has been shown to have reasonable sensitivity and specificity for defining prostate cancer and has advantages over standard MRI [59]. When DCE-MRI is combined with MRSI, accuracy for detecting local prostate cancer recurrence in patients with biochemical progression after surgery is much improved (87% sensitivity and 94% specificity) [58]. Of most interest is the potential for MRSI to predict tumour aggressiveness to aid treatment decisions, particularly if it can overcome the limitation of under-grading by Gleason scoring in a significant percentage of prostate biopsies. Early results show promise, but some technical restrictions need to be further explored [58]. MRSI, possibly in combination with DCE-MRI, needs further evaluation in larger validated clinical trials, and will use higher-field-strength scanners that will provide improved spatial resolution, with better visualisation of anatomic detail, and improved spectral resolution for more accurate metabolic mapping [59]. The application of enhanced imaging modalities will
improve treatment selection and planning in the future, with the potential to improve patient outcomes.
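In its simplest quantitative form, the spectral interpretation described above reduces to comparing peak areas, with an elevated (choline + creatine)/citrate ratio the pattern most associated with peripheral-zone cancer. The fragment below illustrates that computation on a made-up one-dimensional spectrum; the approximate peak positions (choline ≈3.2 ppm, creatine ≈3.0 ppm, citrate ≈2.6 ppm) are standard, but real MRSI analysis involves phasing, baseline correction and peak fitting rather than naive integration.

```python
import numpy as np

# Hypothetical voxel spectrum: a ppm axis with Gaussian peaks standing in
# for choline (~3.2 ppm), creatine (~3.0 ppm) and citrate (~2.6 ppm).
ppm = np.linspace(4.0, 2.0, 1000)       # descending, as conventionally plotted

def peak(centre, amplitude, width=0.03):
    return amplitude * np.exp(-((ppm - centre) ** 2) / (2.0 * width ** 2))

spectrum = peak(3.2, 1.2) + peak(3.0, 0.8) + peak(2.6, 1.0)

def area(lo, hi):
    """Naive integral of the spectrum over a ppm window."""
    mask = (ppm >= lo) & (ppm <= hi)
    return np.trapz(spectrum[mask], -ppm[mask])  # negate so the axis ascends

cc_over_citrate = (area(3.1, 3.3) + area(2.9, 3.1)) / area(2.5, 2.7)
print(f"(choline + creatine)/citrate = {cc_over_citrate:.2f}")
```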
65.7.3 USPIO

Ultra-small super-paramagnetic iron oxide (USPIO) particles, such as ferumoxtran-10, are nanoparticles that can be used as contrast agents in combination with high-resolution MRI to increase the sensitivity and specificity of identifying malignant nodal involvement in patients with neoplasms, including genitourinary malignancies. They have the added advantage over unenhanced MRI and CT of depicting metastases even in normal-sized lymph nodes, and therefore assist in the optimal management of cancer patients by informing decisions regarding surgery, neoadjuvant/adjuvant chemotherapy and radiotherapy. Relevant to urology, the technique has been applied in cancer of the prostate, bladder, kidney, penis and testicle [60, 61].

Ferumoxtran-10, a first-generation USPIO nanoparticle, is administered intravenously; its most common side effect is infrequent mild lower back pain [62]. The MRI scan is performed 24 h after contrast administration. The nanoparticles consist of an iron-oxide crystalline core covered with a coating of low-molecular-weight dextran, causing a signal intensity change as a result of their magnetic properties. Ferumoxtran-10 acts as a negative contrast agent: in healthy lymph nodes it is degraded by phagocytosing macrophages (no signal intensity change – dark node), whereas in lymph nodes containing tumour the macrophages are replaced by cancer cells, leaving ferumoxtran-10 non-degraded, which results in an increase in signal intensity.

A meta-analysis of prospective studies comparing MRI with and without ferumoxtran-10 across a number of tumour types, with histological diagnosis after surgery or biopsy, concluded that “ferumoxtran-10-enhanced MRI offers higher diagnostic precision than does unenhanced MRI, and is sensitive and specific for the detection of lymph-node metastases, especially in the abdomen and pelvis” [62]. Study sample sizes have been small to date; however, the technique appears to be safe and feasible. Although the main focus has been improved diagnostic precision for the detection of lymph-node metastases, there could also be benefits in distinguishing benignity in primary tumours, such as those of the kidney [61].
65.7.4 Ultrasound

Ultrasound is widely applied within urology as a diagnostic modality. Its application in prostate biopsy is well established, but typically uses only grey-scale images. Prostate cancer can appear as hypoechoic, echogenic or isoechoic lesions, and about 50% of prostate cancers are not visible with conventional ultrasound. As a result, systematic biopsy regimes are used to increase cancer detection rates, but they confer not insignificant complication rates. In an effort to improve the diagnostic accuracy of ultrasound for the detection of prostate cancer, several techniques have been explored, including colour Doppler [63], contrast-enhanced colour Doppler [64] and elastography [63]. Although these adjuncts improve cancer detection, and detect cancers with higher Gleason scores and more cancer than systematic biopsy [65, 66], we are not yet in a position to replace traditional systematic biopsy techniques with targeted biopsy alone [63].

Contrast-enhanced ultrasound has also been applied in the diagnosis of benign diseases such as pyelonephritis. Advances over basic contrast-enhanced ultrasound, incorporating contrast pulse-sequences, have resulted in very high sensitivity and specificity for detecting parenchymal changes in acute pyelonephritis, with the advantage over CT of no radiation exposure [67]. Frauscher et al. [68] reported on the use of contrast-enhanced ultrasound and its ability to better detect crossing vessels at the ureteropelvic junction to aid presurgical evaluation of ureteropelvic junction obstruction. The ability of contrast-enhanced ultrasound to assess tumour vascularity has been used as a means of distinguishing benign from malignant small renal masses, using a combination of grade of vascularity and peak systolic velocity [69]. Animal studies have also shown the benefit of contrast-enhanced ultrasound in monitoring the response of renal lesions to RFA [70].

Advances in flexible technologies within endourology have also allowed the advancement of endoluminal ultrasonography (ELUS), using high-frequency transducers located at the tip of flexible catheters. More recently, this technology has also incorporated three-dimensional reconstruction to highlight the spatial relations of anatomic structures [71]. Current urological applications of ELUS include evaluation of the striated urethral sphincter in incontinent females and guidance of subsequent collagen injections, staging the depth of tumour invasion in the bladder and upper
tracts, locating crossing vessels to guide endopyelotomy, diagnosing and localising urethral diverticula, diagnosing submucosal migration of ureteral calculi, and providing a safe imaging modality in pregnant females needing ureteral stent placement [71]. Future technological advances will further miniaturise flexible catheters and transducers and expand the clinical applications of this imaging modality.

The introduction of transducers into deflectable instruments has also allowed the use of intraoperative real-time imaging during laparoscopic procedures. Direct contact of the transducer with the organ of interest allows higher frequencies to be used, and therefore images of greater resolution to be obtained [72]. It has been most frequently applied during renal procedures to localise tumours for either ablation, using modalities such as cryotherapy, or resection by partial nephrectomy. More recently, the use of endorectal real-time ultrasound during laparoscopic radical prostatectomy has shown potential to decrease positive surgical margin rates and to facilitate certain specific technical aspects of the procedure, such as neurovascular bundle release [73, 74]. Intraoperative ultrasound has yet to be integrated into robot-assisted procedures, but this is a potential future application and may help to overcome the lack of haptic feedback associated with this operative approach.
65.8 Screening and Chemoprevention in Prostate Cancer

It would be hard to write a chapter on urological research without mentioning prostate cancer screening. This continues to provoke debate and controversy, and there are two important ongoing prostate cancer screening studies. The European Randomized Study of Screening for Prostate Cancer (ERSPC) continues to report intermittently, and constitutes a remarkably altruistic and forward-looking piece of research on the part of its instigators. Between 1993 and 1996, 60,000 men were randomised across six European centres between 4-yearly screening and a control group. The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial in the United States is a similar type of trial, which randomised approximately 77,000 men between 1993 and 2001 and continues to report. There was a time when one might have speculated that by the time these long-term studies reported,
their conclusions would be obsolete. However, such is the ongoing debate about this issue that they are more important than ever. The ProtecT study [75] takes a different approach to the screening argument in that it looks at the entire patient pathway, from screening to treatment and beyond. It is interesting in that it tries to assimilate the quality of life issues associated with the steps of being screened, diagnosed and treated. Hopefully, it will help us to understand whether the old public health argument that “screening can seriously damage your health” is correct, or whether early diagnosis and treatment is indeed welcomed by our male population.

Another important public health matter is whether or not we should be trying to prevent prostate cancer. Broadly speaking, the areas of interest have centred on dietary supplements and the use of 5α-reductase inhibitors. The SELECT study is an ongoing phase III trial of selenium and vitamin E in preventing prostate cancer, which will randomise approximately 32,000 men between selenium alone, vitamin E alone, selenium and vitamin E, and placebo [76]. It was initiated on the basis of large-scale randomised prospective trials for other malignancies, which suggested that vitamin E and selenium reduced the incidence of and mortality from prostate cancer.

The Prostate Cancer Prevention Trial (PCPT) was a randomised controlled study of finasteride vs. placebo in preventing prostate cancer [77]. Remarkably, and quite exceptionally, all patients at the end of the study underwent biopsy regardless of PSA. It is probably one of the most controversial trials in urology in that it seemed to show that:

• Finasteride did seem to reduce the incidence of prostate cancer as compared to the control group
• Men on finasteride who did have prostate cancer had a relatively higher proportion of higher-grade cancers
• 15% of men with PSAs <4 ng/mL had prostate cancer despite being within the “normal range” [78]

This study continues to provoke a huge amount of comment and debate in the literature and, despite being properly and carefully set up in the first instance, many point to methodological flaws and aspects which they feel annul the conclusion that finasteride protects against prostate cancer. However, it will be interesting to compare the results of the dutasteride (REDUCE) prevention study which, although methodologically different, evaluates whether another common 5α-reductase
inhibitor protects against this disease [79]. If it reaches the same conclusion as the PCPT, logic would dictate that these drugs may well be protective.
References

1. Novick AC (2004) Laparoscopic and partial nephrectomy. Clin Cancer Res 10:6322S–6327S
2. Honeck P, Wendt-Nordahl G, Bolenz C et al (2008) Hemostatic properties of four devices for partial nephrectomy: a comparative ex vivo study. J Endourol 22:1071–1076
3. Breda A, Stepanian SV, Lam JS et al (2007) Use of haemostatic agents and glues during laparoscopic partial nephrectomy: a multi-institutional survey from the United States and Europe of 1347 cases. Eur Urol 52:798–803
4. Deane LA, Lee HJ, Box GN et al (2008) Robotic versus standard laparoscopic partial/wedge nephrectomy: a comparison of intraoperative and perioperative results from a single institution. J Endourol 22:947–952
5. Rogers CG, Singh A, Blatt AM et al (2008) Robotic partial nephrectomy for complex renal tumors: surgical technique. Eur Urol 53:514–521
6. Finley DS, Beck S, Box G et al (2008) Percutaneous and laparoscopic cryoablation of small renal masses. J Urol 180:492–498; discussion 498
7. Weight CJ, Kaouk JH, Hegarty NJ et al (2008) Correlation of radiographic imaging and histopathology following cryoablation and radio frequency ablation for renal tumors. J Urol 179:1277–1281; discussion 1281–1283
8. Zagoria RJ, Traver MA, Werle DM et al (2007) Oncologic efficacy of CT-guided percutaneous radiofrequency ablation of renal cell carcinomas. AJR Am J Roentgenol 189:429–436
9. Wu F, Wang ZB, Chen WZ et al (2003) Preliminary experience using high intensity focused ultrasound for the treatment of patients with advanced stage renal malignancy. J Urol 170:2237–2240
10. Phillips CK, Taneja SS (2004) The role of lymphadenectomy in the surgical management of renal cell carcinoma. Urol Oncol 22:214–223; discussion 223–224
11. O'Brien MF, Russo P, Motzer RJ (2008) Sunitinib therapy in renal cell carcinoma. BJU Int 101:1339–1342
12. National Institute for Health and Clinical Excellence (2005) Guidance on cancer services, improving outcomes in urological cancers, the manual. Available at http://www.nice.org.uk/nicemedia/pdf/Urological_Manual.pdf
13. Wilt TJ, MacDonald R, Rutks I et al (2008) Systematic review: comparative effectiveness and harms of treatments for clinically localized prostate cancer. Ann Intern Med 148:435–448
14. Rassweiler J, Hruza M, Teber D et al (2006) Laparoscopic and robotic assisted radical prostatectomy – critical analysis of the results. Eur Urol 49:612–624
15. Miller J, Smith A, Kouba E et al (2007) Prospective evaluation of short-term impact and recovery of health related quality of life in men undergoing robotic assisted laparoscopic radical prostatectomy versus open radical prostatectomy. J Urol 178:854–858; discussion 859
16. Ficarra V, Cavalleri S, Novara G et al (2007) Evidence from robot-assisted laparoscopic radical prostatectomy: a systematic review. Eur Urol 51:45–55; discussion 56
17. Berryhill R Jr, Jhaveri J, Yadav R et al (2008) Robotic prostatectomy: a review of outcomes compared with laparoscopic and open approaches. Urology 72:15–23
18. Blute ML (2008) Radical prostatectomy by open or laparoscopic/robotic techniques: an issue of surgical device or surgical expertise? J Clin Oncol 26:2248–2249
19. Tooher R, Swindle P, Woo H et al (2006) Laparoscopic radical prostatectomy for localized prostate cancer: a systematic review of comparative studies. J Urol 175:2011–2017
20. Touijer K, Guillonneau B (2006) Laparoscopic radical prostatectomy: a critical analysis of surgical quality. Eur Urol 49:625–632
21. Touijer K, Eastham JA, Secin FP et al (2008) Comprehensive prospective comparative analysis of outcomes between open and laparoscopic radical prostatectomy conducted in 2003 to 2005. J Urol 179:1811–1817; discussion 1817
22. Eggener SE, Guillonneau B (2008) Laparoscopic radical prostatectomy: ten years later, time for evidence-based foundation. Eur Urol 54:4–7
23. Menon M, Tewari A, Baize B et al (2002) Prospective comparison of radical retropubic prostatectomy and robot-assisted anatomic prostatectomy: the Vattikuti Urology Institute experience. Urology 60:864–868
24. Ahlering TE, Woo D, Eichel L et al (2004) Robot-assisted versus open radical prostatectomy: a comparison of one surgeon's outcomes. Urology 63:819–822
25. Menon M, Shrivastava A, Tewari A (2005) Laparoscopic radical prostatectomy: conventional and robotic. Urology 66:101–104
26. Smith JA Jr, Chan RC, Chang SS et al (2007) A comparison of the incidence and location of positive surgical margins in robotic assisted laparoscopic radical prostatectomy and open retropubic radical prostatectomy. J Urol 178:2385–2389; discussion 2389–2390
27. Guillonneau BD (2005) Laparoscopic versus robotic radical prostatectomy. Nat Clin Pract Urol 2:60–61
28. Descazeaud A, Peyromaure M, Zerbib M (2007) Will robotic surgery become the gold standard for radical prostatectomy? Eur Urol 51:9–11
29. National Institute for Health and Clinical Excellence (2008) Prostate cancer: full guidance. Available at http://www.nice.org.uk/nicemedia/pdf/CG58FullGuideline.pdf
30. Bott SRJ, Hindley RG, Abdul-Rahman A et al (2008) Are the characteristics of prostate cancer amenable to focal therapy? BJU Int 101:1–16
31. Braeckman J, Autier P, Garbar C et al (2008) Computer-aided ultrasonography (HistoScanning): a novel technology for locating and characterizing prostate cancer. BJU Int 101:293–298
32. Ohigashi T, Kozakai N, Mizuno R et al (2006) Endocytoscopy: novel endoscopic imaging technology for in-situ observation of bladder cancer cells. J Endourol 20:698–701
33. Bryan RT, Billingham LJ, Wallace DM (2008) Narrow-band imaging flexible cystoscopy in the detection of recurrent urothelial cancer of the bladder. BJU Int 101:702–705; discussion 705–706
34. Fradet Y, Grossman HB, Gomella L et al (2007) A comparison of hexaminolevulinate fluorescence cystoscopy and white light cystoscopy for the detection of carcinoma in situ in patients with bladder cancer: a phase III, multicenter study. J Urol 178:68–73; discussion 73
35. Haber GP, Crouzet S, Gill IS (2008) Laparoscopic and robotic assisted radical cystectomy for bladder cancer: a critical analysis. Eur Urol 54:54–62
36. Huang J, Lin T, Xu K et al (2008) Laparoscopic radical cystectomy with orthotopic ileal neobladder: a report of 85 cases. J Endourol 22:939–946
37. Murphy DG, Challacombe BJ, Elhage O et al (2008) Robotic-assisted laparoscopic radical cystectomy with extracorporeal urinary diversion: initial experience. Eur Urol 54:570–580
38. Institute of Cancer Research (2008) Selective bladder preservation against radical excision (cystectomy) – SPARE. Available at http://www.icr.ac.uk/research/research_sections/clinical_trials/clinical_trials_list/7561.shtml
39. Smith Y, Hadway P, Biedrzycki O et al (2007) Reconstructive surgery for invasive squamous carcinoma of the glans penis. Eur Urol 52:1179–1185
40. Hadway P, Smith Y, Corbishley C et al (2007) Evaluation of dynamic lymphoscintigraphy and sentinel lymph-node biopsy for detecting occult metastases in patients with penile squamous cell carcinoma. BJU Int 100:561–565
41. Schiffer E (2007) Biomarkers for prostate cancer. World J Urol 25:557–562
42. Vrooman OP, Witjes JA (2008) Urinary markers in bladder cancer. Eur Urol 53:909–916
43. Shariat SF, Marberger MJ, Lotan Y et al (2006) Variability in the performance of nuclear matrix protein 22 for the detection of bladder cancer. J Urol 176:919–926; discussion 926
44. Lotan Y, Shariat SF (2008) Impact of risk factors on the performance of the nuclear matrix protein 22 point-of-care test for bladder cancer detection. BJU Int 101:1362–1367
45. Margulis V, Lotan Y, Shariat SF (2008) Survivin: a promising biomarker for detection and prognosis of bladder cancer. World J Urol 26:59–65
46. Nicholson JK, Lindon JC, Holmes E (1999) 'Metabonomics': understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29:1181–1189
47. Nicholson JK, Connelly J, Lindon JC et al (2002) Metabonomics: a platform for studying drug toxicity and gene function. Nat Rev Drug Discov 1:153–161
48. Rantalainen M, Cloarec O, Beckonert O et al (2006) Statistically integrated metabonomic-proteomic studies on a human prostate cancer xenograft model in mice. J Proteome Res 5:2642–2655
49. Barton RH, Nicholson JK, Elliott P et al (2008) High-throughput 1H NMR-based metabolic analysis of human serum and urine for large-scale epidemiological studies: validation study. Int J Epidemiol 37(Suppl 1):i31–i40
50. Sievert KD, Amend B, Stenzl A (2007) Tissue engineering for the lower urinary tract: a review of a state of the art approach. Eur Urol 52:1580–1589
51. Becker C, Jakse G (2007) Stem cells for regeneration of urological structures. Eur Urol 51:1217–1228
52. Powles T, Murray I, Brock C et al (2007) Molecular positron emission tomography and PET/CT imaging in urological malignancies. Eur Urol 51:1511–1520; discussion 1520–1521
53. Divgi CR, Pandit-Taskar N, Jungbluth AA et al (2007) Preoperative characterisation of clear-cell renal carcinoma using iodine-124-labelled antibody chimeric G250 (124I-cG250) and PET in patients with renal masses: a phase I trial. Lancet Oncol 8:304–310
54. Bouchelouche K, Oehr P (2008) Positron emission tomography and positron emission tomography/computerized tomography of urological malignancies: an update review. J Urol 179:34–45
55. Drieskens O, Oyen R, Van Poppel H et al (2005) FDG-PET for preoperative staging of bladder cancer. Eur J Nucl Med Mol Imaging 32:1412–1417
56. Even-Sapir E, Metser U, Mishani E et al (2006) The detection of bone metastases in patients with high-risk prostate cancer: 99mTc-MDP planar bone scintigraphy, single- and multi-field-of-view SPECT, 18F-fluoride PET, and 18F-fluoride PET/CT. J Nucl Med 47:287–297
57. Pichler BJ, Judenhofer MS, Wehrl HF (2008) PET/MRI hybrid imaging: devices and initial results. Eur Radiol 18:1077–1086
58. Sciarra A, Salciccia S, Panebianco V (2008) Proton spectroscopic and dynamic contrast-enhanced magnetic resonance: a modern approach in prostate cancer imaging. Eur Urol 54:485–488
59. Fuchsjager M, Shukla-Dave A, Akin O et al (2008) Prostate cancer imaging. Acta Radiol 49:107–120
60. Deserno WM, Harisinghani MG, Taupitz M et al (2004) Urinary bladder cancer: preoperative nodal staging with ferumoxtran-10-enhanced MR imaging. Radiology 233:449–456
61. Guimaraes AR, Tabatabei S, Dahl D et al (2008) Pilot study evaluating use of lymphotrophic nanoparticle-enhanced magnetic resonance imaging for assessing lymph nodes in renal cell cancer. Urology 71:708–712
62. Will O, Purkayastha S, Chan C et al (2006) Diagnostic precision of nanoparticle-enhanced MRI for lymph-node metastases: a meta-analysis. Lancet Oncol 7:52–60
63. Nelson ED, Slotoroff CB, Gomella LG et al (2007) Targeted biopsy of the prostate: the impact of color Doppler imaging and elastography on prostate cancer detection and Gleason score. Urology 70:1136–1140
64. Frauscher F, Klauser A, Volgger H et al (2002) Comparison of contrast enhanced color Doppler targeted biopsy with conventional systematic biopsy: impact on prostate cancer detection. J Urol 167:1648–1652
65. Ismail M, Petersen RO, Alexander AA et al (1997) Color Doppler imaging in predicting the biologic behavior of prostate cancer: correlation with disease-free survival. Urology 50:906–912
66. Mitterberger M, Pinggera GM, Horninger W et al (2007) Comparison of contrast enhanced color Doppler targeted biopsy to conventional systematic biopsy: impact on Gleason score. J Urol 178:464–468; discussion 468
67. Mitterberger M, Pinggera GM, Colleselli D et al (2008) Acute pyelonephritis: comparison of diagnosis with computed tomography and contrast-enhanced ultrasonography. BJU Int 101:341–344
68. Frauscher F, Janetschek G, Helweg G et al (1999) Crossing vessels at the ureteropelvic junction: detection with contrast-enhanced color Doppler imaging. Radiology 210:727–731
69. Pallwein L, Mitterberger M, Aigner F et al (2007) Small renal masses: the value of contrast-enhanced colour Doppler imaging. BJU Int 99:579–585
70. Johnson DB, Duchene DA, Taylor GD et al (2005) Contrast-enhanced ultrasound evaluation of radiofrequency ablation of the kidney: reliable imaging of the thermolesion. J Endourol 19:248–252
71. Kondabolu S, Khan SA, Whyard J et al (2004) The role of endoluminal ultrasonography in urology: current perspectives. Int Braz J Urol 30:96–101
72. Nascimento RG, Coleman J, Solomon SB (2008) Current and future imaging for urologic interventions. Curr Opin Urol 18:116–121
73. Ukimura O, Gill IS (2006) Real-time transrectal ultrasound guidance during nerve sparing laparoscopic radical prostatectomy: pictorial essay. J Urol 175:1311–1319
74. Ukimura O, Magi-Galluzzi C, Gill IS (2006) Real-time transrectal ultrasound guidance during laparoscopic radical prostatectomy: impact on surgical margins. J Urol 175:1304–1310
75. Rosario DJ, Lane JA, Metcalfe C et al (2008) Contribution of a single repeat PSA test to prostate cancer risk assessment: experience from the ProtecT study. Eur Urol 53:777–784
76. Klein EA, Thompson IM, Lippman SM et al (2000) SELECT: the Selenium and Vitamin E Cancer Prevention Trial: rationale and design. Prostate Cancer Prostatic Dis 3:145–151
77. Thompson IM, Goodman PJ, Tangen CM et al (2003) The influence of finasteride on the development of prostate cancer. N Engl J Med 349:215–224
78. Thompson IM, Pauler DK, Goodman PJ et al (2004) Prevalence of prostate cancer among men with a prostate-specific antigen level < or =4.0 ng per milliliter. N Engl J Med 350:2239–2246
79. Andriole G, Bostwick D, Brawley O et al (2004) Chemoprevention of prostate cancer in men at high risk: rationale and design of the reduction by dutasteride of prostate cancer events (REDUCE) trial. J Urol 172:1314–1317
66 Cardiothoracic Surgery: Current Trends and Recent Innovations

Joanna Chikwe, Thanos Athanasiou, and Adanna Akujuo

Contents
66.1 Innovation in Cardiac Surgery
66.2 New Surgical Techniques
66.2.1 Surgical Treatment of Atrial Fibrillation
66.2.2 Off-Pump and Minimally Invasive Cardiac Surgery
66.2.3 Ventricular Assist Devices
66.3 Molecular and Biological Developments Within the Speciality
66.3.1 Cellular and Tissue Engineering
66.3.2 Donor Cells
66.3.3 Methods of Cell Delivery
66.3.4 Acute Myocardial Infarction
66.3.5 Ischemic Cardiomyopathy
66.3.6 Heart Failure
66.3.7 Tissue Engineering
66.4 Diagnostics and Imaging in Cardiac Surgery
66.4.1 Three-Dimensional Echocardiography
66.4.2 Cardiac Magnetic Resonance Imaging
66.4.3 Multi-Slice Computerized Tomography
66.4.4 Single-Photon Emission Computed Tomography
66.5 Future Developments
66.5.1 Percutaneous Valve Technology
66.5.2 Mitral Valve Repair
66.5.3 Aortic Valve Replacement
66.5.4 Pulmonary Valve Replacement
66.5.5 Robotic Cardiac Surgery
66.6 Cardiovascular Surgery Clinical Research Network
References
J. Chikwe, Department of Cardiothoracic Surgery, Mount Sinai Medical Centre, 1190 Fifth Avenue, New York, NY 10029, USA; e-mail: [email protected]
Abstract The extremely technically demanding nature of cardiothoracic surgery, where even small deviations from highly standardized procedures may result in mortality or major morbidity, mandates a highly conservative approach by surgeons today. This is in stark contrast to the flexible and creative mindset that dominated the early phase of cardiothoracic surgery, and that is now required to identify and benefit from future innovation. Stem cell and molecular science, and nano- and robotic technology, which are areas unfamiliar to most cardiac surgeons, may be the source of important innovations, and cardiothoracic surgery must maintain an active and informed interest in these areas if the speciality is to become a successful early adopter of future developments. The last decade has seen major developments in several established imaging modalities, significantly increasing their impact on the assessment of patients undergoing cardiac surgery. These modalities include real-time three-dimensional (3D) echocardiography, cardiac magnetic resonance imaging (CMR), multi-slice computed tomography (CT) and single-photon emission computed tomography (SPECT).
66.1 Innovation in Cardiac Surgery

Cardiac surgery is a prime example of the dual effect of innovation, which may have a positive impact on growth, but also a potentially negative impact through increased vulnerability to external competition from new technology. There are strong parallels between cardiothoracic surgery and models of industry development, where this dual role of innovation was originally described. The Abernathy–Utterback model of business evolution has been compared with the development of
cardiothoracic surgery in a thoughtful review article that is summarized here [1]. Development takes place in three phases [2]. The first "fluid" phase of development occurs during the early years of an industry and is dominated by experimentation and technological innovation. Cardiothoracic surgery emerged as a speciality in the 1950s, a period that saw pioneering efforts on many fronts, including cardiopulmonary bypass, cross-circulation techniques and new operative procedures and prostheses. The emphasis during this first phase tends to be on product innovation, rather than process innovation. As in new industries, many early innovations, such as cross-circulation and the Vineberg procedure, were abandoned, with a few successful technologies forming the basis of future development. The second phase, known as the "transitional phase," is characterized by a shift from an emphasis on product innovation to an emphasis on innovation in the way products are made or the service provided. Variety is superseded by standardized products and services that have been proved in trials or in the marketplace, or conform to regulatory constraints. In cardiothoracic surgery, cardiopulmonary bypass with diastolic arrest achieved by cold cardioplegia to facilitate bypass grafting using a variety of venous and arterial conduits became the main approach to revascularisation, dominating alternative techniques including trans-myocardial revascularization, as well as off-pump surgery. Innovation focused on finessing aspects of this operation (such as choice of conduit and myocardial protection), as well as perioperative management, rather than designing new operations. The final phase, which Utterback called the "specific phase," occurs when the rate of innovation in both product and process dwindles, occurring as small, incremental changes. In industry, this is characterized by ever-increasing focus on cost, volume and capacity. Efficiency and cost-effectiveness become the basis of success. Cardiothoracic surgery could be argued to be in this final phase of development. The way in which established businesses, in the third "specific" phase of development, respond to major product innovation offers additional insight into the current state of cardiothoracic surgery. Christensen describes two kinds of innovation [3]. The first type he terms sustaining: it is usually highly compatible with the existing dominant product, allowing easy uptake and providing early adopters with a competitive advantage. Off-pump coronary artery bypass (OPCAB) grafting
would be an example of this in cardiac surgery. The second type he terms disruptive: rather than being an incremental improvement on the dominant product, these technologies are outside the mainstream. Mature businesses fail to capitalize on the advantages offered by the new technologies, as not only may the principles be incompatible with existing products, but the new arrivals frequently under-perform the dominant technology by conventional standards. Customers, however, may identify features outside conventional standards that they prefer in the new products, potentially leading to dramatic changes in market dominance. This is accelerated by the facts that disruptive technologies tend to evolve faster than established products, eventually exceeding the performance of established products, and that "they allow a larger number of less skilled people to do things that historically only an expert could do." Coronary angioplasty is an example of a disruptive technology. What lessons can be learned from this parallel between business development and cardiothoracic surgery? First, there is the axiom that well-managed market leaders fail because the very management practices that underpinned their success make it difficult for them to adopt disruptive technologies: in other words, the failure to adopt and profit from disruptive technology stems from organizational barriers, not from a lack of understanding of, or access to, the new technology. It is in order to address these barriers that it has been suggested that standard cardiothoracic surgical training should become cardiovascular training encompassing a substantial endovascular component, as the divide between surgical and percutaneous solutions to cardiovascular problems becomes increasingly artificial, allowing surgeons to incorporate new technologies into established operations. While this may offer a temporizing solution at an individual level, Christensen observes that when market leaders try to shoehorn disruptive technologies into existing applications (film and digital technologies being a commonly cited example), the commonest outcome is an almost inevitable failure of the previously dominant industry. Parallels with business suggest that there are no easy solutions for cardiothoracic surgeons. The extremely technically demanding nature of cardiothoracic surgery, where even small deviations from highly standardized procedures may result in mortality or major morbidity, mandates a highly conservative approach by surgeons today. This is in stark contrast to the flexible and
creative mindset that dominated the fluid early phase of cardiothoracic surgery, and that is now required to identify and benefit from future innovation. Stem cell and molecular science, and nano- and robotic technology, which are areas unfamiliar to most cardiac surgeons, may be the source of important innovations, and cardiothoracic surgery must maintain an active and informed interest in these areas if the speciality is to become a successful early adopter of future developments. There is also a strong argument for sustaining innovation to reduce the risk and cost, and improve the outcomes, associated with cardiac surgery, so that existing procedures can benefit much wider patient populations: currently, only a fraction of the patients who fall within established indications for cardiac surgical procedures are referred for surgery, because of geographical and economic constraints as well as clinical ones. This chapter explores these themes.
66.2 New Surgical Techniques

66.2.1 Surgical Treatment of Atrial Fibrillation

Atrial fibrillation (AF) is an example of a pathology not traditionally treated by surgical modalities, where an improved understanding of the underlying pathophysiological processes, together with innovation and refinement of surgical techniques designed to cure AF, is enabling surgeons to offer low-risk curative techniques to an expanding patient population. The following section is based on a recent review article on the topic [4].
66.2.1.1 Incidence

Atrial fibrillation affects over two million people in the United States alone [5]. Its prevalence increases with age: the prevalence of AF in the population between the ages of 50 and 60 is 0.5%, compared with 8% over the age of 80 [6]. As a result of the ageing population, it is estimated that over five million people in the United States will be affected by AF by the year 2050 [7]. Atrial fibrillation is commonly associated with coronary artery disease, valvular heart disease and congestive heart failure
[8], and affects over a third of patients in the immediate period after coronary artery bypass surgery and over half of patients undergoing valvular heart surgery [9].
66.2.1.2 Pathophysiology

Atrial fibrillation is a rapid and disorganized atrial activation at a rate of 240–320 beats per minute, characterized by ineffective atrial contraction and lack of co-ordination with the irregular ventricular response. AF has recently been re-classified as paroxysmal, persistent or permanent [10]. Paroxysmal AF is characterized by episodes that terminate spontaneously within 7 days. Persistent AF lasts longer than 7 days, or requires electrical, pharmaceutical or surgical cardioversion to restore sinus rhythm. Permanent AF is used to describe AF refractory to cardioversion. It is likely that different mechanisms drive paroxysmal and non-paroxysmal AF. Despite intensive research efforts, the mechanisms underlying AF are not fully understood. Early theories were dominated by the concept of multiple chaotic reentry circuits, following varying lines of conduction block, which perpetuated AF [11]. Moe's multiple wavelet hypothesis was a development of this idea, based on the finding that a wavefront propagating along the atria may be divided or redirected by changes in the conduction properties of the atria or anatomical structures, resulting in the formation of multiple wavefronts [12]. An advance in the understanding of AF that has become central to surgical treatment strategies came from experimental mapping studies, which identified local areas of ectopy acting as primary generators. Haissaguerre et al. showed that arrhythmias originate from the pulmonary veins up to 95% of the time in patients with paroxysmal AF [13]. A further important concept is summarized by the phrase "AF begets AF," which refers to the finding that it is easiest to achieve cardioversion in AF of less than 24 h duration, after which it becomes progressively more difficult [14]. As a consequence of the increased rate of myocardial contraction resulting from AF, there is an increase in intracellular calcium [15]. This eventually causes a reduction in the refractory period of myocytes, predisposing the atrium to ectopy: a situation known as electrical remodeling. From an anatomical standpoint, the atrial myocardium can be considered as consisting of areas of induction, i.e., focal areas of enhanced automaticity, most commonly around the
pulmonary veins, but also surrounding the orifice of the superior vena cava (SVC), the crista terminalis and the coronary sinus [16]; and areas of maintenance, consisting of macroreentry circuits, which may be fixed (anatomical) or functional, and which become more persistent with time. The rate of ventricular response is determined by the characteristics of conduction across the atrio-ventricular node and any accessory conduction pathways, as well as sympathetic and vagal tone. A rapid ventricular response, beat-to-beat variability in the R–R interval and loss of atrial contraction all lead to impaired diastolic filling and a reduction in cardiac output by as much as 20% [17]. This is especially true in patients where diastolic filling is already impaired, as is the case in patients with mitral stenosis and left ventricular hypertrophy. Traditionally, thrombus formation was attributed to the presence of stasis in blood flow within the atria, arising as a result of lack of effective atrial contraction. Atrial fibrillation has, however, recently been associated with increased circulating von Willebrand factor, platelet factor 4 and β-thromboglobulin, which contribute to a hypercoagulable state [18].
66.2.1.3 Clinical Sequelae

Atrial fibrillation is associated with significant excess morbidity and mortality [19]. The rate of stroke in non-anticoagulated patients with AF under the age of 60 is approximately 1.5% per year, rising to just under 10% per year in patients over the age of 80 [20]. In patients with rheumatic heart disease and AF, stroke risk is increased up to 17-fold compared with age-matched controls [21]. Electrical cardioversion is not without risk: in one study that reviewed 454 patients undergoing electrical cardioversion for AF or atrial flutter, six patients (1%) had thrombo-embolic complications [22]. Anticoagulation strategies aimed at reducing the incidence of thrombo-embolic events are problematic: the risk of intracranial hemorrhage in patients anticoagulated with warfarin to maintain an INR between 2.0 and 3.0 is about 0.5% per year [23]. Extracranial major bleeding events, particularly gastrointestinal hemorrhage, occur in 1% of patients per year, and are associated with significant excess mortality [24]. The mortality of patients with AF is up to twice that of patients with no history of AF [25].
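To put these annual rates in context, a yearly risk compounds considerably over a patient's remaining lifetime. The short sketch below is purely illustrative: it assumes a constant annual stroke risk that is independent from year to year, a simplification that the cited studies do not themselves make.

```python
# Illustrative only: compounds a constant annual stroke risk over n years,
# assuming risks are independent between years (a simplification of the cited data).
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one stroke within `years` years."""
    return 1.0 - (1.0 - annual_risk) ** years

for rate, label in [(0.015, "age < 60 (1.5% per year)"), (0.10, "age > 80 (10% per year)")]:
    print(f"{label}: 10-year risk = {cumulative_risk(rate, 10):.1%}")
# 1.5% per year compounds to roughly 14% over 10 years; 10% per year to roughly 65%.
```

Under these simplifying assumptions, even the "low" annual rate in younger patients accumulates to a substantial lifetime burden, which is why anticoagulation decisions weigh stroke risk against the bleeding risks quoted above.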
66.2.1.4 Medical Treatment

There are two main treatment strategies: rhythm control, i.e., restoration and maintenance of sinus rhythm; and rate control, i.e., maintenance of an optimum heart rate with concomitant anticoagulation. Rhythm control was believed to offer improvement in symptom control, functional capacity, quality of life, thromboembolic events and mortality. Several large randomized controlled trials (PIAF, RACE, STAF) have, however, shown little or no improvement in these outcome measures in patients in whom medical cardioversion was the treatment goal, compared with patients treated with a strategy aimed at rate control [26–28]. One such study (AFFIRM) showed a trend towards increased mortality in the cardioversion group [29], reflecting both the variable efficacy of, and the side effects associated with, drug regimens used to maintain sinus rhythm [10]. Medical cardioversion and maintenance of sinus rhythm is successful in only half of patients at 2 weeks; by 1 year, almost two-thirds of patients will have experienced a recurrence of AF [30]. Functional capacity has, however, been shown to improve with restoration of sinus rhythm, suggesting that a strategy of rhythm control could offer a specific benefit to patients in whom functional capacity is the primary concern [26, 31]. Current guidelines suggest that a strategy of rhythm control may be beneficial in younger, symptomatic patients without underlying heart disease, whereas rate control is an acceptable strategy in elderly patients with minimal symptoms resulting from AF [10].
66.2.1.5 Catheter Ablation

The excess mortality and morbidity associated with AF, combined with the disadvantages of medical management outlined above, have provided impetus to efforts aimed at developing more definitive treatments for AF. Radiofrequency catheter ablation post-dates surgical ablation, and is based on one of the principles underpinning surgical approaches, namely that creating full-thickness lesions results in barriers to electrical conduction. As a result of this atrial tissue isolation, AF cannot be sustained even when it is induced. Areas of automaticity within the pulmonary veins have been the main target for isolation by this technique, either individually or as a whole. More recent observations that potentials commonly arise from the posterior left atrial
wall, superior vena cava, vein of Marshall, crista terminalis, interatrial septum and coronary sinus have resulted in a wider repertoire of catheter ablative techniques, including linear left atrial ablation and mitral isthmus ablation. There is a lack of prospective data assessing outcomes of these techniques, but freedom from AF reported from observational studies is about 75% [32]. Radiofrequency ablation of the atrio-ventricular node results in complete heart block in over 95% of patients. In selected cases, permanent pacemaker insertion followed by AV node ablation results in rate control, decreased symptoms and improvement in quality of life. These patients, however, require life-long anticoagulation. Major complications have been reported in approximately 6% of catheter ablation procedures, and include pulmonary vein stenosis, thromboembolism and atrioesophageal fistula formation [10]. The incidence of embolic stroke ranges from 0 to 5%, and has been addressed by maintaining an activated clotting time greater than 300 s during the procedure. Atrioesophageal fistula is a rare complication that is associated with extensive posterior left atrial wall lesion sets, and carries a high mortality.
66.2.1.6 Surgical Ablation

Surgical ablation offers the possibility of entirely eliminating macro-reentrant circuits in the atria, while preserving sinus node and atrial transport functions. Surgical ablation of AF aims to prevent both induction and maintenance of AF. Induction of AF may be reduced by isolating potential areas of ectopy from the body of the atrial myocardium by creating full-thickness lesions that destroy all connecting conduction tissue. Substrate modification is aimed at reducing the ability of the atrial myocardium to maintain AF. This may be achieved in two ways. First, the amount of atrial tissue available to form reentry circuits may be reduced by isolating, ablating or resecting large areas. Second, creating full-thickness lesions connecting two anatomical structures creates barriers to conduction so that AF cannot be sustained even if it is induced. As described above, paroxysmal AF, which is predominantly caused by specific trigger areas, may be effectively treated by isolating those areas [33]. This is in contrast to persistent AF, where a more extensive procedure, which includes lesion sets that prevent maintenance of AF, is normally required [34]. The myriad lesion sets
described in the literature can usefully be thought of as falling into one of three categories: the Cox-maze III, pulmonary vein isolation, and a third broad category of "mini-maze" procedures that encompasses the left and bi-atrial lesion sets that fall somewhere between the Cox-maze III and pulmonary vein isolation in terms of complexity.
66.2.1.7 Cox-Maze III

This is the gold standard surgical procedure for AF, with maintenance of sinus rhythm reported at over 95% at 5 years by Cox's own group [35], and just under 85% in a meta-analysis of 1,500 patients in nine series [36]. The maze procedure has evolved through three iterations to the current Cox-maze III. In these maze procedures, multiple lesions are made by a "cut and sew" technique, which has the advantages that lesions are guaranteed to be full thickness, and that the atrial mass can be significantly reduced. This approach mandates cardiopulmonary bypass, adds significantly to the total ischemic time on bypass, and presents the surgeon with not insignificant technical challenges, including the potential for post-procedure bleeding from inaccessible areas at the base of the heart. As a result, the same lesion sets have been created with a variety of energy sources in attempts to simplify the Cox-maze III without reducing its efficacy [34]. Potential problems caused by formation of these lesion sets include bleeding from the base of the left atrial appendage, stenosis of the pulmonary vein orifices, injury to the circumflex coronary artery when connecting the pulmonary veins with the mitral annulus, and esophageal injury during dry radiofrequency ablation of the left atrial wall [34].
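It is worth noting how a pooled figure such as "just under 85% in 1,500 patients" translates into statistical uncertainty. The sketch below computes a Wilson 95% confidence interval for a pooled proportion; the counts used are rounded assumptions for illustration, not the actual patient-level data of the cited meta-analysis [36].

```python
import math

# Illustrative only: Wilson 95% CI for a pooled proportion. The counts are
# rounded assumptions (85% of 1,500), not the meta-analysis patient data [36].
def wilson_ci(successes: int, n: int, z: float = 1.96):
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - margin, centre + margin

lo, hi = wilson_ci(successes=1275, n=1500)
print(f"Freedom from AF: 85.0% (95% CI {lo:.1%} to {hi:.1%})")
# A series of this size pins the pooled estimate down to within about +/-2%.
```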
66.2.1.8 Pulmonary Vein Isolation

Attempts to further increase the applicability of atrial fibrillation surgery have focused on developing limited but efficacious lesion sets that may be performed more rapidly, off bypass and potentially minimally invasively [34, 37]. The findings of Haissaguerre et al. [13] focused interest on lesions isolating the pulmonary veins (Fig. 66.1). This may now be achieved with or without cardiopulmonary bypass and a left atriotomy [33]. Pulmonary vein isolation is an effective treatment for paroxysmal AF, with the prevalence of AF at 1 year
as low as 9% in a retrospective review of 152 patients undergoing pulmonary vein isolation with concomitant cardiac surgery [33], but it is ineffective as treatment of persistent or permanent AF, since it does not address the primary mechanism underlying these types of AF. Pulmonary vein isolation can also be performed using the robotic da Vinci system (Fig. 66.2).

Fig. 66.1 Lesions for atrial fibrillation. A surgeon's view of (a) left-sided lesions and (b) right-sided lesions. The white lines represent full-thickness lesions, which can be created through a "cut and sew" technique as in the original Cox-maze procedure, or using energy sources including cryoablation and radiofrequency. Image adapted from original artwork copyright of Seminars in Cardiovascular and Thoracic Anesthesia

Fig. 66.2 Robotic pulmonary vein isolation

66.2.1.9 Mini-Maze Type Procedures

Although the limitations of the Cox-maze III procedure have deterred most surgeons from routine use of the technique, there remains a need for lesion sets effective in the treatment of persistent and permanent AF. In an observational study of 575 patients undergoing surgical treatment for permanent atrial fibrillation, mini-maze procedures were performed in 265 patients, with a similar late prevalence of AF to the 242 patients who underwent the Cox-maze procedure [34]. There were no significant differences in the rate of postoperative stroke or mortality between the two groups.

66.2.1.10 Energy Sources

A variety of alternative methods of creating full-thickness lesions now exist, driven by the need to reduce the complexity, time and requirement for cardiopulmonary bypass associated with the traditional "cut and sew" approach. The ideal energy source would create rapid and consistent full-thickness lesions, without charring or damaging nearby tissue, with an applicator that could be readily applied to relatively inaccessible areas. A number of commercial options now meet most of these criteria.

66.2.1.11 Radiofrequency Ablation

Radiofrequency energy consists of an alternating current between 300 kHz and 1 MHz, which results in thermal injury [38]. Radiofrequency ablation may be unipolar or bipolar. In unipolar systems, the patient is grounded by an indifferent skin electrode and the current flows from the radiofrequency probe to the tissue in contact with it, where thermal energy is released as a result of resistance
to conduction. The disadvantages of this approach include conduction of heat, with consequent damage to surrounding tissues including the esophagus; surface charring, which may be thrombogenic; and inconsistently transmural lesions [39]. Bipolar radiofrequency, where current flows between two electrodes placed on either side of the tissue being treated, avoids these problems. Simultaneous measurement of conductance across the tissue being treated is used to assess the transmurality of lesions and guide the duration of each treatment, which is normally about 10 s. Unipolar radiofrequency ablation can therefore be reserved for areas inaccessible to the bipolar clamps, such as the lesions linking the mitral annulus to the right pulmonary veins. Radiofrequency ablation is the most commonly used alternative to the traditional "cut and sew" technique.
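The transmurality feedback described above is, in essence, a simple control loop: energy is applied while conductance across the clamped tissue is monitored, and delivery stops once the reading suggests a full-thickness lesion or a safety timeout elapses. The sketch below illustrates only this general principle; the function names, threshold and polling interval are hypothetical and do not describe any manufacturer's algorithm.

```python
import time

# Schematic sketch of conductance-guided bipolar ablation. All names,
# thresholds and timings here are hypothetical illustrations of the
# principle described in the text, not any device's actual algorithm.
TRANSMURALITY_THRESHOLD = 0.1  # hypothetical conductance level, arbitrary units
MAX_DURATION_S = 10.0          # the text notes each treatment lasts about 10 s

def deliver_lesion(read_conductance, energy_on, energy_off) -> bool:
    """Apply energy until conductance suggests a transmural lesion, or time out."""
    start = time.monotonic()
    energy_on()
    try:
        while time.monotonic() - start < MAX_DURATION_S:
            if read_conductance() <= TRANSMURALITY_THRESHOLD:
                return True   # conductance has fallen: treat lesion as transmural
            time.sleep(0.05)  # polling interval
        return False          # timeout: lesion may not be full thickness
    finally:
        energy_off()
```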
66.2.1.12 Cryoablation

Cryothermy consists of freezing tissues to temperatures of −160 °C by applying probes cooled by nitrous oxide or argon for up to 2 min at a time [40]. Transmural lesions are produced reliably, although more slowly than by radiofrequency techniques, and there is no surface charring with the technique [41]. Injury to nearby structures is also less problematic than with unipolar radiofrequency ablation, and the introduction of flexible probes that can be moulded to fit the internal contours of the atria has increased the appeal of this energy source [40]. Several groups have published results for a cryo-maze (in which the Cox-maze III lesions are performed by a combination of atriotomy incisions and a left atrial maze using cryoablation).
66.2.1.13 Microwave

Microwaves are high-frequency electromagnetic radiation that cause oscillation of water molecules when applied to tissue, converting electromagnetic energy into kinetic energy and hence heat. When applied to atrial tissue, the thermal injury creates conduction block, but without the surface charring, and with greater tissue penetration, of unipolar radiofrequency ablation. Several groups have published results for partial maze procedures using microwave ablation, with results comparable to conventional approaches.

66.2.1.14 Ultrasound

Ultrasound at 8–10 MHz was initially used in catheter-based approaches. Again, kinetic energy is converted into heat in the tissues to which the probes are applied, resulting in transmural lesions that cause conduction block. A variety of probe designs facilitate delivery of energy in circumferential lesions designed to isolate the pulmonary veins, as well as in linear lesions.

66.2.1.15 Future Directions

Electrophysiological mapping studies commonly used in the catheter laboratory may eventually offer surgeons the opportunity to provide a more tailored approach, improving efficacy and reducing the number of lesions required for a satisfactory result. Lesion sets frequently employed in catheter ablation, such as isolation of individual pulmonary veins, may have a useful role in surgical ablation. Nd:YAG and infra-red lasers have been used experimentally to produce endocardial and epicardial transmural lesions, with minimal charring and heat conduction to surrounding tissues. They have not yet been employed in a clinical setting, but these new energy sources, together with improved probe technology for existing energy sources, should increase the applicability of surgical ablation by reducing the time and exposure required to create a useful lesion set. Comprehensive audit of results within a standardized reporting system should help to identify which methods of surgical ablation offer the safest and most efficacious means of treating atrial fibrillation [42]. Combining advances in technology with an improved knowledge base may open the way for surgical ablation to be routinely offered as a stand-alone procedure to the millions of people with atrial fibrillation.
66.2.2 Off-Pump and Minimally Invasive Cardiac Surgery

Minimally invasive cardiac surgery is a broad term, currently used to mean conventional cardiac operations carried out using smaller incisions and/or alternatives to conventional cardiopulmonary bypass. A minimally invasive approach involves modifications to patient selection and standard anesthetic technique, as well as endoscopic and robotic adjuncts
to facilitate surgery. The main aim of minimally invasive surgery is decreased morbidity and mortality compared with conventional open surgery. Cardiac surgeons are primarily concerned with effective correction of surgical lesions and reduction in operative mortality and stroke, but minimally invasive approaches are rapidly becoming part of the standard cardiac surgical repertoire, thanks to patient demand and major improvements in technology and experience. This section, together with some of the following sections, draws primarily on two lengthier review articles [43, 44]; it outlines the rationale behind minimally invasive approaches, as well as techniques in current use.
66.2.2.1 Rationale for Off-Pump Surgery

Off-pump surgery is limited to closed heart procedures, the most important of which is coronary artery bypass grafting. Techniques of cardiopulmonary bypass were refined in the 1960s and 1970s, and the vast majority of surgeons elected to perform coronary artery surgery on-pump: the advantages of a still, clear operating field, good myocardial protection and near-complete haemodynamic and respiratory control appeared to outweigh the many disadvantages of cardiopulmonary bypass described below. Recent improvements in experience, as well as in methods of stabilizing the heart without impairing cardiac function, have led to increasing numbers of institutions moving their coronary artery practice off-pump, in the expectation that patients should benefit from reduced rates of operative mortality and stroke, as well as other morbidity. The pathophysiological changes associated with bypass are due to activation of the whole-body inflammatory response as a result of the passage of blood through the pump circuitry, augmented by changes in temperature, acid-base balance, haemodilution, non-pulsatile flow, drugs, circulating volume and the mechanics of bypass, which all contribute to significant dysfunction of blood constituent cascades and whole organ systems. Cardiopulmonary bypass activates five plasma protein systems (the contact system, the intrinsic and extrinsic coagulation pathways, and the complement and fibrinolytic cascades) and five cellular systems, which mediate systemic inflammation (platelet, neutrophil, monocyte, lymphocyte and endothelial cell systems). Activation of these pathways leads to thrombus
formation, and patients are therefore aggressively heparinised (they receive 300 units of heparin per kilogram, with supplementary doses during bypass titrated against clotting studies). This is reversed using protamine at the end of the operation so that haemostasis can be achieved. When blood encounters the non-endothelialised surfaces of the CPB circuit and the operative field, plasma proteins are instantly adsorbed onto the surface to produce a protein layer: heparin-coated circuits have recently been developed, and they change the reactivity of adsorbed proteins but do not reduce thrombogenicity. Bleeding and thrombotic complications associated with cardiopulmonary bypass are related to activation of platelets and plasma proteins, and to heparin. Bleeding times, after full reversal of heparinisation, do not return to normal for up to 12 h after bypass. Disseminated intravascular coagulation, heparin-induced thrombocytopaenia (HIT) and thrombosis (HITT) are uncommon but serious complications. Massive fluid shifts, largely into the interstitium, result from increases in systemic venous pressure, volume loading, reduction in plasma protein concentration as a result of dilution and adsorption onto the CPB circuit, and the inflammatory increase in capillary permeability described above. The combined stressors of surgery, hypothermia, CPB and non-pulsatile flow trigger a hormonal stress response. Levels of cortisol, adrenaline and noradrenaline rise during bypass and remain raised for at least 24 h afterwards, as does blood glucose. Circulating T3 falls below the normal range. The release of numerous vasoactive substances in addition to those described above, acting throughout organ systems, means that "cardiopulmonary bypass turns homeostasis into physiologic and biochemical chaos." Myocardial compliance and contractility fall because of myocardial stunning, ischemia and edema. Myocardial function continues to fall for 6–8 h postoperatively as a result of ischemia-reperfusion injury, before returning to baseline. Vasodilatation and capillary leak mean that there is a progressive requirement for volume resuscitation, despite the additional volume transfused to the patient from the bypass circuit. Pulmonary edema is caused by activation of complement and sequestration of neutrophils in the pulmonary vasculature, where they mediate an increase in capillary permeability, which is compounded by the fluid shifts described above. Cardiopulmonary bypass reduces the effect of natural surfactant, compounding pulmonary dysfunction caused by general anesthetic
and median sternotomy. Cardiopulmonary bypass increases shunts, reduces compliance and functional residual volume, and can cause acute lung injury. Stroke is primarily due to emboli released during cannulation and clamping of the aorta. In a small percentage of cases, stroke is hemorrhagic and is attributed to the anticoagulation necessary for bypass. Haemodilution, microemboli, catecholamines, low perfusion pressure, diuretics, hypothermia, aprotinin and haemolysis all impair renal function. Peptic ulceration is a response to stress, not cardiopulmonary bypass per se. Pancreatitis and mild jaundice are not uncommon. Greater permeability of the gut mucosa leads to endotoxin translocation, adding to the inflammatory response. OPCAB surgery avoids some of these complications, but it presents the surgeon and the anesthetist with several challenges, which anesthetic and surgical techniques must overcome. These include the reduction in cardiac output when positioning the heart, interruption to coronary blood flow during each distal anastomosis, and reproducibly performing rapid, accurate anastomoses in a perfused, moving surgical field. Haemodynamic changes occur rapidly, but can be anticipated by knowing the sequence of surgery, as well as by adjuncts to standard monitoring such as ST segment analysis, trans-esophageal echocardiography and cardiac output measurements. Inotropic support is frequently required during the distal anastomosis, during which both hyper- and hypovolaemia must be avoided. The use of trans-esophageal echocardiography allows the anesthetist to assess whether the heart is likely to tolerate particular positions, by identifying valvular regurgitation and ventricular impairment before the heart decompensates. It may be necessary to stop ventilation for short periods to optimize surgical exposure.
66.2.2.2 Evidence for Off-Pump vs. On-Pump Surgery

Over 50 randomized controlled trials of off-pump vs. on-pump surgery have now been published, as well as several meta-analyses. OPCAB surgery appears to offer favorable outcomes, leading to reductions in postoperative atrial fibrillation, transfusion requirements, inotrope requirements, ventilation times, length of hospital and intensive care unit stay, and cost compared with on-pump coronary artery surgery. However, no randomized
trial has demonstrated a difference in death or stroke at 30 days or 1 year [45]. In the largest randomized controlled trial of off-pump surgery reported to date, the Prague-4 investigators randomized 400 patients with coronary artery anatomy considered amenable to both off- and on-pump revascularization [46]. Unlike the majority of off-pump trials, poor left ventricular function, advanced age and acute coronary syndromes were not exclusion criteria. The difference detected at 30 days in the primary end point of combined mortality, Q-wave myocardial infarction and stroke rate between the off-pump group (2.9%) and the on-pump group (4.9%) did not reach statistical significance. The larger of the remaining randomized controlled trials also failed to demonstrate a difference in 30-day mortality and stroke; significant reductions in transfusion requirements and serum markers of myocardial damage, and a cost saving, with off-pump surgery in low-risk patients have been shown in randomized trials [47–50]. Initial concerns that the technical demands of off-pump surgery could lead to under-revascularisation and reduced anastomotic patency [48] have not been supported by the larger trials carried out in institutions with well-established expertise in off-pump surgery. These findings were borne out by a large, recent meta-analysis that assessed 95 randomized controlled trials of off-pump vs. on-pump coronary artery bypass grafting, of which 37 trials containing 3,369 patients were judged suitable for inclusion. No significant differences in 30-day mortality, stroke, myocardial infarction, graft patency or reintervention were demonstrated [45]. Several retrospective analyses of large registries, on the other hand, have demonstrated a statistically significant difference in postoperative mortality and stroke. The largest such study looked at over 11,000 off-pump and 106,000 on-pump patients operated on over a 2-year period. Both groups had a similar predicted mortality, but the off-pump group had lower risk-adjusted mortality (2.31 vs. 2.93%, P < 0.0001), lower risk-adjusted major morbidity (10.62 vs. 14.15%, P < 0.0001), fewer strokes (1.25 vs. 1.99%, P < 0.001), less renal failure postoperatively (3.85 vs. 4.26%, P < 0.001) and lower rates of postoperative cardiac arrest (1.42 vs. 1.74%, P < 0.01) [51]. Magee et al. reviewed 204,602 patients from a national database of patients operated on between January 1999 and December 2000 [52]. Using multivariate logistic regression analysis as well as propensity scoring, they
showed that off-pump surgery conferred a survival benefit compared to on-pump surgery (OR 0.83, 95% confidence interval 0.7–0.96).
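For readers less familiar with how a figure such as "OR 0.83, 95% CI 0.7–0.96" is obtained, the sketch below derives an odds ratio and its log-normal (Woolf) confidence interval from a 2 × 2 table. The counts are invented for illustration and are not the registry data analysed by Magee et al. [52].

```python
import math

# Illustrative only: odds ratio with a Woolf (log-normal) 95% CI from a
# 2x2 table. The counts below are invented, not the data of Magee et al. [52].
def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """a/b: events/non-events in group 1; c/d: events/non-events in group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(a=600, b=23400, c=720, d=23280)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR 0.83 (0.74-0.93)
```

Note that a confidence interval this narrow requires the very large denominators typical of registries, which is precisely why observational databases can detect differences that modest randomized trials cannot.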
66.2.2.3 Minimally Invasive On-Pump Surgery

Standard sternotomy and thoracotomy incisions are both associated with problematic postoperative respiratory dysfunction, immobility, wound infections and chronic pain: these complications may be reduced by minimally invasive approaches. The focus of current efforts in this area is modifying conventional cardiac surgical techniques, including standard cardiopulmonary bypass, so that they may be carried out safely through small incisions, thoracoscopically or even robotically.
66.2.2.4 Minimally Invasive Alternatives to Aortocaval Cardiopulmonary Bypass

Three main alternatives to standard aortocaval cardiopulmonary bypass are used in minimal access cardiac surgery: off-pump surgery, which aims to eliminate the pathophysiological sequelae described above; conventional cardiopulmonary bypass via peripheral cannulation; and Port-Access bypass. Femoral, subclavian and axillary approaches to cardiopulmonary bypass have been part of the standard surgical repertoire for decades. Venous drainage into the cardiopulmonary bypass circuit is via cannulae sited in the right atrium or cavae, either directly or via the femoral or internal jugular veins. Arterial return from the bypass machine can be via the femoral, subclavian or axillary arteries. In order to inject cardioplegia into the coronary ostia, the ascending aorta must be cross-clamped directly, and cardioplegia instilled into the aortic root. Port-Access bypass is a modified version of femoral-femoral bypass that incorporates a method of occluding the aorta (the endoclamp) and instilling cardioplegia from a peripheral site (Fig. 66.3). The heart can be arrested on bypass, enabling aortic and mitral valve surgery, as well as coronary artery bypass grafting, to be carried out. As the surgeon is unable to visualize the bypass cannulas or the cardioplegia delivery system, additional monitoring is mandatory. Radial arterial lines are sited bilaterally so that migration of
the arterial cannula or cardioplegia delivery system can be detected. Fluoroscopic or transoesophageal echocardiographic guidance is used to site the cannulae. One or more ports are placed to facilitate the surgical procedure. The endoclamp consists of a triple-lumen catheter with an inflatable balloon at the tip: the balloon is inflated to occlude the aorta via the first lumen, the second lumen allows aortic root pressure to be transduced and can be used as a root vent, and the third delivers cardioplegia. It is possible to site a cannula in the coronary sinus to give retrograde cardioplegia if desired, a useful adjunct if the patient has aortic incompetence, or in long procedures. Contraindications to Port-Access bypass include severe peripheral vascular disease, and intraluminal atherosclerosis of the aortic arch.

Fig. 66.3 Port-Access bypass: illustration of Edwards Lifesciences' port-access cardiopulmonary bypass system (ENDOCPB). Image copyright Edwards Lifesciences 2008
66.2.2.5 Minimally Invasive Coronary Artery Bypass Grafting

Conventionally performed through a median sternotomy incision, the OPCAB may be modified so that internal mammary artery harvest and bypass grafting can be performed through a small anterior thoracotomy (minimally invasive direct coronary artery bypass, or MIDCAB), or entirely endoscopically (totally endoscopic coronary artery bypass grafting, or TECAB). Endo-ACAB (endoscopic atraumatic coronary artery bypass grafting; Fig. 66.4) is a half-way house in which the internal mammary artery is harvested endoscopically
but the coronary anastomosis is performed under direct vision. In "hybrid" procedures, stenosed vessels suitable for percutaneous coronary intervention, usually in the distal circumflex or posterior descending artery distribution and beyond the limits of the surgical incision, are electively stented postoperatively. Randomized prospective data comparing MIDCAB, TECAB and endo-ACAB with conventional off-pump coronary artery surgery are sparse. Detter et al. compared 256 patients undergoing MIDCAB with 127 patients undergoing conventional coronary artery bypass grafting of LIMA to LAD [53]. Early angiographic patency was comparable, with 96% of anastomoses patent in the MIDCAB group compared with 98% in the CABG group (P = ns). There was no significant difference between the two groups in terms of hospital mortality or perioperative myocardial infarction. At present, MIDCAB, TECAB and endo-ACAB are limited to a small subgroup of patients with isolated LAD lesions unsuitable for percutaneous coronary intervention. In these patients, in experienced institutions, MIDCAB has been shown to be a safe, cost-effective alternative to conventional coronary bypass surgery. These techniques have been compared with PCI: death and stroke rates are equivalent, and patency and repeat revascularisation rates are superior in the surgery groups; PCI is, however, more cost-effective in the short term [54]. The higher rate of repeat revascularisation required after PCI compared with CABG, and the increased cost of drug-eluting stents, mean that this cost advantage disappears over the lifetime of the patient.

Fig. 66.4 Atraumatic coronary artery bypass

66.2.2.6 Minimally Invasive Conduit Harvest
The left internal mammary artery can be harvested to a variable extent through several incisions. It is important to obtain a long length (preferably the entire length) to avoid kinking or placing the LIMA-to-LAD anastomosis under tension. Short lengths will not reach the LAD in patients with distal disease or chronic obstructive airways disease. There are a number of specialized retractors that enable the LIMA to be harvested from the first to the fifth rib space via a limited thoracotomy. Thoracoscopic IMA takedown is an alternative method of harvesting the IMA, which has several advantages. First, rib resection and retraction are avoided. Second, the entire length of the IMA can be mobilized, avoiding kinking, tension on the anastomosis and IMA steal syndromes resulting from failure to ligate proximal intercostal branches. The disadvantages are few. Harvest takes a little longer, with a longer learning curve. Insufflation of CO2 to improve visibility may result in haemodynamic compromise, and uncontrollable hemorrhage may mandate median sternotomy. Endoscopic harvest has been shown to lead to a reduction in pain without adverse effects on conduit patency [55]. Minimally invasive saphenous vein harvest confers particular benefits. In addition to significant postoperative pain and decreased mobility, the standard method of harvesting the saphenous vein through a long, continuous incision often results in delayed wound healing due to cellulitis, edema, large skin flaps, fat necrosis, haematoma and sympathetic dystrophy. There are several systems for minimal access harvest. In skilled hands, all of them result in comparable-quality conduit. An alternative is the use of a "stripper," which allows vein to be harvested through multiple small stab incisions without a camera. Meta-analyses comparing minimally invasive with conventional methods of saphenous vein harvest have shown significant reductions in the rates of wound infection, noninfective leg wound complications and length of hospital stay [55–57]. There is little evidence to suggest that endoscopic harvesting has an adverse effect on the quality of conduit.
66.2.2.7 Minimally Invasive Valve Surgery

Percutaneous and transapical approaches to valve surgery are discussed in the section Future Developments. Minimally invasive mitral valve surgery via a limited
right thoracotomy has been successfully performed, with various modifications, in hundreds of patients to date. Access to the mitral valve normally requires selective cannulation of each vena cava, which can be achieved via a combination of femoral or Port-Access cannulation of the inferior vena cava, and direct superior vena caval cannulation through the thoracotomy incision or via the internal jugular vein. Mean arterial pressure is monitored via bilateral radial arterial lines. Surgery may be safely carried out on the beating or fibrillating heart, avoiding the need to occlude the aorta and administer cardioplegia. Transoesophageal echocardiography is necessary to assess removal of air from the heart and valve function prior to weaning from bypass. The aortic valve is accessed through an upper sternotomy incision (Fig. 66.5). The technique is a modified version of the standard open technique, although Port-Access bypass can be used instead of aortocaval bypass. Retrograde cardioplegia is used if desired, and the heart is vented via the right superior pulmonary vein, or via the dome of the left atrium. The main modification is that the operating surgeon obtains the best view of the aortic valve standing over the patient's right shoulder. Removal of air from the heart is carried out under echocardiographic guidance in the standard fashion, elevating the apex of the heart with internal defibrillator paddles to facilitate this. While several small, single-centre studies have claimed equivalence in terms of early surgical outcome, and a reduction in length of hospital stay in the minimally invasive surgical groups [58, 59], there are no randomized controlled trials of minimally invasive vs. conventional valve surgery.

Fig. 66.5 Minimal access aortic valve replacement

66.2.2.8 Thoraco-Abdominal Aortic Aneurysm Repair
Thoracoabdominal aortic aneurysm repair is associated with a mortality of 10–20%, and a risk of paraplegia of approximately 10%, despite refinements in surgical technique and spinal cord protection. Large thoracoabdominal incisions are associated with considerable morbidity. Endovascular stenting has been explored as a means of reducing the morbidity and mortality associated with thoracoabdominal aneurysm surgery, with combined endovascular and surgical repair of thoracoabdominal aortic aneurysms representing a recent and exciting development. The abdominal aneurysm is repaired surgically, which allows reimplantation of visceral vessels, and also provides access for endoluminal stenting. Excellent results in selected patients have meant that, at experienced institutions, endovascular approaches are a useful adjunct in the approach to the high-risk patient with thoracoabdominal aneurysmal disease [60, 61]. Minimally invasive approaches to most aspects of adult cardiac surgery are technically feasible. Evidence of equivalence in surgical outcome, stroke and mortality, as well as of the expected improvements in postoperative morbidity, length of hospital stay and quality of life following discharge, is currently limited to relatively small, non-randomized series at experienced institutions. Large multi-centre randomized trials of minimally invasive vs. conventional approaches are required. Off-pump coronary artery surgery is the single exception to this statement: a growing body of high-quality evidence demonstrates equivalent or superior outcomes with off-pump vs. on-pump surgery.
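One reason such trials are slow to appear is statistical: detecting small absolute differences in uncommon outcomes such as operative mortality requires very large samples. The sketch below applies the standard normal-approximation sample-size formula for comparing two proportions; the event rates chosen (2% vs. 1.5% mortality) are assumptions for illustration, not sourced estimates.

```python
import math

# Illustrative only: per-arm sample size for comparing two proportions
# (two-sided alpha = 0.05, power = 0.80) using the standard normal
# approximation. The event rates below are assumptions, not sourced figures.
def n_per_arm(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(n_per_arm(0.02, 0.015))  # about 10,800 patients per arm for a 0.5% absolute difference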
66.2.3 Ventricular Assist Devices

Over two million people worldwide are estimated to have end-stage heart failure requiring maximal medical therapy [62]. Heart failure is a disease associated with advanced age, and as a result of the increasingly elderly population, it is estimated that the number of patients with heart failure in developed countries will double in the next 25 years. Heart transplantation is the definitive treatment for end-stage heart failure, offering improved survival and quality of life compared with conventional surgery or medical management, but
primarily because of a chronic shortage of donor organs, heart transplants are performed in fewer than 2,500 patients worldwide each year. Ventricular assist devices (VADs) provide complete circulatory support to the left ventricle (LVAD), right ventricle (RVAD) or both ventricles (BiVAD), allowing patients to be weaned from inotropic support, to mobilize and be discharged home, and in some cases, to recover cardiac function. As a result, the role of these devices has expanded from bridging to transplantation, to permanent (destination) therapy in patients ineligible for transplantation, and in rare cases, bridging to recovery and explantation. This section outlines recent developments in assist device technology, and current outcomes in these three roles.
66.2.3.1 Recent Developments in Assist Device Technology

Currently, VADs are based on one of two main models. The earliest devices worked by positive displacement, propelling blood through one-way valves by cyclically changing the internal volume of a pumping chamber. These devices are bulky, as size reduction is limited by the requirement for a minimum stroke volume and the need for valves. They cannot, consequently, be sited within the pericardium, and are either placed in pockets created in the abdomen or paracorporeally, both of which techniques have associated morbidity. The main positive displacement devices are made by Thoratec, such as the HeartMate XVE (Thoratec Laboratories Corp., Pleasanton, CA), which is the only device currently approved by the Food and Drug Administration (FDA) for destination therapy in the United States; and the Berlin Heart (Berlin Heart AG, Berlin, Germany), which is the only such device to come in a range of sizes that can be used in small adults and children. In contrast to positive displacement devices, the newer generation of devices is based on one or more very small, electromagnetically driven impeller drive shafts, which rotate at very high speed to propel blood forward. These devices can generate up to 10 L of flow per minute, yet are much smaller, non-pulsatile and almost noiseless. These rotary VADs are based on either axial or centrifugal flow. The impellers of axial flow pumps add energy by deflecting blood flow circumferentially, in contrast to centrifugal pumps, which use disc-shaped impellers to propel blood by a combination of centrifugal and circumferential velocity. Blood flow through both axial
and centrifugal flow pumps is a function of rotor speed and the pressure difference across the inlet and outlet cannulae, which can be much smaller than those used in positive displacement pumps. Rotary devices are small enough to be placed within the pericardium or the ventricle itself, reducing the morbidity associated with abdominal pockets and paracorporeal placement. Their main disadvantage is the tendency for blood stasis and thrombus formation in areas of low flow, mandating antiplatelet therapy and anticoagulation to reduce thromboembolic complications such as stroke. The main rotary devices in use are the Jarvik 2000 (Jarvik Heart Inc., New York, NY), which is licensed for use in Europe for bridge to transplantation (BTT), recovery and destination therapy, in contrast to the USA, where it is only FDA-approved for BTT; and the HeartMate II (Thoratec Laboratories Corp). There are several new avenues under exploration in VAD technology. Transcutaneous power transfer through electromagnetic coupling avoids the need for transcutaneous power drivelines, which are the primary route of device infection. The LionHeart 2000 (LionHeart Ventricular Assist system, Arrow International, Reading, PA) was evaluated in a trial including 23 patients in seven European centres and one US centre between 1999 and 2003, with survival comparable to existing devices [63]. Several devices with magnetically levitated motors, such as the VentrAssist™ (Ventracor Limited, ABN) (Fig. 66.6), which reduce the wear on bearings associated with conventional devices, thus extending the life expectancy of the devices, are undergoing trials as BTT support devices. CorAide (Arrow International, Reading, PA) adapts its rotor speed in response to physiological demands, and is currently being tested as a BTT device. Finally, increasing miniaturisation of these devices is a key strategy to reduce the morbidity associated with implantation, extending the patient population that can benefit from them. The currently approved devices offer similar results in terms of survival in three areas of application: BTT, bridge to recovery and destination therapy.
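The flow-speed-pressure relationship described above can be made concrete with a toy operating-point model; the coefficients below are arbitrary illustrative values and do not describe any real device.

```python
# Toy operating-point model of a rotary blood pump. The affine pump-curve
# coefficients are arbitrary illustrative values, not real device data.
def pump_flow(rpm: float, delta_p_mmhg: float,
              k_speed: float = 0.0006, k_head: float = 0.05) -> float:
    """Flow (L/min) rises with rotor speed and falls with pressure head."""
    return max(0.0, k_speed * rpm - k_head * delta_p_mmhg)

# Against a 60 mmHg head, this toy model gives 3.0 L/min at 10,000 rpm
# and 4.2 L/min at 12,000 rpm.
for rpm in (10_000, 12_000):
    print(rpm, "rpm ->", round(pump_flow(rpm, delta_p_mmhg=60), 1), "L/min")
```

The qualitative point, captured even by this crude linear sketch, is that flow is not set directly: it emerges from rotor speed and the pressure difference the pump works against, which is why rotary devices must adapt speed, or be adjusted, as haemodynamic conditions change.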
66.2.3.2 Bridge to Transplantation The percentage of patients supported by a VAD at the time of heart transplantation has risen from under 3% in 1990 to almost one-third in 2004 [64], reflecting both a significant worsening in the risk profile of patients undergoing cardiac transplantation and greater experience with, and reliability of, ventricular assist technology.
Fig. 66.6 The VentrAssist™ (Ventracor Limited, Australia), a third-generation ventricular assist device. (a) The device. (b) Cutaway view showing magnetic coils. (c) Diagram showing the device once implanted. All images copyright Ventracor Limited 2008
The first attempt at using intracorporeal mechanical support to bridge a patient to transplantation was made in 1978 by Norman and colleagues, who supported a patient for 5 days before he succumbed to multi-organ failure post transplantation. Oyer and colleagues at Stanford University Medical Centre first successfully used an implantable LVAD as a BTT device in 1984. Currently, over 75% of support devices are implanted as BTT according to a recent analysis of the international registry held by
the International Society for Heart and Lung Transplantation [65]. Consequently, the major source of outcomes information for these devices will be BTT therapy, even though this patient group constitutes a high-risk subset: a large proportion of these patients receive BTT therapy because they are already too sick at the time of referral for heart transplantation. This is reflected in the finding that the primary cause of death in this group is multi-organ failure, despite the fact that these devices yield normal cardiac output immediately after implantation, and explains why the
survival at 1 year is significantly lower than that observed in cardiac transplantation. The two strongest predictors of death in patients receiving left VADs as BTT therapy were shown in analysis of the ISHLT database to be right ventricular failure requiring concurrent insertion of a right VAD, and age [65]. Over 70% of patients under the age of 30 receiving an LVAD as BTT survive to transplantation at 1 year, compared to less than 50% of patients over 50 years of age. The negative impact of right ventricular failure appears to have been reduced in a study by Copeland and colleagues, who reported 79% survival at 1 year in 62 patients with severe biventricular failure who received the SynCardia CardioWest total artificial heart (SynCardia Systems, Tucson, AZ) [66], which subsequently gained FDA approval for BTT therapy.
66.2.3.3 Destination Therapy In the REMATCH (randomized evaluation of mechanical assistance in the treatment of congestive heart failure) trial, 129 patients with end-stage heart failure (New York Heart Association class IV, left ventricular ejection fraction <25%, and either oxygen consumption <12–14 mL/kg/min or dependence on intravenous inotropic support), who were ineligible for heart transplantation, were randomized to receive either maximal medical therapy or permanent implantation of a HeartMate VE device [67]. Survival at 1 and 2 years was significantly better in the 68 patients who received LVADs (52 and 23%, respectively) than in the 61 patients who received optimal medical therapy (25 and 8%, respectively; P < 0.001). The survival of surgical patients improved over the course of the study, with 37% of patients entering during the second half surviving to 2 years compared to 21% of the patients recruited during the first half (P < 0.001). There was no improvement in medical outcomes over the same time period. Symptoms and quality of life improved significantly in the surgical group, but this was not observed in the medically treated group. The HeartMate XVE was subsequently approved by the FDA for use in the US as a destination therapy device, and outcomes have been similar to those of the later cohort of patients in the REMATCH trial. The limitations of destination therapy are significant. The 1-year survival is still significantly lower than
that achieved by transplantation (50 vs. 83%, respectively) [68]. Moreover, although survival is better than that achieved with medical therapy, the rate of major complications associated with assist devices is double that of medically treated patients, reflected in a much greater number of days of hospitalization (408 vs. 150 days) [69]. Major morbidity includes bleeding requiring re-exploration in the perioperative period, systemic sepsis and multi-organ failure, which is the commonest cause of death in these patients. Late morbidity is predominantly caused by device infection or dysfunction, accounting for a quarter and a third of late deaths, respectively. Neurological sequelae, including thromboembolic and hemorrhagic stroke, occur in a significant number of patients. The HeartMate XVE was designed to address some of the aspects predisposing the HeartMate VE to dysfunction: early results suggest this has been achieved, with a reported 2-year survival rate in experienced centers as high as 60%. Selection of patients for destination therapy increasingly utilizes composite mortality risk scores originally designed for BTT patients, such as the Heart Failure Survival Score (HFSS). Right ventricular function, hepatic and renal function, and nutrition are important predictors of outcome. The patients that derive the greatest benefit from surgical intervention are those with the most severe decompensation requiring the greatest inotropic support, and it is therefore important to identify other outcome predictors to optimize patient selection.
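The REMATCH figures quoted above translate directly into the standard trial effect measures. A minimal worked example, using only the 1-year survival proportions stated in the text (52% with an LVAD vs. 25% with medical therapy):

```python
def effect_measures(survival_treated, survival_control):
    """Absolute risk reduction, relative risk reduction and number
    needed to treat, derived from survival proportions."""
    mortality_treated = 1 - survival_treated
    mortality_control = 1 - survival_control
    arr = mortality_control - mortality_treated  # absolute risk reduction
    rrr = arr / mortality_control                # relative risk reduction
    nnt = 1 / arr                                # number needed to treat
    return arr, rrr, nnt

arr, rrr, nnt = effect_measures(0.52, 0.25)  # REMATCH, 1 year
print(f"ARR {arr:.2f}, RRR {rrr:.0%}, NNT {nnt:.1f}")
# -> ARR 0.27, RRR 36%, NNT 3.7: roughly one additional 1-year
#    survivor for every four devices implanted
```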
66.2.3.4 Bridge to Recovery End-stage heart failure is characterized by myocyte hypertrophy, progressive interstitial fibrosis and loss of contractility. Several studies have shown that mechanical unloading of the heart by assist devices reverses many of the changes that characterize heart failure. Decreased levels of TNF-α, an inflammatory cytokine produced in the failing heart that induces cardiac enlargement, heart failure and death in experimental animals, have been observed in patients receiving LVAD therapy [70], accompanied by downregulation of the genes that stimulate collagen production and interstitial fibrosis. Chronic mechanical unloading results in decreased myocyte hypertrophy and reduction in left ventricular mass, resulting in improved cellular, structural, electrophysiological and hemodynamic function [68].
Clinical recovery, to a point where an assist device may be successfully explanted, is observed much less commonly than evidence of improvement in cellular and molecular markers of failure. Patients with acute nonischemic cardiomyopathy are the most likely to recover to successful explantation: chronic dilated cardiomyopathy is much less amenable to recovery. Currently, approximately 6% of patients under 30 years of age receiving LVADs in the ISHLT database recover sufficient myocardial function to be successfully explanted, compared to less than 1% of patients aged over 50 years [65]. The development of smaller pumps with less associated morbidity may extend the indications for implantation to earlier in the disease process, possibly increasing the likelihood that recipients will recover to explantation. Currently, BTT allows patients with a significantly higher risk profile than previously to survive to transplantation. The chronic shortage of donor organs means that this therapy will remain limited in its application. While destination therapy and bridge to recovery may potentially be offered to a much wider patient population, it is too early to predict whether improvements in device technology and medical treatment will allow these therapies to become the dominant treatment for end-stage heart failure.
66.3 Molecular and Biological Developments Within the Speciality 66.3.1 Cellular and Tissue Engineering Three new paradigms form the foundation of current developments in cardiovascular cellular and tissue engineering. The first reflects a move away from the traditional view of the heart as an organ composed of terminally differentiated cells incapable of repair, towards a concept of myocardium able to regenerate through recruitment of resident and circulating stem cells. Although this putative intrinsic mechanism is overwhelmed in clinical myocardial infarction or end-stage heart failure, proponents of stem cell-based therapy see this as a means by which future cellular therapy may operate. The second, controversial concept similarly
reflects a departure from conventional thinking, in this case that resident stem cells are incapable of differentiating outside their tissue lines, to one where plasticity means that stem cells of one lineage (e.g., haematopoietic) can develop into another (e.g., cardiac muscle or blood vessels) under the correct biological cues. Exploratory, frequently nonblinded, nonrandomised and noncontrolled experimental studies have been held up as demonstrating that stem cell therapy can improve perfusion and contractility of the heart in three main pathological conditions: acute myocardial infarction (AMI), chronic ischemic heart disease (IHD) and chronic heart failure (CHF). Finally, this section outlines recent developments in tissue engineering, where the established concept of seeding expanded autologous cell colonies onto a biological matrix to produce an organized cellular population has been augmented by simultaneously subjecting the matrix to the physical stresses experienced by heart valves, directing tissue differentiation appropriately to produce a functional human heart valve.
66.3.2 Donor Cells A range of stem cells has been used as the basis for studies of regeneration, each offering a different balance of advantages and disadvantages; comparative studies are lacking. The majority of research groups use unfractionated bone marrow cells, which are relatively easy to harvest, and which contain populations of haematopoietic, mesenchymal and endothelial stem cells that require relatively little additional manipulation. Endothelial stem cells have a role in angiogenesis, being able to differentiate into endothelial cells at sites of neovascularisation, as well as promoting angiogenesis through paracrine effects. Mesenchymal cells can, under highly specific culture conditions, be differentiated in vitro into cells sharing many properties of cardiomyocytes, including the ability to secrete angiogenic cytokines. Skeletal myoblasts can be harvested from adult skeletal muscle biopsies. Although they retain properties of skeletal muscle when grafted into myocardium, they are not able to electrically couple with cardiomyocytes. Embryonic stem cells are pluripotent cells harvested from blastocysts, which, under appropriate conditions, can be differentiated into cardiomyocytes displaying structural and
functional properties similar to mature human cardiomyocytes, including the ability to electrically couple with host cardiomyocytes when grafted into adult myocardium. Allogenicity of these cells is a significant technical barrier to their wider use, as are the ethical concerns surrounding experimental work using human embryos. Finally, the adult heart contains resident cardiac stem cell populations capable of differentiating into cardiomyocyte or vascular cell types. It is possible to clonally expand these cells from small biopsies.
66.3.3 Methods of Cell Delivery Two main methods have been used to transport cells to the intended location: intravascular delivery and direct injection into the ventricular wall. Intravascular delivery ranges from direct intracoronary injection, via a catheter port proximal to a balloon tip that is intermittently inflated to maximize donor cell contact with the target region, to intravenous infusion, which is less invasive but allows donor cells to migrate to any organ. Direct injection into the ventricular wall is the optimal mode of delivery in patients with vascular disease, which reduces the ability of intravascular methods to deliver donor cells to the target area. Injections may be performed in one of three ways: transendocardially, via the coronary veins, or transepicardially. Transendocardial and transcoronary vein routes employ percutaneous techniques to introduce catheters across the aortic valve or through the coronary sinus, respectively. Needles delivering the donor cells to the target site may be guided by electrophysiological mapping studies in the case of transendocardial approaches, and by ultrasound in the case of transcoronary vein approaches. The coronary veins are used for this route rather than the coronary arteries to avoid potentially catastrophic arterial dissection or rupture, but the approach is technically extremely demanding. Transepicardial injection is carried out under direct vision at the time of cardiac surgery; although much less challenging as a technique, the relative invasiveness of this route precludes its application in a nonsurgical population.
66.3.4 Acute Myocardial Infarction Several investigators have sought to determine whether stem cells administered at the time of AMI could modify subsequent pathological LV remodeling, infarct extension and eventual heart failure. Transplantation of haematopoietic stem cells has been shown to improve ventricular function, through improved regional revascularisation, after coronary artery ligation in a murine model of AMI [71]. Clinical trials differ from animal studies in that stem cells are administered via the intracoronary route at the time of primary angioplasty, after the occluded artery has been successfully stented. Several small studies have shown decreased infarct size and improved regional wall motion in the infarct border zone at 5–14 days after AMI when patients receiving bone marrow or mesenchymal stem cells were compared with controls receiving standard interventional and medical therapy [72, 73]. Pretreating patients with stem cell factor and granulocyte colony stimulating factor in order to stimulate angiogenesis and myogenesis in the affected area may further augment LVEF, but was associated in one small study with a significant rise in serum creatine kinase-MB and late in-stent restenosis [74]. Future research is likely to focus on establishing the safety and efficacy of these treatments in larger scale, randomized studies with more appropriate controls, with particular regard to patients with severe left ventricular dysfunction at the time of primary angioplasty, who are likely to benefit most from this intervention.
66.3.5 Ischemic Cardiomyopathy A significant proportion of patients with ischemic heart disease are not candidates for revascularisation because they have very diffuse coronary artery disease or very poor target vessels. Chronic myocardial ischemia is associated with regional impairment of contractile function, arrhythmias and sudden cardiac death, and these patients are often very symptomatic. Collateral flow and regional contractility have been shown to improve after transendocardial injection of bone marrow-derived stem cells in a porcine model of chronic ischemic cardiomyopathy [75]. Transepicardial and transendocardial delivery of stem cells in fewer than 100 patients
across a handful of studies has been reported to be associated with an increase in regional perfusion and wall motion as well as a decrease in reported angina [76]. The synergistic effect of granulocyte colony stimulating factor described in AMI has not been demonstrated in chronic ischemic cardiomyopathy. Randomized studies with more appropriate controls are required to establish the safety and efficacy of these treatments.
66.3.6 Heart Failure Stem cell therapy has been proposed as a means of regenerating functional myocardium within the akinetic scar tissue associated with heart failure due to ischemic cardiomyopathy. The main difficulties inherent in this idea are, first, the challenge of delivering stem cells to avascular tissue and, second, the difficulty of delivering the cues essential for growth and differentiation without an effective blood supply. Injection of myoblasts into infarct areas has been shown in animal studies to improve left ventricular ejection fraction, even though these cells do not become electrically coupled with host cells and their contractile activity is, therefore, unsynchronized [76]. Clinical trials have taken place using cultured skeletal muscle myoblasts at the time of coronary artery bypass grafting. Although most trials report improvements in regional wall motion and global left ventricular ejection fraction, early and late ventricular arrhythmias have proved to be a significant problem [76]. It has been suggested that, although not electrically coupled to host cardiomyocytes, grafted myocytes can generate ectopy by producing action potentials that interact electrically with adjacent host tissue. Future research is likely to focus on establishing the safety and efficacy of these treatments in larger scale, randomized studies with more appropriate controls, perhaps restricting treatment to patients that already have or require an intracardiac defibrillator. Better techniques for optimizing cell delivery and environmental support may help to improve results further.
66.3.7 Tissue Engineering Severe valvular heart disease affects millions of patients in the US alone, and its prevalence is increasing as the
population ages. Repair of diseased valves is possible in a minority of cases; for the majority of patients, the definitive treatment is valve replacement. The main options for valve prostheses are mechanical valves, xenograft bioprostheses and homografts. Mechanical valves are durable but require life-long anticoagulation, the complications of which include an approximately 2% risk per year of major hemorrhagic or thromboembolic events, including stroke. Xenograft bioprostheses, most commonly porcine valves or bovine pericardial valves, do not require anticoagulation but have a life expectancy ranging between 5 and 15 years, depending on the patient and the position in which they are inserted, after which time structural degeneration necessitates reoperation. Homograft valves, harvested from donor hearts unsuitable for transplantation, have a similar durability to xenograft valves, but are relatively limited in supply and technically more challenging to implant. A drawback common to all these valves is their inability to grow, necessitating reoperation in pediatric patients. The ideal valve would be non-thrombogenic and non-allogeneic, with the capacity to grow and repair: tissue engineering using clonal expansions of the patient's own cells supported on a biological matrix may eventually provide the solution to this problem [77]. Tissue engineering involves isolating and culturing the patient's own cells from a peripheral arterial biopsy, then seeding the cells onto a structural matrix on which cell growth and proliferation continue. The matrix must support cell growth, provide a biomechanically sufficient scaffold and permit the cell-to-cell interaction that allows tissue to form into a functional structure. Three types of matrix are currently used: decellularised fixed heart valves, synthetic biodegradable polymers and biological/polymer hybrids. Xenograft material may be decellularised by a number of methods, including enzymatic cell removal and freeze drying, which remove the risk of residual allogenicity while preserving the structural components of the extracellular matrix. Synthetic scaffolds, composed of polyesters or hydroxy acids, have an interconnected pore network that allows cell growth and the transport of nutrients and metabolites. Polyesters are more difficult to mould into shape than the hydroxy acid-based polymers, which are thermoplastic. Cell culture on the structural matrix does not take place in a passive environment: conditioning protocols have been developed to apply flow, shear stresses and strains to the valve, mimicking the conditions it would naturally be subject to in vivo, which contribute to
tissue development, allowing cell growth to take place in a fashion much closer to that of natural valves.
66.4 Diagnostics and Imaging in Cardiac Surgery The last decade has seen major developments in several established imaging modalities, significantly increasing their impact on the assessment of patients undergoing cardiac surgery. This section focuses on the role of real-time 3D echocardiography, cardiac magnetic resonance (CMR) imaging, multi-slice CT and SPECT.
66.4.1 Three-Dimensional Echocardiography Standard two-dimensional (2D) echocardiography generates accurate real-time cross-sectional images of the heart and great vessels, which can be combined with pulsed- and continuous-wave Doppler and color flow to provide accurate quantitative information about cardiovascular structure and function. Gated acquisition technology, which integrates images of a moving structure from multiple planes in time, allowing for both the cardiac cycle and respiration, has progressed to the extent that volume-rendered 3D reconstructions of complex anatomical structures, as well as color flow jets viewed in three dimensions, may now be generated in real time rather than after the lengthy period of data analysis that previously limited wider application. Three-dimensional echocardiography has several key applications. Assessment of left ventricular function is one of the most important applications of echocardiography in clinical practice. Two-dimensional echocardiography relies on several geometric assumptions to calculate cardiac chamber volumes and ejection fractions, and on a number of subjective judgments by the operator, the limitations of which are well recognized. Three-dimensional echocardiography allows simultaneous real-time mapping and display of all left ventricular echocardiographic segments, obviates most of the geometric assumptions and user variability inherent in 2D echocardiography, and generates
detailed quantitative data. Three-dimensional echocardiography has been shown to be more accurate than 2D echocardiography in the assessment of both right and left ventricular volumes and of global and regional function [78]. This is particularly useful after complex cardiac surgery, where accurate real-time quantification of right and left ventricular dysfunction and of the response to interventions may be a key part of successful weaning from cardiopulmonary bypass. Three-dimensional echocardiography has perhaps been of most direct use to cardiac surgeons in the intraoperative assessment of valvular heart disease. Not only are the images much more intuitive than those offered by standard 2D echocardiography, 3D echo has also afforded a number of insights into mitral valve pathophysiology. Three-dimensional echocardiography played a key role in the appreciation of the saddle shape of the mitral annulus, and has greatly increased understanding of the predictive value of leaflet surface, coaptation depth, tethering and tenting, and papillary muscle position for the success of subsequent mitral valve repair. The unique ability of 3D echo to display the mitral valve orifice and leaflets in an en-face view allows accurate measurement of leaflet height and annular dimensions, as well as clear visualization of surgical pathology. The ability to tailor patient-specific repairs preoperatively may increase the percentage of patients with mitral regurgitation who receive successful repair rather than replacement from less than 50% nationally to closer to the rates of 90% achieved at experienced institutions. In view of the increased mortality and morbidity associated with mitral valve replacement compared to repair, this would be a significant achievement.
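The geometric assumptions that 2D echocardiography relies on can be made explicit. The biplane method of discs (modified Simpson's rule) models the ventricle as a stack of elliptical discs whose orthogonal diameters are traced in the four- and two-chamber views; the tracings below are hypothetical, for illustration only.

```python
import math

def lv_volume_biplane(diams_4ch_cm, diams_2ch_cm, length_cm):
    """Biplane method of discs: V = (pi/4) * sum(a_i * b_i) * (L / n),
    where a_i and b_i are disc diameters from two orthogonal views."""
    n = len(diams_4ch_cm)
    assert len(diams_2ch_cm) == n, "both views must yield the same number of discs"
    disc_height = length_cm / n
    return sum(math.pi / 4 * a * b * disc_height
               for a, b in zip(diams_4ch_cm, diams_2ch_cm))

# Hypothetical tracings: 20 discs per view, at end-diastole and end-systole
edv = lv_volume_biplane([4.8] * 20, [4.6] * 20, 9.0)
esv = lv_volume_biplane([3.2] * 20, [3.0] * 20, 7.5)
ef = (edv - esv) / edv  # ejection fraction
print(f"EDV {edv:.0f} mL, ESV {esv:.0f} mL, EF {ef:.0%}")  # EF ~64%
```

Every term in the formula is an assumption — elliptical discs, a well-defined long axis, representative imaging planes — and these are exactly what full-volume 3D acquisition removes.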
66.4.2 Cardiac Magnetic Resonance Imaging Magnetic resonance angiography (MRA) yields high-resolution, animated 2D reconstructions in any plane. Improvements in image gating technology mean that MRA has become the modality of choice for imaging complex anatomical structures and relationships, such as intracardiac masses and congenital cardiac disease, particularly postoperatively. MRA is routinely used in planning elective aortic aneurysm surgery.
MRI is of value in assessing myocardial function, particularly in distinguishing acutely ischemic myocardium from hibernating myocardium (myocardium with depressed function due to chronic ischemia) and from nonviable infarcted myocardium [79]. The contrast agent gadolinium accumulates in myocardial scar and fibrotic tissue, and studies have shown close correlation between increased enhancement and infarcted myocardium. Contrast enhancement has been reliably used to map infarct size and depth, and to identify flow-limiting coronary stenoses through calculation of fractional flow reserve. This facilitates risk stratification, the planning of revascularisation strategies and the identification of patients with irreversible advanced ischemic cardiomyopathy who will not benefit from revascularisation and should be considered for transplantation or an assist device. Magnetic resonance imaging is able to provide a better indication of the histological diagnosis than echocardiography, which has traditionally been used to differentiate cardiac masses. Using cardiac MRI as part of a multimodal approach provides important additional information, which reliably allows a distinction to be made preoperatively between the two commonest causes of an intracardiac mass: thrombus and myxoma. The distinction is key, as thrombus may be treated medically with anticoagulation, whereas myxoma is an indication for urgent cardiac surgery. Hypointensity in T1-weighted images relative to the myocardium, and hyperintensity in T2-weighted images, characterizes tissue with high extracellular water content, and is commonly observed in myxoma. In addition, myxomas typically show a heterogeneous appearance on MRI due to areas of necrosis or hemorrhage. In gadolinium-enhanced MRI, myxomas generally show a heterogeneous pattern of contrast enhancement due to their high degree of neovascularisation. In contrast, atrial thrombi have a brighter appearance than tumor or myocardium in images with short inversion time, a darker appearance in images with long inversion time, and almost never show contrast enhancement.
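The imaging features listed above can be restated as a simple triage rule. The sketch below is only an executable paraphrase of the text's criteria, not a validated diagnostic algorithm:

```python
def triage_intracardiac_mass(enhances_with_gadolinium: bool,
                             heterogeneous_appearance: bool,
                             high_t2_signal: bool) -> str:
    """Toy triage of an intracardiac mass using the MRI features in the
    text: myxomas are typically heterogeneous, T2-bright and enhancing;
    thrombi almost never enhance with gadolinium."""
    if not enhances_with_gadolinium:
        return "favours thrombus: medical therapy with anticoagulation"
    if heterogeneous_appearance and high_t2_signal:
        return "favours myxoma: urgent surgical referral"
    return "indeterminate: continue multimodal assessment"

print(triage_intracardiac_mass(False, False, False))  # thrombus-like mass
print(triage_intracardiac_mass(True, True, True))     # myxoma-like mass
```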
66.4.3 Multi-Slice Computerized Tomography Multi-slice CT coronary angiography (MSCTA) has moved beyond detecting early markers of coronary atherosclerosis. 64-row multi-detector CT, which images
the heart in seconds, offers superior spatial resolution to magnetic resonance imaging. Compared with conventional coronary angiography, the sensitivity and specificity of multi-slice CT for detecting significant lesions (defined as >50% stenosis) were 95 and 98%, respectively [80]. MSCTA has been described as a reliable, noninvasive test to rule out coronary artery disease in symptomatic patients, obviating the need for coronary arteriography. This has been of particular use in the preoperative assessment of patients undergoing valvular heart surgery. Additionally, MSCTA provides important information in patients undergoing resternotomy, enabling precise localization of vascular structures such as the ascending aorta and patent coronary grafts relative to the sternum.
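The quoted sensitivity of 95% and specificity of 98% explain why MSCTA functions well as a rule-out test: its negative predictive value remains high across plausible pre-test probabilities. A short sketch of the Bayesian arithmetic (the prevalence values are hypothetical):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from test
    characteristics and pre-test probability (Bayes' theorem)."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

for prevalence in (0.10, 0.30, 0.50):
    ppv, npv = predictive_values(0.95, 0.98, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# NPV stays at or above ~95% even at 50% prevalence, supporting the
# use of a negative scan to obviate invasive arteriography.
```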
66.4.4 Single Photon Emission Computed Tomography Single photon emission computed tomography (SPECT) is an established tool for evaluating the relative importance of coronary stenoses in multi-vessel disease by imaging the uptake of radioactive tracers at rest and during exercise. Image gating now allows simultaneous assessment of perfusion and wall motion, quantifying ventricular function and the reversibility of ischemia, and guiding surgical revascularisation. SPECT has recently been used to risk-stratify patients undergoing revascularization [81].
66.5 Future Developments 66.5.1 Percutaneous Valve Technology Although developments in operative technique and perioperative practice mean that conventional valve repair and replacement may be carried out with very low operative mortality and morbidity, even in patients with major comorbidity, there remains a substantial proportion of patients whose risk of perioperative mortality is so prohibitive that conventional surgery is effectively contraindicated. It is hoped that these patients may benefit from percutaneous valve technology, which may eventually evolve to an extent
that it is applicable to a wider patient population. Currently, percutaneous mitral valve repair and aortic valve replacement have been performed as part of clinical trials in a small number of highly selected patients, with very variable results.
66.5.2 Mitral Valve Repair Surgical mitral valve repair has been based on principles established by Alain Carpentier, namely maintaining leaflet flexibility, restoring an adequate area of coaptation and stabilizing the annulus. Percutaneous intervention is a radical departure from these principles. The MitraClip (Evalve, Menlo Park, CA) is a clip that can be placed percutaneously, via a septal puncture, across the free edges of the anterior and posterior leaflets of the mitral valve, mimicking the Alfieri edge-to-edge repair [82]. In the phase I feasibility study, it was possible to apply the clip in 24 of 27 patients with moderate to severe central MR, with no procedural complications reported. Major adverse events at 30 days included partial clip detachment (n = 3) and stroke (n = 1). Fourteen patients had moderate or less MR at 1 month, and thirteen at 6 months. Phase I trials of a suturing device that produces a transcatheter edge-to-edge repair (Edwards Lifesciences, Irvine, CA) are also ongoing. Percutaneous annuloplasty has proved more challenging. The aim of surgical annuloplasty is to restore annular geometry, reducing annular dilatation, which occurs predominantly in the septolateral dimension, and restoring an adequate area of leaflet coaptation. Several methods of achieving a percutaneous annuloplasty have been described, including placing devices within the pericardium, through the myocardium, via the left ventricle or atrium, and finally via the coronary sinus. The latter approach has proved to be the most successful. The coronary sinus is the great vein through which most of the myocardial venous return flows into the right atrium, and it passes in close proximity to the posterior mitral annulus. Coronary sinus devices undergoing phase I trials include two expandable stents designed to stabilize progressively stiffer traction devices without occluding the coronary sinus (Cardiac Dimensions, Kirkland, WA). The Coapsys device (Myocor, Maple Grove, MN) consists of two pads positioned on either side of the left ventricle connected by a suture, which can be adjusted to reduce the size of the annulus as well as reposition the papillary muscles. In 34 patients treated with this device, there was no evidence of residual MR at 1 year [83].

66.5.3 Aortic Valve Replacement Significant aortic stenosis has a prevalence of up to 2% in the elderly. Symptomatic aortic stenosis is associated with an average life expectancy of under 5 years if angina is among the symptoms, less than 3 years if patients present with syncope, and less than 2 years if patients complain of exertional dyspnoea [84]. Although the operative mortality rate in selected patients is less than 2%, the majority of patients with aortic stenosis have comorbidity precluding conventional cardiac surgery, and only a minority of patients with severe aortic stenosis ever undergo cardiac surgery. Attempts have been made to ameliorate aortic stenosis percutaneously since the early 1980s. Early efforts using balloon valvuloplasty were not widely adopted, as procedural mortality and morbidity, predominantly related to embolic complications, were prohibitively high, with regurgitation and restenosis a major problem in survivors [85]. In 1993, Andersen et al. reported percutaneous implantation of a collapsible tissue valve mounted on a stent in pigs [86]. The first barrier to application of this technology in humans, the heavy calcification usually present in severe aortic stenosis, was addressed by purpose-designed balloons, which fracture, but do not fragment, the aortic valve, allowing the stent to expand without associated embolism (Fig. 66.7). Careful sizing and positioning of the stent is required to avoid occluding the coronary ostia. The first percutaneous stent valve (Edwards Lifesciences) was implanted for aortic stenosis in 2002, and since then, over 100 patients have been treated with the Edwards valve [87]. Initial reports show low procedural morbidity and mortality [88]. The main drawback of these techniques was initially the large size of the delivery systems, which risked end-organ and limb ischemia as well as stroke in these high-risk patients. Transapical aortic valve replacement, which is performed via a mini-thoracotomy, was one solution to these problems. This may be superseded in all but the most hostile aortas by more recent delivery systems, which are now down to 18F in size.
Fig. 66.7 Percutaneous aortic valve. (a) Diagram showing the steps involved in percutaneous aortic valve replacement. Balloon valvuloplasty is performed to dilate the stenotic orifice so that the stent-mounted valve can be delivered to the aortic annulus, where it is expanded by a second balloon inflation, positioned so that the coronary ostia are above the level of the cusps. Image copyright of the Cleveland Clinic Foundation 2008. (b) A stent-mounted aortic valve. Image copyright of Corevalve ReValving® system, CoreValve USA 2008
Nanotechnology may provide another alternative: a mechanical aortic valve has been developed that can be collapsed into a 10F-compatible catheter (Advanced Bioprosthetic Surfaces, San Antonio, TX).
66.5.4 Pulmonary Valve Replacement Percutaneous pulmonary valve replacement has focused on patients who have undergone corrective surgery for congenital pulmonary atresia, such as that found in tetralogy of Fallot, normally involving the creation of a pulmonary conduit to convey blood from the right ventricle to the lungs. These patients may undergo several operations in a lifetime, associated with increasing risk of mortality and morbidity, and a percutaneous option to reduce the number and impact of
repeat operations is attractive. One study reporting 94 patients who underwent percutaneous pulmonary valve implantation described success rates of up to 100%, low complication rates, and improvements in most indices of right ventricular function [89]. Complications have included homograft rupture, stent migration and coronary artery compression. Freedom from death or explantation at 1 year is 89%.
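Figures such as "89% freedom from death or explantation at 1 year" are Kaplan–Meier estimates, which correctly handle patients censored before a full year of follow-up. A minimal product-limit estimator, run on invented follow-up data rather than the trial's:

```python
def kaplan_meier(followup):
    """followup: list of (months, event) pairs, where event = 1 for
    death/explantation and 0 for censoring. Returns the step points
    (time, survival) of the Kaplan-Meier product-limit estimator."""
    data = sorted(followup)
    n, i, s, curve = len(data), 0, 1.0, []
    while i < n:
        t = data[i][0]
        tied = [e for (tt, e) in data[i:] if tt == t]
        deaths = sum(tied)
        at_risk = n - i                  # subjects with follow-up >= t
        if deaths:
            s *= 1 - deaths / at_risk    # product-limit update
            curve.append((t, s))
        i += len(tied)
    return curve

# Invented cohort: 10 patients, two events, the rest censored by 12 months
cohort = [(2, 1), (5, 0), (7, 1), (9, 0)] + [(12, 0)] * 6
print(kaplan_meier(cohort))  # [(2, 0.9), (7, 0.7875)]
```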
66.5.5 Robotic Cardiac Surgery The da Vinci® system (Intuitive Surgical Inc., Sunnyvale, CA) is the main robotic system in use today. With the patient intubated with a double lumen tube to allow single lung ventilation, positioned supine with the left chest elevated by 20–30°, articulating
robotic arms are inserted through small port incisions and controlled in real time from a distant console by the surgeon. The surgeon has a magnified, 3D surround view of the operative field. The movement of the arms directly follows the movements made at the console by the surgeon, eliminates tremor and offers a wide range of freedom, allowing precise manipulation of surgical instruments within the restricted space. The single greatest drawback is the lack of tactile feedback. Through two working ports and a camera port in the left side of the chest, both the left and right internal mammary arteries may be harvested with electrocautery or a harmonic scalpel. The distal coronary anastomosis may be performed with or without cardiopulmonary bypass. If cardiopulmonary bypass is required, it may be established with peripheral cannulation. It is additionally possible to use an endovascular balloon to occlude the ascending aorta so that cardioplegia can be instilled through a proximal port, arresting the heart and facilitating robotic anastomosis (totally endoscopic coronary artery bypass, TECAB). Alternatively, stabilizers may be used so that the anastomosis may be performed on the beating heart, with or without cardiopulmonary bypass. A variety of anastomotic devices have been designed: nitinol clips are the most widely used alternative to a hand-sutured anastomosis. Magnetic anastomotic devices are currently undergoing animal studies. More commonly, a 3–4 cm anterior thoracotomy is made overlying the left anterior descending coronary artery, so that a direct hand-sewn anastomosis may be performed, utilizing a coronary shunt to maintain blood flow to the distal coronary artery while diverting it away from the operative field during the anastomosis. In one of the largest series of robotic-assisted coronary artery bypass grafts, Srivastava et al. performed complete arterial revascularisation using bilateral internal mammary arteries in 148 of 150 patients [90]. The rate of perioperative MI was 2%, and in the 55 patients who underwent follow-up angiography, all 136 grafts were patent [90]. The robotic system has also been used to perform mitral valve repair on cardiopulmonary bypass, using a completely thoracoscopic approach, with excellent results [91]. Robotic surgery has not been widely adopted by cardiac surgeons, primarily because of the technical complexity and expense associated with the technique. If the technology becomes more cost-effective, and the significant issue of lack of tactile feedback is successfully addressed, this may change.
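Two of the console behaviours described above — tremor elimination and precisely scaled motion — can be illustrated with a toy signal-processing sketch. This is not the da Vinci system's algorithm, merely the generic idea of low-pass filtering followed by motion scaling; all parameters are invented.

```python
import math

def console_to_instrument(console_positions_mm, scale=0.2, alpha=0.15):
    """Map surgeon hand positions (one axis, mm) to instrument-tip
    positions: an exponential moving average suppresses high-frequency
    tremor, then the motion is scaled down for fine manipulation."""
    filtered = console_positions_mm[0]
    tip = []
    for x in console_positions_mm:
        filtered = alpha * x + (1 - alpha) * filtered  # low-pass filter
        tip.append(scale * filtered)                   # 5:1 motion scaling
    return tip

# A deliberate 10 mm hand movement with ~1 mm simulated tremor superimposed
hand = [10 * t / 50 + math.sin(40 * t) for t in range(50)]
tip_path = console_to_instrument(hand)
print(f"hand travel ~10 mm -> tip travel ~{tip_path[-1] - tip_path[0]:.1f} mm")
```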
66.6 Cardiovascular Surgery Clinical Research Network The National Heart, Lung, and Blood Institute convened a Working Group of experts in cardiac surgery in 2004 to identify critical gaps in knowledge and areas of opportunity in cardiac surgery research [92]. The Working Group's charge was to develop a list of recommendations for future research directions, focusing on surgical revascularisation, novel surgical approaches, valvular research directions, biotechnology and cell-based therapy at surgery, heart failure, imaging modalities, and barriers to clinical research. The Working Group recommended the creation of a Cardiovascular Surgery Clinical Research Network to support relatively small, short-term clinical studies, and to generate an environment capable of sustaining larger trials, including randomized clinical trials comparing minimally invasive surgery with catheter-based ablation approaches for the cure of chronic atrial fibrillation, on- and off-pump surgery, new valve repair techniques, and isolated CABG vs. CABG plus mitral repair for mild to moderate mitral regurgitation in the setting of coronary artery disease requiring surgical revascularisation. The Working Group also recommended evaluation of the efficacy of CABG with left VAD implantation for shock complicating AMI, trials of VADs in conjunction with cell-based or gene-based therapy, including organized human dosing studies, and trials of new computer-enhanced modalities for imaging, instrumentation and robotics.
References 1. Cohen DJ (2007) Cardiothoracic surgery at a crossroads: the impact of disruptive technologic change. J Cardiothorac Surg 2:35 2. Utterback J (1994) Mastering the dynamics of innovation: how companies can seize opportunities in the face of technological change. Harvard Business School Press, Boston 3. Christensen C (2003) The innovator's dilemma. First Harper Business Essentials edition, Harper Collins, New York 4. Chikwe J, Goldstone AB, Raikhelkar J, Fischer A (2009) Current concepts in ablation of atrial fibrillation. In: Seminars in Cardiovascular and Thoracic Anesthesia 5. Feinberg WM et al (1997) Relationship between prothrombin activation fragment F1.2 and international normalized ratio in patients with atrial fibrillation. Stroke prevention in atrial fibrillation investigators. Stroke 28(6):1101–6110
6. Furberg CD et al (1994) Prevalence of atrial fibrillation in elderly subjects (the Cardiovascular Health Study). Am J Cardiol 74(3):236–421 7. Go AS et al (2001) Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the anticoagulation and Risk Factors in Atrial Fibrillation (ATRIA) Study. JAMA 285(18):2370–5237 8. Wattigney WA, Mensah GA, Croft JB (2003) Increasing trends in hospitalization for atrial fibrillation in the United States, 1985 through 1999: implications for primary prevention. Circulation 108(6):711–671 9. Creswell LL et al (1993) Hazards of postoperative atrial arrhythmias. Ann Thorac Surg 56(3):539–459 10. Fuster V et al (2006) ACC/AHA/ESC 2006 guidelines for the management of patients with atrial fibrillation: full text: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines and the European Society of Cardiology Committee for Practice Guidelines (Writing Committee to Revise the 2001 guidelines for the management of patients with atrial fibrillation) developed in collaboration with the European Heart Rhythm Association and the Heart Rhythm Society. Europace 8(9):651–745 11. Earley MJ, Schilling RJ (2006) Catheter and surgical ablation of atrial fibrillation. Heart 92(2):266–724 12. Moe GK, Abildskov JA (1959) Atrial fibrillation as a self-sustaining arrhythmia independent of focal discharge. Am Heart J 58(1):59–70 13. Haissaguerre M et al (1998) Spontaneous initiation of atrial fibrillation by ectopic beats originating in the pulmonary veins. N Engl J Med 339(10):659–666 14. Ricard P et al (1997) Prospective assessment of the minimum energy needed for external electrical cardioversion of atrial fibrillation. Am J Cardiol 79(6):815–681 15. Yue L et al (1997) Ionic remodeling underlying action potential changes in a canine model of atrial fibrillation. Circ Res 81(4):512–255 16. Lin WS et al (2003) Catheter ablation of paroxysmal atrial fibrillation initiated by non-pulmonary vein ectopy. Circulation 107(25):3176–8313 17. Daoud EG et al (1996) Effect of an irregular ventricular rhythm on cardiac output. Am J Cardiol 78(12):1433–6143 18. Heppell RM et al (1997) Haemostatic and haemodynamic abnormalities associated with left atrial thrombosis in nonrheumatic atrial fibrillation. Heart 77(5):407–141 19. Grogan M et al (1992) Left ventricular dysfunction due to atrial fibrillation in patients initially believed to have idiopathic dilated cardiomyopathy. Am J Cardiol 69(19):1570–3157 20. Wolf PA, Abbott RD, Kannel WB (1991) Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke 22(8):983–898 21. Wolf PA et al (1978) Epidemiologic assessment of chronic atrial fibrillation and risk of stroke: the Framingham study. Neurology 28(10):973–797 22. Arnold AZ et al (1992) Role of prophylactic anticoagulation for direct current cardioversion in patients with atrial fibrillation or atrial flutter. J Am Coll Cardiol 19(4):851–585 23. Hart RG, Tonarelli SB, Pearce LA (2005) Avoiding central nervous system bleeding during antithrombotic therapy: recent data and ideas. Stroke 36(7):1588–9153
24. Hart RG, Halperin JL (2001) Atrial fibrillation and stroke: concepts and controversies. Stroke 32(3):803–880 25. Krahn AD et al (1995) The natural history of atrial fibrillation: incidence, risk factors, and prognosis in the Manitoba Follow-Up Study. Am J Med 98(5):476–844 26. Hohnloser SH, Kuck KH, Lilienthal J (2000) Rhythm or rate control in atrial fibrillation – pharmacological intervention in atrial fibrillation (PIAF): a randomised trial. Lancet 356(9244):1789–1794 27. Van Gelder IC et al (2002) A comparison of rate control and rhythm control in patients with recurrent persistent atrial fibrillation. N Engl J Med 347(23):1834–4180 28. Carlsson J et al (2003) Randomized trial of rate-control versus rhythm-control in persistent atrial fibrillation: the strategies of treatment of atrial fibrillation (STAF) study. J Am Coll Cardiol 41(10):1690–6169 29. Wyse DG et al (2002) A comparison of rate control and rhythm control in patients with atrial fibrillation. N Engl J Med 347(23):1825–3183 30. Van Gelder IC et al (1999) Pharmacologic versus direct-current electrical cardioversion of atrial flutter and fibrillation. Am J Cardiol 84(9A):147R–151R 31. Gosselink AT et al (1994) Long-term effect of cardioversion on peak oxygen consumption in chronic atrial fibrillation. A 2-year follow-up. Eur Heart J 15(10):1368–7132 32. Oral H et al (2006) Circumferential pulmonary-vein ablation for chronic atrial fibrillation. N Engl J Med 354(9):934–491 33. Gillinov AM et al (2006) Surgery for paroxysmal atrial fibrillation in the setting of mitral valve disease: a role for pulmonary vein isolation? Ann Thorac Surg 81(1):19–26; discussion 27–28 34. Gillinov AM et al (2006) Surgery for permanent atrial fibrillation: impact of patient factors and lesion set. Ann Thorac Surg 82(2):502–153; discussion 513–514 35. Prasad SM et al (2003) The Cox maze III procedure for atrial fibrillation: long-term efficacy in patients undergoing lone versus concomitant procedures. J Thorac Cardiovasc Surg 126(6):1822–8182 36. Khargi K et al (2005) Surgical treatment of atrial fibrillation; a systematic review. Eur J Cardiothorac Surg 27(2):258–625 37. Cox JL (2003) Atrial fibrillation II: rationale for surgical treatment. J Thorac Cardiovasc Surg 126(6):1693–9169 38. Gillinov AM, Blackstone EH, McCarthy PM (2002) Atrial fibrillation: current surgical options and their assessment. Ann Thorac Surg 74(6):2210–7221 39. Gillinov AM et al (2004) Bipolar radiofrequency to ablate atrial fibrillation in patients undergoing mitral valve surgery. Heart Surg Forum 7(2):E147–5E12 40. Gammie JS et al (2005) A multi-institutional experience with the CryoMaze procedure. Ann Thorac Surg 80(3):876–880; discussion 880 41. Lustgarten DL, Keane D, Ruskin J (1999) Cryothermal ablation: mechanism of tissue injury and current experience in the treatment of tachyarrhythmias. Prog Cardiovasc Dis 41(6):481–948 42. Shemin RJ et al (2007) Guidelines for reporting data and outcomes for the surgical treatment of atrial fibrillation. Ann Thorac Surg 83(3):1225–3120 43. Chikwe J, Athanasiou T, Casula R (2006) Recent advances in adult cardiac surgery. Br J Hosp Med (Lond) 67(4):200–520
44. Chikwe J, Donaldson J, Wood A (2006) Minimally invasive cardiac surgery. Br J Cardiol 13:123–218 45. Cheng DC et al (2005) Does off-pump coronary artery bypass reduce mortality, morbidity, and resource utilization when compared with conventional coronary artery bypass? A meta-analysis of randomized trials. Anesthesiology 102(1):188–203 46. Straka Z et al (2004) Off-pump versus on-pump coronary surgery: final results from a prospective randomized study PRAGUE-4. Ann Thorac Surg 77(3):789–973 47. Puskas JD et al (2003) Off-pump coronary artery bypass grafting provides complete revascularization with reduced myocardial injury, transfusion requirements, and length of stay: a prospective randomized comparison of two hundred unselected patients undergoing off-pump versus conventional coronary artery bypass grafting. J Thorac Cardiovasc Surg 125(4):797–808 48. Khan NE et al (2004) A randomized comparison of off-pump and on-pump multivessel coronary-artery bypass surgery. N Engl J Med 350(1):21–82 49. Angelini GD et al (2002) Early and midterm outcome after off-pump and on-pump surgery in Beating Heart Against Cardioplegic Arrest Studies (BHACAS 1 and 2): a pooled analysis of two randomised controlled trials. Lancet 359(9313):1194–9119 50. Ascione R et al (2004) Beating heart against cardioplegic arrest studies (BHACAS 1 and 2): quality of life at mid-term follow-up in two randomised controlled trials. Eur Heart J 25(9):765–770 51. Cleveland JC Jr et al (2001) Off-pump coronary artery bypass grafting decreases risk-adjusted mortality and morbidity. Ann Thorac Surg 72(4):1282–8128; discussion 1288–1289 52. Magee MJ et al (2003) Patient selection and current practice strategy for off-pump coronary artery bypass surgery. Circulation 108(Suppl 1):II9–14 53. Detter C et al (2001) Single vessel revascularization with beating heart techniques – minithoracotomy or sternotomy? Eur J Cardiothorac Surg 19(4):464–740 54. Reeves BC et al (2004) A multi-centre randomised controlled trial of minimally invasive direct coronary bypass grafting versus percutaneous transluminal coronary angioplasty with stenting for proximal stenosis of the left anterior descending coronary artery. Health Technol Assess 8(16):1–43 55. Aziz O et al (2005) Does minimally invasive vein harvesting technique affect the quality of the conduit for coronary revascularization? Ann Thorac Surg 80(6):2407–1244 56. Athanasiou T et al (2003) Leg wound infection after coronary artery bypass grafting: a meta-analysis comparing minimally invasive versus conventional vein harvesting. Ann Thorac Surg 76(6):2141–2146 57. Athanasiou T et al (2004) Are wound healing disturbances and length of hospital stay reduced with minimally invasive vein harvest? A meta-analysis. Eur J Cardiothorac Surg 26(5):1015–2106 58. Leshnower BG, Trace CS, Boova RS (2006) Port-access-assisted aortic valve replacement: a comparison of minimally invasive and conventional techniques. Heart Surg Forum 9(2):E560–4E56; discussion E564 59. Yamada T et al (2003) Comparison of early postoperative quality of life in minimally invasive versus conventional valve surgery. J Anesth 17(3):171–617
60. Roselli EE et al (2007) Endovascular treatment of thoracoabdominal aortic aneurysms. J Thorac Cardiovasc Surg 133(6):1474–8142 61. Donas KP et al (2007) Hybrid open-endovascular repair for thoracoabdominal aortic aneurysms: current status and level of evidence. Eur J Vasc Endovasc Surg 34(5):528–353 62. AHA (2003) Heart disease and stroke statistics – 2004 update. American Heart Association, Dallas 63. Pae WE et al (2007) Does total implantability reduce infection with the use of a left ventricular assist device? The LionHeart experience in Europe. J Heart Lung Transplant 26(3):219–229 64. Jaski BE et al (2001) Cardiac transplant outcome of patients supported on left ventricular assist device vs. intravenous inotropic therapy. J Heart Lung Transplant 20(4):449–546 65. Deng MC et al (2005) Mechanical circulatory support device database of the International Society for Heart and Lung Transplantation: third annual report – 2005. J Heart Lung Transplant 24(9):1182–7118 66. Copeland JG et al (2004) Total artificial heart bridge to transplantation: a 9-year experience with 62 patients. J Heart Lung Transplant 23(7):823–381 67. Rose EA et al (2001) Long-term mechanical left ventricular assistance for end-stage heart failure. N Engl J Med 345(20):1435–4143 68. Kirklin JK, Holman WL (2006) Mechanical circulatory support therapy as a bridge to transplant or recovery (new advances). Curr Opin Cardiol 21(2):120–612 69. Taylor DO et al (2004) The Registry of the International Society for Heart and Lung Transplantation: twenty-first official adult heart transplant report – 2004. J Heart Lung Transplant 23(7):796–803 70. Uray IP et al (2002) Left ventricular unloading alters receptor tyrosine kinase expression in the failing human heart. J Heart Lung Transplant 21(7):771–872 71. Orlic D et al (2003) Bone marrow stem cells regenerate infarcted myocardium. Pediatr Transplant 7(Suppl 3):86–88 72. Britten MB et al (2003) Infarct remodeling after intracoronary progenitor cell treatment in patients with acute myocardial infarction (TOPCARE-AMI): mechanistic insights from serial contrast-enhanced magnetic resonance imaging. Circulation 108(18):2212–8221 73. Strauer BE et al (2002) Repair of infarcted myocardium by autologous intracoronary mononuclear bone marrow cell transplantation in humans. Circulation 106(15): 1913–8191 74. Kang HJ et al (2004) Effects of intracoronary infusion of peripheral blood stem-cells mobilised with granulocyte-colony stimulating factor on left ventricular systolic function and restenosis after coronary stenting in myocardial infarction: the MAGIC cell randomised clinical trial. Lancet 363(9411):751–675 75. Ghostine S et al (2002) Long-term efficacy of myoblast transplantation on regional structure and function after myocardial infarction. Circulation 106(12 Suppl 1): I131–6I13 76. Wollert KC, Drexler H (2005) Clinical applications of stem cells for the heart. Circ Res 96(2):151–613 77. Yacoub M, Nerem R (2007) Introduction. Bioengineering the heart. Philos Trans R Soc Lond B Biol Sci 362(1484):1253–5125
78. Lang RM et al (2006) Three-dimensional echocardiography: the benefits of the additional dimension. J Am Coll Cardiol 48(10):2053–6209 79. Kramer CM (2006) The expanding prognostic role of late gadolinium enhanced cardiac magnetic resonance. J Am Coll Cardiol 48(10):1986–7198 80. Hoffmann MH et al (2005) Noninvasive coronary angiography with multislice computed tomography. JAMA 293(20):2471–8247 81. Sabharwal NK, Lahiri A (2003) Role of myocardial perfusion imaging for risk stratification in suspected or known coronary artery disease. Heart 89(11):1291–7129 82. Herrmann HC et al (2006) Mitral valve hemodynamic effects of percutaneous edge-to-edge repair with the MitraClip device for mitral regurgitation. Catheter Cardiovasc Interv 68(6):821–882 83. Mishra YK et al (2006) Coapsys mitral annuloplasty for chronic functional ischemic mitral regurgitation: 1-year results. Ann Thorac Surg 81(1):42–64 84. Ross J Jr, Braunwald E (1968) Aortic stenosis. Circulation 38(1 Suppl):61–76 85. Bashore TM, Davidson CJ (1991) Follow-up recatheterization after balloon aortic valvuloplasty. Mansfield Scientific Aortic Valvuloplasty Registry Investigators. J Am Coll Cardiol 17(5):1188–9115 86. Knudsen LL, Andersen HR, Hasenkam JM (1993) Catheter-implanted prosthetic heart valves. Transluminal catheter
implantation of a new expandable artificial heart valve in the descending thoracic aorta in isolated vessels and closed chest pigs. Int J Artif Organs 16(5):253–622 87. Eltchaninoff H, Tron C, Cribier A (2003) Percutaneous implantation of aortic valve prosthesis in patients with calcific aortic stenosis: technical aspects. J Interv Cardiol 16(6):515–251 88. Cribier A et al (2006) Treatment of calcific aortic stenosis with the percutaneous heart valve: mid-term follow-up from the initial feasibility studies: the French experience. J Am Coll Cardiol 47(6):1214–2123 89. Coats L et al (2005) The potential impact of percutaneous pulmonary valve stent implantation on right ventricular outflow tract re-intervention. Eur J Cardiothorac Surg 27(4):536–453 90. Srivastava S et al (2006) Use of bilateral internal thoracic arteries in CABG through lateral thoracotomy with robotic assistance in 150 patients. Ann Thorac Surg 81(3):800–680; discussion 806 91. Rodriguez E et al (2006) Robotic mitral surgery at East Carolina University: a 6 year experience. Int J Med Robot 2(3):211–521 92. Williams RG et al (2006) Report of the National Heart, Lung, and Blood Institute Working Group on research in adult congenital heart disease. J Am Coll Cardiol 47(4):701–770
Vascular Surgery: Current Trends and Recent Innovations
67
Mark A. Farber, William A. Marston, and Nicholas Cheshire
Contents

67.1 Innovation Within the Specialty ........................... 875
67.2 New Surgical Techniques Within Specialty ......... 876
67.2.1 Clinical Applications ................................................ 877
67.3 Molecular and Biological Developments .............. 886
67.3.1 Emerging Therapies, Tissue Repair ......................... 886
67.3.2 Growth Factor Therapies .......................................... 886
67.3.3 Living Human Dermal Substitutes ........................... 886
67.3.4 Stem Cell Therapies ................................................. 888
67.4 Imaging and Diagnostics ....................................... 888
67.5 Training ................................................................... 888
67.6 Future Developments and Research Focus .......... 889
67.6.1 Aneurysmal Disease ................................................. 889
67.6.2 Thoracic Disease ...................................................... 891
67.6.3 Carotid Disease/Peripheral Arterial Disease ............ 892
67.6.4 Venous Disease/Wound Care/Prevention ................. 892

References ........................................................................... 892
Abstract Over the past one and one-half decades, the specialty of vascular surgery has undergone a tremendous amount of growth and expansion. Surprisingly, the majority of the developments have occurred in the form of new devices. One should not, however, overlook the global changes within the specialty of vascular surgery. Traditionally, the specialty was knowledgeable about vascular diseases, their etiology and their natural history, and physicians used surgical procedures to remedy or alter the natural course for their patients. Initially, vascular surgical specialists did not invent or adopt percutaneous therapies, but that has since changed. The new breed of vascular surgeons are "true" vascular specialists: they possess expertise in all areas of vascular disease, namely medical management, percutaneous therapy and surgical intervention. Their position is unique, as no other vascular specialists currently possess all three of these capabilities. This enables the vascular surgeon to individualize therapy for each patient based upon his or her risks and benefits, instead of offering only the modality the physician is capable of performing. Vascular surgeons will continue to possess this advantage until surgical therapy for vascular problems is no longer necessary.
67.1 Innovation Within the Specialty
M. A. Farber () University of North Carolina, 3025 Burnett Womack, Chapel Hill, NC 27599, USA e-mail: [email protected]
Over the past one and one-half decades, the specialty of vascular surgery has undergone a tremendous amount of growth and expansion. This expansion has been aided by advancements in several areas of medicine, including anesthesia, hypertension management, coronary artery disease treatment, and the therapy of hypercholesterolemia and COPD. Surprisingly, the majority of the developments have occurred in the form of new devices.
Since the first introduction of the intravascular stent by Palmaz et al. [1], device modifications have been made in stent material and structure. Flexibility, deployment accuracy, strength and fatigue resistance have all been drastically improved, resulting in two classes of stents: balloon-expandable and self-expanding. In addition to stents, other tools have been introduced, including plaque excision devices such as the Silverhawk atherectomy catheter, and the laser. While these latter two devices have not been tested with as much scientific rigor as the aforementioned stents, they play a role in achieving minimally invasive treatment of vascular disease.

Advancements in the specialty have occurred not only in the area of occlusive disease but also in the management of aneurysms. The first successful minimally invasive repair of an abdominal aortic aneurysm (AAA) occurred in 1991 [2, 3]. Since that time, a plethora of devices has been introduced to exclude abdominal and thoracic aortic aneurysms with stent-graft combinations, thereby relining the aorta in the diseased segment. While some physicians have publicly intimated that this therapy has been a failed experiment [4], its development has actually been accompanied by a significant amount of engineering testing of design, fatigue and durability. Although not perfect, this therapy has provided treatment for patients who only several years ago had no options and could only be counseled about the natural course of the disease.

While the treatment of venous disease has always lagged behind advancements in arterial treatment, techniques for thrombus removal and new vena cava filter devices have seen steady development in the past several years; on the horizon are new venous valve replacement and repair techniques. The understanding of venous disorders and their management is rapidly increasing. It has been aided by the use of minimally invasive techniques adapted from arthroscopic instruments to remove varicose veins, as well as by a growing interest in injection therapy of veins for both cosmetic and therapeutic reasons.

One should not overlook the global changes in the specialty of vascular surgery. Traditionally, the specialty was knowledgeable about vascular diseases, their etiology and their natural history, and physicians used surgical procedures to remedy or alter the natural course for their patients. Initially, vascular surgical specialists did not invent or adopt percutaneous therapies, but that has since changed. The new breed of vascular surgeons are "true" vascular specialists. They possess expertise in all areas of vascular disease: medical management,
percutaneous therapy, and surgical intervention. Their position is unique, as no other vascular specialists currently possess all three of these capabilities. It enables the vascular surgeon to individualize therapy for each patient based upon his or her risks and benefits, instead of offering only the modality they are capable of performing. They will continue to possess this advantage until surgical therapy for vascular problems is no longer necessary.
67.2 New Surgical Techniques Within Specialty

While the advent of endovascular devices has led to a drastically different approach to the treatment of aneurysmal disease of the descending thoracic and abdominal aorta, branched and fenestrated devices are still in their infancy and are not yet mainstream. As a result, several surgeons have relied upon debranching techniques in conjunction with endovascular therapies to widen the applicability of endovascular devices to the aortic arch and the visceral segment of the aorta. These "hybrid" techniques are showing great promise and have led to a reduction in complications in many series [5, 6]. Patients who were deemed nonsurgical candidates because of significant comorbid risk factors can now be treated. However, there has not been a prospective randomized trial comparing these techniques with traditional open repair, nor will there probably ever be, given the speed at which devices are being developed.

Over the past 50 years, the results of treating ruptured AAAs have seen limited improvement, with most series reporting 40–50% mortality in those patients reaching the operating room [7]. However, with the use of EVAR, a dramatic drop in operative mortality is being achieved, with most modern series reporting mortality rates between 20 and 30% [8, 9]. In order to provide this therapy in an emergency, the vascular specialist must have a wide selection of devices stocked at the hospital and a team readily available to perform the procedure. Whether detailed imaging is necessary to adequately perform these procedures is debatable. As this therapy continues to advance, its durability and outcomes will be assessed to determine whether it should be used as a bridge to definitive repair or whether it will attain durability similar to that of elective repair of AAA.
Additionally, new surgical techniques are being developed in conjunction with endovascular methods. Traditional balloon embolectomy is rarely performed without guidewire and fluoroscopic guidance. As familiarity with, and improvements in, lytic agents continue, critical limb ischemia (CLI) is being managed nonoperatively or in combination with intraoperative lytic therapy to improve results. As mentioned above, stab phlebectomy is becoming a procedure of the past as newer, less invasive methods such as injection sclerotherapy and Trivex treatments take hold and are preferentially requested by patients. One thing is certain: given the choice, the patient will invariably choose the less invasive option.
67.2.1 Clinical Applications

67.2.1.1 Aneurysmal Disease

The treatment of thoracic and abdominal aortic aneurysms has undergone the most drastic change over the past decade. In this short period of time, most centers have come to utilize endovascular devices to treat greater than 60% of the patients afflicted with these conditions. The adoption of this new technique has occurred more rapidly than was experienced with laparoscopic techniques. One overwhelming reason for this has been the impact of clinical trial data. There have been five prospective trials in the US conducted under phase II device evaluation for endovascular repair of AAA (EVAR) [10–14]. These trials date from 1999 to 2004, but all suggest the same trend. While the trials were not designed as prospectively randomized investigations, the data reveal that endovascular treatment confers a reduction in major morbidity of greater than 50%. Five-year results have also been published for several of these cohorts. While early mortality was reduced with minimally invasive techniques, mortality at 5 years was not statistically different. This should not come as a surprise, as most patients are elderly and expire from other comorbid conditions at a rate that significantly impacts the statistics in the long term.

There have been three similar trials in Europe that span a similar time period [15–18]. The additional strengths of these trials are that they are not industry sponsored, but are instead physician directed, and that they involve numerous devices. Data are for the most part similar; however, small differences may exist in certain subgroups, such as high-risk patients and those with large aneurysms. These types of patients may be better served with traditional open repair or observation because of the increased secondary intervention rates. That having been stated, the patient's preference, as well as numerous other factors, influences the vascular specialist's decision for any individual patient.

Clinical trials involving endovascular treatment of thoracic aneurysmal disease (TEVAR) are somewhat more limited. There is currently only one published prospective industry-sponsored trial in the US, involving the Gore TAG device [19]. While three other trials have been completed or are near completion, their data are not yet available. Similar to EVAR, TEVAR outcomes in the aforementioned trial have demonstrated a reduction in paraplegia, mortality and morbidity rates in this high-risk population. Interestingly, thoracic aortic pathology involves disease entities other than aneurysmal disease. While devices are being used to treat patients with these other conditions, clinical trial data are lacking, and such use falls outside the devices' instructions for use. Device modifications and clinical evaluation are an area of active research and development.
67.2.1.2 Aortic Debranching

A current limitation of endovascular aneurysm repair is that major aortic branches cannot be crossed without sacrificing them. Aneurysms involving the transverse arch and visceral region are therefore currently not amenable to purely endovascular repair. In an effort to utilize the advantages of endovascular therapies in these areas, debranching techniques are being developed at several centers of excellence around the world.

The most common reason for not recommending an endovascular repair of an AAA is an inadequate proximal implantation site. Commercially available stent grafts require a minimal aortic neck length of 15 mm and a neck angle of less than 60° to provide an adequate seal zone [20–22]. While it is crucial that the aortic walls in the neck region be fairly parallel and free of significant calcification and thrombus, implanting the device in a region of significant angulation can also jeopardize the durability of the endovascular repair. When adverse neck anatomy is present, visceral reconstruction can be used, providing more favorable
Fig. 67.1 Pararenal aortic aneurysm excluded by performing a left ilio-renal bypass and EVAR
anatomy (Fig. 67.1). Often, this may require reconstruction of the celiac, superior mesenteric and/or renal arteries if the aneurysm is pararenal (Fig. 67.2). Occasionally, a complete abdominal aortic debranching may be necessary (Figs. 67.3–67.6). The benefit of complete visceral debranching has been questioned by several authors, but it does have a role in selected patients with severe comorbidities, such as COPD, in whom aortic cross-clamping would not be tolerated. It should be noted that before performing visceral procedures, it is important to evaluate the anatomic and hemodynamic status of the visceral segment, including both renal arteries. A preoperative spiral computed tomography angiogram (CTA) or magnetic resonance angiogram (MRA) with 2–3 mm intervals, combined with reconstructions of the aorta, provides the most thorough
Fig. 67.2 Hybrid repair of a pararenal aortic aneurysm utilizing a hepato-renal bypass to provide adequate neck length and an infrarenal device
evaluation of the aneurysm, the aorta and its branches. Occasionally, visceral angiography is necessary when adequate noninvasive imaging cannot be obtained. For hemodynamic evaluation, duplex ultrasonography has been used, which has a high degree of accuracy when performed in experienced peripheral vascular laboratories. In addition, it provides baseline information against which post-reconstruction evaluations can be compared. Multiple factors are involved in determining which vessels need to be revascularized. Inspection of the aorta to determine where an appropriate landing zone exists is paramount. Once this has been established, the vascular specialist can then determine which vessels need to be reconstructed. Possibilities include unilateral renal reconstructions, bilateral bypasses based
Fig. 67.3 Type IV TAAA repair using a thoraco-celiac, thoraco-mesenteric, thoraco-left renal and right ilio-renal bypass in conjunction with EVAR

Fig. 67.4 Repair of a type IV TAAA using retrograde right ilio-renal bypass, left ilio-SMA, celiac and left renal bypasses and EVAR
upon the celiac axis, supra-celiac aorta or iliac inflow sources, mesenteric reconstructions, and complete debranching from either the thoracic aorta or the iliac vessels. In the aforementioned hybrid procedures, the aorta is never cross-clamped, allowing continuous perfusion of the abdominal viscera and the spinal cord. Both Fulton et al. [6] and Black et al. [5] reported no incidence of paraplegia following their hybrid procedures. Additionally, the mortality for the hybrid procedure in pararenal and type IV thoracoabdominal aortic aneurysms (TAAAs) was 0%. In our series, we had one endoleak (type I), which was sealed with the placement of a proximal cuff. No renal impairment requiring dialysis was reported by Fulton et al. [6],
while 2 out of 26 patients required temporary renal support in the patient population reported by Black et al. [5]. Graft patency of visceral bypasses in the hybrid procedures is excellent, at 100 and 98% for Fulton et al. and Black et al., respectively [5, 6]. These initial results suggest that, similar to infrarenal AAA endovascular repair, visceral bypass combined with endovascular repair of pararenal aneurysms and TAAAs reduces the morbidity and mortality of open repair.
67.2.1.3 Great Vessel Reconstruction

Close proximity to, or involvement of, the aortic arch vessels often excludes arch and proximal descending thoracic lesions from endovascular repair (Fig. 67.7).
Fig. 67.5 Complete visceral debranching using right ilio-renal and celiac bypass and left ilio-SMA and left renal bypass along with an infrarenal device
Fig. 67.6 Type IV TAAA hybrid repair performed via hepatorenal bypass used in conjunction with a right iliac based inflow and left ilio-SMA and left ilio-renal bypass
Debranching revascularization procedures to reposition the great vessels of the aortic arch more proximally have the potential to expand the number of lesions amenable to stent-graft repair by extending the length of the proximal landing zone. While some specialists consider visceral debranching to offer limited advantages over direct open repair in many patients, the same is not true for great vessel debranching. As a result, many options have been designed to facilitate this approach. Open repair of aneurysms involving these vessels usually requires hypothermic circulatory arrest, which can impact morbidity and mortality. Partial aortic occlusion and great vessel reconstruction via sternotomy or extra-anatomic bypass in conjunction with TEVAR significantly reduces the impact of the procedure (Figs. 67.8–67.10).
Overstenting the origin of the left subclavian artery (SCA) effectively expands the length of the proximal landing zone. This is often necessary with type B dissections, where the primary entry site is frequently located near the origin of the left SCA [23]. Close proximity to the left SCA is also often seen in patients who sustain traumatic thoracic aortic transections [24–26]. Concern about the potential for left upper extremity ischemia and vertebrobasilar insufficiency led to the practice of routine left SCA revascularization prior to endograft coverage of the vessel's origin in order to obtain a proximal landing zone of adequate length (Figs. 67.11 and 67.12) [27, 28]. Additional experience with thoracic endografting suggests that left SCA coverage without revascularization is generally well tolerated, with no symptoms or with symptoms that are not limiting to the patient [29, 30].
Fig. 67.7 Magnetic resonance angiography (MRA) of the thoracic aorta demonstrates a thoracic aortic aneurysm at the level of the arch measuring 6.5 cm in diameter
Fig. 67.8 Intraoperative picture of the ascending arch to innominate artery and left common carotid artery bypass utilizing a 10 mm Dacron graft

Fig. 67.9 Total aortic arch debranching. Intraoperative completion angiogram after deployment of a Gore TAG thoracic endoprosthesis demonstrates exclusion of the aneurysm as well as patency and location of the bypass graft
Symptoms may include neurologic signs consistent with vertebrobasilar insufficiency, as well as left upper extremity hypoperfusion manifesting as claudication, rest pain, or ischemia. Later revascularization can be performed depending upon the severity of these symptoms and should be considered for left-handed professionals, for younger patients, or when symptoms are lifestyle limiting. Otherwise, expectant management is appropriate. Additional techniques described to
Fig. 67.10 3D reconstruction of total arch debranching, postoperative view
revascularize the left SCA include endovascular transluminal graft fenestration [31]. Transluminal placement of endovascular stents at the ostia of the supra-aortic vessels in order to reestablish or improve blood flow
Fig. 67.11 Options for left subclavian artery revascularization: left subclavian artery to left common carotid artery transposition

Fig. 67.12 Options for left subclavian artery revascularization: left common carotid artery to left subclavian artery bypass
through these vessels has also been reported [32, 33]. This intervention was successful in resolving left upper extremity rest pain in one of our patients 1 month after endograft deployment. Total aortic arch debranching by ascending arch to supra-aortic vessel bypass grafting to facilitate endograft repair has been previously described in case reports and limited case series (Figs. 67.13 and 67.14) [34–40]. A recent case series reported by Bergeron et al. [34] included 11 patients; there were no reported cases of CVA or paraplegia. Additionally, other debranching vascular bypass and transposition procedures have been described which facilitate coverage of the origin of the left common carotid artery (CCA), including carotid-carotid bypass (Figs. 67.15 and 67.16) [27, 41, 42]. The short- and long-term advantages of these combined techniques over traditional open repair for aortic arch lesions are unknown, given the limited experience with endografting in this vascular region and a lack of long-term data. Supra-aortic debranching bypass procedures can successfully extend the proximal landing zone, but the long-term effects of these extra-anatomic bypasses, as
well as durability of thoracic endografts deployed in the aortic arch are unknown.
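The selective approach to left SCA management described above amounts to a simple triage rule: cover first, and revascularize later only for significant symptoms or specific patient factors. The following Python sketch merely restates that logic; the function and its inputs are our own illustrative naming, not a validated clinical decision rule.

```python
# Sketch of the selective left SCA revascularization policy described above.
# Inputs and naming are illustrative only; this is not a validated algorithm,
# and real decisions rest on clinical judgment.

def left_sca_plan(symptoms_lifestyle_limiting: bool,
                  left_handed_professional: bool,
                  young_patient: bool) -> str:
    """Suggest management after left SCA coverage without revascularization."""
    if symptoms_lifestyle_limiting:
        return "revascularize (bypass or transposition)"
    if left_handed_professional or young_patient:
        return "consider revascularization"
    return "expectant management"

print(left_sca_plan(False, False, False))  # -> "expectant management"
```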
67.2.1.4 Ruptured AAA (rAAA)

The relatively recent application of EVAR to ruptured AAA has partially been driven by the persistently poor outcomes associated with open repair since the development of operative reconstruction in the 1950s. A meta-analysis by Bown et al. [7] of open repair of rAAAs over the past 50 years demonstrated a reduction in operative mortality of only 3.5% per decade, with an estimated mortality rate of 41% for the year 2000. In a review of 6,223 patients between 1985 and 1994, an operative mortality rate of 68% was reported [43]. Recent single-institution series have demonstrated comparable outcomes, with reported 30-day or operative mortality rates of 33–54%.

Much of the operative mortality associated with open surgery may be attributed to the additional physiologic insult the patient sustains from the operative intervention. Open surgical repair necessitates operative exposure
Fig. 67.13 Total aortic arch debranching procedure. Ascending arch to innominate artery bypass with transposition of the left common carotid artery to the bypass graft
through a transabdominal or retroperitoneal approach. Either approach necessitates the use of general anesthesia, which impairs sympathetic tone, potentially precipitating hemodynamic collapse in these often hypotensive and hypovolemic patients. Similarly, sudden decompression of intra-abdominal pressure upon opening of the abdomen may also result in an acute drop in blood pressure. Dissection in the retroperitoneum may precipitate free rupture of the contained hematoma. Operative exposure potentiates hypothermia and blood loss, both of which may contribute to coagulopathy. Cross-clamping of the aorta interrupts outflow to the pelvis and lower extremities and places significant stress on the heart, especially in patients who are hemodynamically unstable or have preexisting cardiac disease. Additionally, clamping and unclamping may subject the patient to ischemia-reperfusion injury, as well as potentiating the fibrinolytic state.
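To make the historical trend quoted at the start of this section concrete, the two figures from Bown et al. [7] (an absolute reduction of roughly 3.5 percentage points per decade, with an estimated 41% mortality in 2000) can be expressed as a simple linear model. The sketch below is purely illustrative arithmetic on those published numbers, not the meta-analysis method itself.

```python
# Illustrative linear extrapolation of the trend reported by Bown et al. [7]:
# operative mortality for ruptured AAA fell ~3.5 percentage points per decade,
# with an estimated rate of 41% in the year 2000. This simple model is an
# illustration of the reported figures, not the meta-analysis itself.

ANCHOR_YEAR, ANCHOR_MORTALITY = 2000, 41.0   # % operative mortality
DECLINE_PER_DECADE = 3.5                     # absolute percentage points

def estimated_mortality(year: int) -> float:
    """Linear estimate of open-repair mortality (%) for ruptured AAA."""
    decades_from_anchor = (year - ANCHOR_YEAR) / 10.0
    return ANCHOR_MORTALITY - DECLINE_PER_DECADE * decades_from_anchor

for year in (1950, 1970, 1990, 2000):
    print(f"{year}: ~{estimated_mortality(year):.1f}%")
# 1950: ~58.5%  1970: ~51.5%  1990: ~44.5%  2000: ~41.0%
```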
Fig. 67.14 Total aortic arch debranching procedure. Ascending arch to innominate artery bypass with transposition of the left common carotid artery to the bypass graft. The innominate artery is ligated proximally and a metallic clip is placed to mark the distal aspect of the proximal bypass graft anastomosis. Endovascular exclusion of the aortic arch aneurysm
The application of EVAR to rAAAs largely avoids the aforementioned physiologic stresses associated with open surgery. Remote access to the aorta through the femoral vessels is a minimally invasive approach, which can often be performed without the need for a general anesthetic. Adequate availability of personnel and institutional resources is necessary to successfully provide this form of treatment. Additionally, the patient's candidacy for EVAR must be evaluated with sufficient imaging studies in a limited amount of time. Minimizing the physiologic insult of the aortic intervention is largely the reason that EVAR has been advocated over conventional open surgery for rAAAs. The minimally invasive nature of EVAR should therefore, at the very least, translate into improved short-term outcomes in hospital morbidity and mortality. These potential benefits are, at present, difficult to assess. EVAR in this setting has been criticized in that it tends
Fig. 67.15 Right common carotid artery to left common carotid artery bypass and left common carotid artery to left subclavian artery bypass
to preselect the more stable patients with less complex aortic anatomy, largely because of the imaging requirements and anatomic limitations of the endovascular intervention. Many of the reported outcomes are from single-institution case series, greatly limited by sample size. Studies comparing EVAR with open surgery for rAAAs often lack randomization, and some have included symptomatic AAAs in their emergent cohorts. Despite these limitations, outcomes are encouraging, and there appears to be a general consensus that EVAR is feasible, and at least equivalent to open surgery, for patients with rAAA and suitable vascular anatomy [8, 44].

One prospective randomized controlled trial comparing EVAR with open surgery for rAAAs has been reported [9]. In this single-institution study reported by Hinchliffe et al., all randomized patients were assessed to be candidates for open surgery, with a total of 32 patients recruited (n = 15 EVAR; n = 17 open surgery). Thirty-day mortality was reported on an intention-to-treat basis and demonstrated no difference between the EVAR
Fig. 67.16 Endograft exclusion of aortic aneurysm after bypass of the left common carotid and left subclavian artery
(53%) and open surgical groups (53%). Additionally, among intervention survivors in both groups, there was no difference in the overall number of moderate or severe postoperative complications (EVAR 77%; open surgery 80%). Although the study design represents a significant improvement over nonrandomized retrospective case series, the low number of patients randomized to both study arms makes the conclusions unreliable. Additionally, the mortality rates for both interventions were surprisingly high despite the exclusion of high-risk patients.

Harkin et al. performed a systematic review of the literature for studies relevant to EVAR of rAAAs [8]. Of the 34 published reports identified, 17 compared EVAR and open surgery. The overall procedure-related mortality was 18% for EVAR and 34% for open surgery (not subjected to statistical analysis). Additionally, of the studies reporting secondary outcomes, the majority reported significant reductions in length of intensive care unit stay, length of procedure, blood loss and transfusion requirements in favor of EVAR.
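The unreliability of the pilot-trial comparison above can be quantified. Using the published arm sizes and the reported 53% mortality in each arm (which implies roughly 8/15 and 9/17 deaths; the exact counts are our inference), a normal-approximation 95% confidence interval for the mortality difference spans tens of percentage points in either direction, which is why the pilot cannot rule out a large benefit or harm. A minimal sketch:

```python
from math import sqrt

# Illustrative: how little a 32-patient pilot can exclude. Event counts are
# inferred from the reported 53% thirty-day mortality in each arm of the
# Hinchliffe et al. trial (n = 15 EVAR, n = 17 open) [9]; the normal-
# approximation interval is a sketch, not an analysis from the paper.
deaths_evar, n_evar = 8, 15    # 8/15 = 53.3%
deaths_open, n_open = 9, 17    # 9/17 = 52.9%

p1, p2 = deaths_evar / n_evar, deaths_open / n_open
diff = p1 - p2
se = sqrt(p1 * (1 - p1) / n_evar + p2 * (1 - p2) / n_open)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"mortality difference {diff:+.1%}, 95% CI ({lo:+.1%}, {hi:+.1%})")
# -> roughly +0.4%, CI (-34%, +35%): compatible with large benefit or harm
```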
In another systematic review, by Visser et al., the inclusion criteria limited the analysis to ten studies directly comparing EVAR and open surgery [45]. In this review, a total of 148 patients underwent EVAR and 330 underwent open surgical procedures. Although an advantage in favor of EVAR was observed for pooled 30-day mortality (22 vs. 38%) and systemic complications (28 vs. 56%), the mortality benefit was not significant when adjusting for hemodynamic condition at presentation. Greco et al. performed a review of public discharge data sets from four states (California, Florida, New Jersey, and New York) from 2000 through 2003 [46]. The analysis consisted of 290 EVAR and 5,508 open surgery patients. A significant overall mortality advantage in favor of EVAR was observed when combining data from all four states (39.3 vs. 47.7%), although this observation was not consistent when analyzing outcomes in individual states. Additionally, EVAR was found to be associated with significantly lower rates of pulmonary, bleeding and renal complications, as well as a shorter overall length of stay.

EVAR appears to be a technically feasible form of treatment for rAAAs in patients with suitable vascular anatomy. Patient candidacy for EVAR in this setting is largely dependent upon anatomic considerations. Despite the minimally invasive approach of this form of therapy, post-procedure complications are common, mainly because of the physiologic insults sustained from the rupture, although procedure-related complications also occur. Although endograft-related complications would be expected to be frequent given the complex vascular anatomy encountered in this patient population, the long-term durability and risk of secondary intervention in this setting remain relatively undefined. EVAR for rAAA has become the standard of care at many centers of excellence and appears to be at least equivalent to conventional open surgery in the short term.
67.2.1.5 Carotid Stenting

While most areas in vascular surgery have been overrun with new minimally invasive techniques, carotid stenosis management has been the slowest to transition to the new therapeutic options. Based on NASCET, ACAS and other carotid endarterectomy trials, carotid interventions have increased over the past two decades, and the combination of surgical intervention and medical management was shown to be better than medical management alone [47, 48]. Early carotid artery stenting procedures, however, were plagued by a high stroke rate. Therefore, early efforts centered on designing embolic protection devices: particulate matter can be collected in a filter-type mechanism, or carotid flow can be reversed, in an effort to prevent embolization to the brain. As the technique of carotid stenting has improved, and as the patients at high risk of complications (difficult arch anatomy, calcification) have been better identified, outcomes have improved. Whether these factors, in conjunction with surgical outcomes, disease pathology and stents being adopted from other peripheral interventions, slowed implementation is difficult to determine. In certain patients who possess known anatomic risk factors (radiated neck, high lesion, reoperative field), there is likely an advantage to endovascular therapy [49]. However, in most other patients, this significant advantage is not as evident [50]. One must also keep in mind that significant pharmacologic advancements in stroke treatment are occurring, and trials performed almost 20 years ago might not demonstrate similar results today. Current cholesterol, hypertension and platelet management drugs are showing promise and may eclipse all interventional therapy as the best option in most patients; however, no significant clinical trial data have been published to date.
67.2.1.6 Venous Disease

Although delayed in comparison with arterial disease, percutaneous treatment of venous disease is rapidly progressing. Traditional vein stripping has been supplanted by saphenous ablation with radiofrequency or laser catheters. In addition, arthroscopic-type equipment (Trivex system) is now utilized to remove superficial venous varicosities with a reduction in operative time, reduced complications and improved cosmetic results. Much of this has been driven by a better understanding and standardization of venous pathophysiology. Percutaneous interventions are also not limited to infrainguinal disease: lytic therapy has been instrumental in treating patients with severe iliofemoral DVT and often uncovers an extrinsic iliac vein compression or web (May–Thurner syndrome), which can subsequently be treated with venous balloon dilatation or stenting.
67.3 Molecular and Biological Developments

Molecular and biological developments have mainly centered on the development of artificial blood vessels and on the prevention or regression of disease with molecular therapy. Artificial blood vessel development, whether by gene manipulation or by manufacturing a three-layered compliant tube, has been investigated for many years. Many characteristics must be designed into such a structure, including lack of thrombogenicity, compliance, resistance to infection, and the ability to remodel, heal and contract, all while maintaining the secretory function of the endothelial cells. Research has centered on designing scaffolding constructed of either collagen or biodegradable polymers. Some investigators have eliminated the scaffolding altogether; however, patency can then be extremely poor. While some initial results have been met with exuberance, it will probably be at least a decade before we see approval by the FDA or other regulatory bodies. Before then, several hurdles must be overcome. First, the time needed to produce these vessels must be shortened from several months to days. Second, the current costs of production are high and need to be reduced for this to become a financially viable option. Finally, regulatory approval of a biodegradable polymer for vascular use needs to be obtained.

An additional area of interest has developed around the inhibition and promotion of vascular growth factors. On one hand, the inhibition of intimal hyperplasia with various drugs is being investigated to improve the current results of therapy, to enhance patency, and to reduce both the short- and mid-term complications associated with intimal hyperplasia and thrombosis. On the other hand, vascular endothelial growth factors are being investigated to promote the growth of new blood vessels. These two fields can work in conjunction with one another, as what constitutes failure in one field may represent success in the other [51].
67.3.1 Emerging Therapies, Tissue Repair

Chronic non-healing wounds are a common problem leading to significant morbidity due to pain, incapacitation, limb loss and potential death. Among patient groups commonly developing lower extremity ulcers,
the presence of diabetes mellitus is a major predictor of limb loss. In the USA, over 80,000 major amputations are performed per year in patients with diabetes. The number of amputations has increased nearly 70% in the last 10 years, and the rate of amputation per person has increased nearly 40%. Standard wound management protocols for patients with diabetic foot ulcers (DFUs) include aggressive pressure offloading, debridement, infection control, revascularization when necessary, and wound bed preparation. When these are performed in a systematic fashion, most DFUs will improve and eventually heal. Patients who cannot undergo revascularization, or those who develop recurrent infection, are at high risk for limb loss. Despite aggressive therapy, the average healing time for most patients treated with standard techniques is over 12 weeks. As long as the wound is open, a high risk of wound infection exists, with the potential for wound enlargement, osteomyelitis and limb loss. Treatment modalities that speed the healing process therefore have the potential to improve limb salvage, as well as to hasten the patient's return to activity and well-being.
67.3.2 Growth Factor Therapies

The search for active treatments that can be added to standard wound care to speed healing has to date yielded two fundamental concepts: topical application of growth factors, and living human skin equivalents. Currently, there is only one growth factor approved for the treatment of chronic limb ulcers: Regranex (Ortho-McNeil Pharmaceutical, Inc., Raritan, NJ), a formulation of PDGF-B. Studied in several multicenter randomized prospective trials, it was found to significantly accelerate the healing rate of DFUs in comparison with standard treatment alone. Unfortunately, the expense of Regranex and the lack of reimbursement for Medicare patients have limited its usefulness in the elderly population that often presents with recalcitrant DFUs. The use of Regranex and its multicenter studies have been described in detail elsewhere.
67.3.3 Living Human Dermal Substitutes

In theory, the application of healthy living dermal cells may transform chronic non-healing wounds back into
an acute wound through the expression of multiple growth factors, matrix proteins and glycosaminoglycans, all at appropriate times. It is hoped that this process will allow more rapid progression to a healed wound. The technique of tissue engineering has been under development for years, encompassing the harvesting of cells and the separation, purification, preservation and growth of cell lines. To date, two preparations utilizing living human dermal components have been approved for use in lower extremity ulcers. Apligraf [52] and Dermagraft [53] represent the result of years of development to allow the commercial use of living tissue. Apligraf (Organogenesis Inc., Canton, MA) is a bilayer composed of both dermal fibroblasts and epidermal cells seeded on a bovine collagen matrix (Fig. 67.17). It has been studied in prospective trials leading to FDA approval for use in the USA for both DFUs and lower extremity ulcers due to chronic venous insufficiency.
Dermagraft (Advanced BioHealing, La Jolla, CA) is an allogeneic, human neonatal-derived dermal fibroblast culture grown on a biodegradable scaffold (Fig. 67.18). The fibroblasts secrete a rich concentration of extracellular matrix components, including collagen and glycosaminoglycans. The resulting three-dimensional matrix may be implanted into chronic non-healing wounds to supply functional fibroblasts and their corresponding expressed proteins. The scaffold biodegrades over a 1–2 week period, leaving behind only cellular components and proteins. The matrix is cryopreserved, allowing shipment and local storage until needed; the typical shelf life is approximately 6 months. Both Apligraf and Dermagraft produce a diverse array of cytokines involved in the tissue repair process, including growth factors, interleukins and angiogenic factors. Both products have been validated in multicenter, prospective randomized controlled trials to accelerate healing of chronic non-healing wounds. Of note, both living tissue equivalents were also found to
Fig. 67.17 Histology of Apligraf compared to normal human skin
Fig. 67.18 (a) Dermal fibroblasts seeded onto a bioabsorbable scaffold (Vicryl mesh). (b) After 2 weeks, a living dermal substitute has formed, which can support the migration, proliferation and stratification of an epidermis
reduce the incidence of infectious complications in studies of DFUs. Dermagraft is FDA approved for the treatment of DFUs, and Apligraf is approved for the treatment of limb ulcers of diabetic and venous etiology. Currently, several second-generation skin substitutes are under development in clinical trials. These living skin equivalents will attempt to improve on the benefits provided by Apligraf and Dermagraft while reducing the costs associated with the use of living tissues.
67.3.4 Stem Cell Therapies

For patients with CLI who are not candidates for revascularization, several concepts are under development for the delivery of autogenous stem cells to ischemic tissue. The concept is that patients with arterial obstruction in the lower extremity may be induced to develop new capillary networks by the injection of vascular progenitor cells into the ischemic tissue [54]. Harvesting of stem cells is generally performed by direct bone marrow aspiration or by mobilization of stem cells into the periphery with a stimulating factor followed by apheresis [55]. Processing of the progenitor cells may be critical, with the ability to select and produce a high concentration of vascular progenitor cells being a key issue. Reinjection of the patient's own concentrated stem cells into the muscular beds of ischemic tissue aims to induce the development of rich capillary networks that will alleviate symptoms. Currently, these strategies are in early-phase clinical trials that may eventually yield clinically meaningful therapies for limb salvage and pain control in patients with CLI.
67.4 Imaging and Diagnostics

Many of the advancements that have been made in endovascular therapy would never have been possible were it not for improvements in imaging and diagnostics. Over the past decade, CT and MRA imaging have improved to the point that we can now image at intervals of less than 1 mm. As a result, the precise location of vessels can be identified, as well as their orientation and relationship to
other structures. The use of purely diagnostic interventional angiography has all but disappeared, as three-dimensional imaging provides the clinician with the additional information needed to plan treatment. It should be noted, however, that imaging of this quality is not applicable to all patients. MRA still lags behind CTA with respect to fine-detail resolution, especially for small vessel disease. Fortunately, diagnostic angiography in conjunction with vascular ultrasound can be utilized to accurately diagnose and treat these patients.

A section on vascular imaging would not be complete without mention of vascular ultrasound. While initially developed over 30 years ago, it has advanced along with CT and MRA. Many gold standards are now based upon diagnostic ultrasound criteria, including the identification of carotid stenosis and the detection of deep venous thrombosis. New techniques are still being developed and researched, including power Doppler and acoustic radiation force impulse (ARFI) ultrasound, to aid in the management of patients with vascular disease [56, 57]. Ultrasound is also no longer purely noninvasive and nontherapeutic: intravascular ultrasound is now being used to guide therapy, is extremely helpful in evaluating thoracic aortic dissections, and reduces the amount of contrast needed during interventions. In addition, both intravascular and transcutaneous Doppler are being utilized during interventions to determine the success of therapy.
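As a concrete example of an ultrasound-based gold standard, internal carotid stenosis is commonly graded from duplex velocity measurements. The thresholds in the sketch below follow widely cited consensus criteria (peak systolic velocity cut-offs of roughly 125 and 230 cm/s); they are not taken from this chapter, and real laboratories also weigh end-diastolic velocity and velocity ratios.

```python
# Illustrative grading of internal carotid artery stenosis from duplex
# peak systolic velocity (PSV). Thresholds follow commonly cited consensus
# criteria (~125 and ~230 cm/s), which are not taken from this chapter;
# real laboratories also consider end-diastolic velocity and ICA/CCA ratios.

def grade_carotid_stenosis(psv_cm_s: float) -> str:
    if psv_cm_s < 125:
        return "<50% stenosis (or normal)"
    if psv_cm_s <= 230:
        return "50-69% stenosis"
    return ">=70% stenosis (near-occlusion requires further imaging)"

print(grade_carotid_stenosis(180))   # -> "50-69% stenosis"
```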
67.5 Training

As a result of the rapidly changing field of vascular disease, the training of specialists in this area is undergoing major upheaval. Traditionally, three specialties have been involved with patients stricken with vascular disease: cardiology, vascular surgery and interventional radiology. While each of these specialists has significantly different training and background, they overlap considerably in their abilities. As a result, patients are being evaluated and treated by each of these specialists. However, those individuals who actually manage the patients, and are not just proceduralists, are growing significantly in number and will probably be managing these patients in the future.
From a surgical training standpoint, things have also changed. While several decades ago vascular surgery was considered a core component of general surgery, vascular surgeons have become superspecialists who today have more in common with nephrologists, diabetologists, radiologists and cardiologists than with other surgical subspecialties. The techniques and tools that are required, and the type of training, are vastly different from other areas in surgery. Many specialists function as the patient's primary care physician, radiologist and interventionalist. As a result, in many countries vascular surgeons have segregated from the general surgeons and opted for a separate medical specialty certification. Not surprisingly, medical students interested in operating may no longer opt for vascular surgery as a specialty, since much of our involvement with patients has little to do with a scalpel and traditional surgical procedures. For this reason, emphasis has been placed on approaching medical students who are interested in patients with vascular disease but may be contemplating other specialties, such as cardiology or interventional radiology, instead of courting surgical residents in their senior years of training. Accordingly, vascular training programs in the United States are expanding their education to involve 5 years dedicated to the treatment of vascular problems, similar to what the neurosurgical and urological specialties have done in the past. Whether this will produce more competent vascular specialists will not be known until these programs mature. One thing is important: those interested in treating patients with vascular problems should become competent in all areas of vascular disease management, including medical, interventional and surgical therapy.
67.6 Future Developments and Research Focus

67.6.1 Aneurysmal Disease

67.6.1.1 Abdominal Aortic Aneurysms

It has been 3 years since the approval of the latest device for infrarenal endovascular aortic repair (EVAR) by the FDA. Several devices with new design concepts and potential improvements have been introduced through clinical trials. These include Enovus (Boston Scientific, Natick, MA), Aptus (Aptus Endosystems Inc., Sunnyvale, CA), Aorfix (Lombard Medical, Oxfordshire, UK) and the Zenith Fenestrated and Branched devices (Cook Medical, Bloomington, IN). Each of these devices addresses a different challenge that exists with EVAR. While the performance criteria set by previous devices are extremely high, it remains to be seen whether any of these devices will outperform them or increase suitability for patients with challenging anatomy. Lastly, the environment for clinical trial testing of EVAR devices has changed as a result of the excellent results obtained with current devices: there are no longer "no options" for the high-risk patient, as advanced endovascular specialists with access to numerous devices and techniques have been trained across the country.
67.6.1.2 Current Limitations

All currently approved devices have an infrarenal neck length requirement of 15 mm to provide successful exclusion of the AAA. In addition, the neck must be free of significant thrombus and calcification and must not be angulated greater than 60°. Implanting current devices outside these guidelines results in suboptimal outcomes and places the patient at increased risk of device failure and potential rupture [58, 59]. Addressing these issues with newer devices may increase suitability for EVAR by 10–40%. However, these improvements must not come at a detriment to current results, which provide greater than 95% protection from rupture at 5 years.

While device iterations have led to both flared and tapered components for treating various iliac diameters, intentional hypogastric exclusion is still required in 10–20% of patients. While tolerated in most, severe gluteal ischemia can significantly impact the patient's quality of life. Additionally, ischemic complications from bilateral embolizations have been reported [60, 61]. A system designed to preserve hypogastric perfusion would improve outcomes in this group of patients.

Lastly, long-term durability may need to be addressed as larger longitudinal studies are published. Most patients with AAA have a life expectancy of less than 10 years, and current device fatigue testing is required to simulate 10 years of stress. As devices are implanted into younger patients, these criteria may need to be altered.
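The 10-year bench requirement mentioned above translates into a concrete cycle count. Assuming a resting heart rate of 72 beats per minute (our assumption, not a figure from this chapter), a decade of pulsatile loading is on the order of 4 × 10^8 cycles, which is the scale accelerated fatigue testing must reproduce:

```python
# Back-of-envelope cycle count behind "10 years of stress": at an assumed
# resting heart rate of 72 beats/min (our assumption, not from the chapter),
# a decade of pulsatile loading is on the order of 4e8 cycles, the scale
# that accelerated bench fatigue testing must reproduce.

BEATS_PER_MIN = 72
MINUTES_PER_YEAR = 60 * 24 * 365

cycles_10_years = BEATS_PER_MIN * MINUTES_PER_YEAR * 10
print(f"{cycles_10_years:.2e} load cycles")   # -> 3.78e+08
```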
67.6.1.3 Specific Treatment Concerns

Neck Quality

The amount of calcification and thrombus present in the aortic neck has been of concern to vascular interventionalists, since some devices rely upon this region for both fixation and exclusion of the AAA. However, the degree of calcium and thrombus considered significant varies among physicians. In our experience, it has been the reason for exclusion from EVAR in only a small number of patients when compared with other anatomic factors. A novel concept introduced by the Enovus device was a "gasket" mechanism to accomplish exclusion, which would potentially remedy this problem by allowing sealing in adverse neck conditions such as unusual shape, calcification and thrombus. However, as a result of fractures in the supra-renal stent, the device is currently no longer used in clinical trials. Whether other mechanisms, such as staples or biologic fixation, can be used to accomplish exclusion in "poor quality" necks remains to be seen.

Neck Angle

When the angle of the neck to the axis of the aneurysm exceeds 60°, the success of EVAR declines. Multiple factors play a role in this unfavorable outcome. As the angle becomes more severe, the device does not align itself with the neck in a parallel fashion, thereby reducing the amount of apposition in the sealing region. In addition, the position of the device is generally lower in relation to the renal arteries, since positioning becomes more difficult in an angled orientation. Device modifications are being introduced that allow for a more flexible design in the sealing neck region without compromising fixation. One must keep in mind, however, that the forces acting downward on the device are greater in these severely angulated necks, and therefore more robust fixation may be needed to secure the device appropriately. The correct balance between these two aspects must be kept in mind as these devices are designed and tested.

Neck Length

Patients with short necks pose an extremely difficult problem for the vascular specialist. While an infrarenal
device can be implanted into a neck of less than 15 mm in length without significantly increasing major adverse outcomes, the future implications can be dramatic. Many implantations into short necks are completed with the impression, based upon intraoperative angiography, that exclusion of the aneurysm has been accomplished. On follow-up with more sensitive imaging, however, the patient is identified as having a type Ia endoleak. This negatively impacts future options, as current fenestrated devices are more difficult to implant successfully when a prior aortic endovascular repair has been conducted. It therefore restricts the patient's future options to visceral debranching, open conversion or observation, and excludes them from undergoing a fenestrated implant, especially when a short-bodied device was used.

There have now been greater than 2,000 fenestrated implantations worldwide. In the US, a few individuals have pioneered the majority of this work, and investigational sites are currently awaiting FDA approval to continue work on the fenestrated devices manufactured by Cook Medical. Initial results are extremely promising, and the technique has been described previously [62, 63]. While early results compared with traditional open surgery appear to show improved perioperative morbidity and mortality, it remains to be seen whether the technique will provide mid- and long-term outcomes similar to those of standard EVAR.
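The proximal neck criteria discussed in this section (length of at least 15 mm, angulation below 60°, and freedom from significant thrombus or calcification) lend themselves to a simple pre-screening check. The sketch below encodes only those stated thresholds; actual IFU criteria are device-specific, and the types and names are ours.

```python
# Pre-screening sketch for standard infrarenal EVAR candidacy, using the
# anatomic criteria cited in this section: proximal neck length >= 15 mm,
# neck angulation < 60 degrees, and a neck free of significant thrombus or
# calcification. Real IFU criteria are device-specific; names are ours.

from dataclasses import dataclass

@dataclass
class AorticNeck:
    length_mm: float
    angle_deg: float
    significant_thrombus: bool
    significant_calcification: bool

def suitable_for_standard_evar(neck: AorticNeck) -> bool:
    return (neck.length_mm >= 15
            and neck.angle_deg < 60
            and not neck.significant_thrombus
            and not neck.significant_calcification)

print(suitable_for_standard_evar(AorticNeck(18, 40, False, False)))  # True
print(suitable_for_standard_evar(AorticNeck(10, 75, False, True)))   # False
```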
67.6.1.4 Branched Concepts

Current branched designs are underway to address two issues: hypogastric artery involvement and TAAA. Iliac artery involvement occurs in 40% of all AAAs. The current standard of care is to embolize one hypogastric artery if it is not an important contributor to pelvic blood flow. While this provides an adequate sealing region in the external iliac artery, gluteal claudication, which can occasionally be severe, does develop. Despite reports of patients recovering after several months of collateral development, some patients never recover fully. In addition, there have been detrimental outcomes when both hypogastric arteries have been excluded, even though some authors have advocated its safety [64]. As a result, efforts are underway to provide branched iliac components to maintain hypogastric blood flow in this setting. While early designs have been successful outside the US, no trial is currently underway in the US.
Recently, Chuter reported his initial experience with branched TAAA devices [65]. These devices allow the extension of this technology into the visceral segment; however, they are extremely complex, and it is anticipated that several design improvements will be needed. These devices will remain in development for several years to come, but they hold promise for advancements in this technology.
67.6.1.5 Migration/Fixation

Current data suggest that all devices can migrate and that migration is a time-dependent variable. This applies not only to infrarenal devices, but also to those with transrenal fixation. It is difficult to draw any strong conclusions concerning the different migration risks between current devices. However, when used outside their intended IFU guidelines, devices have a greater risk of migration compared with clinical trial results [58]. Although current devices perform extremely well, this is an area of active research and a region for improvement with new devices. Endovascular stapling devices are being developed to improve fixation and attempt to eliminate any migration potential. Aptus Endosystems is currently conducting a phase II trial involving such a concept, with early results appearing promising. Whether endovascular stapling eliminates the concern for migration is yet to be seen. One must be cautious, as improvements in one aspect of a design can lead to other modes of failure, such as material fatigue and stent fracture. If this fixation technique is successful, it may be applied to other problematic areas, such as component separation and type Ia endoleaks, that can often be difficult to address with current technology.

67.6.1.6 Metal/Fabric Fatigue

Material fatigue has been seen in an extremely small number of devices. Current iterations have redundancy built into their design, and fatigue has rarely played a role in device failure after FDA approval. While current designs are tested for 10 years of durability, additional testing may be necessary if long-term data suggest potential failure. Eliminating the metal-fabric interaction by designing absorbable stents or unsupported devices could eliminate these issues; however, obtaining a successful implant without these critical components could be extremely difficult to accomplish. As the bioengineering and material science technology develops, we may see additional options that do not exist today.

It has been more than 17 years since the first EVAR implant in humans. Current devices are extremely robust, provide excellent clinical outcomes and are applicable to greater than 60% of the population. Future device iterations are mainly aimed at improving applicability to a larger proportion of the population by addressing the proximal neck anatomic issues that currently exist. When the devices were initially designed, these problems were not recognized or completely understood. As the medical community continues to work together with design engineers, it is hoped that these problems will be solved, with resulting improvement for patients afflicted with aneurysmal disease.
67.6.2 Thoracic Disease

There are many unmet needs in the treatment of thoracic aortic disease. Current devices have a profile that is too large to allow insertion via a femoral artery approach in all patients. This is secondary to the size of the prosthesis that needs to be inserted, and is a consequence of the patient demographics, which include a higher proportion of women (40%) than infrarenal disease (20%). As a result, one-fifth of patients require an iliac approach to have a device implanted.

Many physicians take a simplistic view of the thoracic aorta and consider it to be a simple tubular structure. However, dynamic imaging data reveal that it is an extremely dynamic tube with a significant amount of curvature and tortuosity. As such, early devices were designed to provide pushability via stiffness. However, this is a significant drawback in patients with long aneurysms, as the catheter must traverse four regions that contribute to the tortuosity: the iliac arteries, the abdominal aorta, the distal thoracic aorta and the arch. After traversing each region, the interventionalist has less control and deliverability of the device. New systems are being designed with delivery catheter support that allows for pushability and improved flexibility without damage to the delivery system or the device.

Deployment accuracy is also important. Only one current device allows for precise control of both the proximal and distal aspects during implantation.
This can have significant implications, as the catheters are longer and precise deployment is critical, because most aneurysms approach either the great vessels or the visceral segment. If a device is mal-deployed, additional pieces need to be inserted, and this increases the risk of stroke and other complications of the procedure. Flexibility of the device and conformation to the thoracic aorta have also been problems with first-generation devices. Mal-apposition results in disturbances of flow through the aorta, can cause attachment-site endoleaks, and, if severe enough, may lead to device collapse and fatigue. While this can be overcome using a bare stent configuration at the ends, this is not without risk and can potentially result in aortic perforation if positioned incorrectly.

Current IFUs for thoracic devices have centered on thoracic aneurysmal disease. However, unlike the infrarenal aorta, the thoracic aorta harbors several other diseases, which have been treated with the current devices. Limitations and complications are being recognized around the world. As such, companies are actively involved in designing disease-specific devices to treat different pathologies, such as thoracic transections and dissections.

The next frontier is the ascending aorta and transverse arch. Branched vessel designs and devices designed to treat specific ascending aortic pathologies have been implanted in compassionate-use cases. However, as a result of increased movement and forces, shorter aortic segments and the presence of branch vessels, this region has been extremely difficult to engineer for. While still in design and development, the next 5–10 years should prove rewarding for patients with disease in these locations, including involvement of valvular heart disease.
67.6.3 Carotid Disease/Peripheral Arterial Disease

Current clinical trials are being designed to incorporate state-of-the-art medical therapy and compare those results with surgical and interventional outcomes. Since the intellectual property arena in this area is severely crowded, it remains to be seen whether new carotid stent systems will emerge to mitigate some of the current issues. Current carotid treatments are rarely hindered by patency or restenosis rates. This is in distinct contrast to all other peripheral arterial disease treatments, which
are hampered by intimal hyperplasia and/or recurrent disease. Areas of intense research involve one of three basic principles: plaque removal (laser or atherectomy devices), arterial rechanneling (current stent/balloon systems), or medical therapy for plaque regression or stabilization. New materials are being evaluated for use in the medical field and include bioabsorbable stents, bioactive stents and polymer stent designs. Their impact on intimal hyperplasia, disease treatment efficacy and complication rates is yet to be determined. If history repeats itself, one should probably not bet against technologic advancements of any kind.
67.6.4 Venous Disease/Wound Care/Prevention

While arterial disease receives much more attention than venous disease, this area is probably the most active of the research topics. Technology is being used to develop venous valves to help treat reflux disease and patients with post-phlebitic syndrome. Techniques to more rapidly relieve acute deep venous obstruction are being designed and include mechanical devices as well as new lytic agents. As imaging improves, our understanding of the hemodynamic alterations that occur, and of how to reverse them, will increase. This invariably will lead to alternative treatment options for patients afflicted with this disabling disease. With respect to molecular advances in vascular disease, future developments will evaluate many alternative treatments surrounding prevention and genetic manipulation to induce the growth of blood vessels. Additional areas of active research include identifying methods to halt or reverse the pathways underlying atherosclerosis, aneurysmal disease and venous thromboembolism, as well as growth factor signalling.
68 Breast Surgery: Current Trends and Recent Innovations

Dimitri J. Hadjiminas

Department of Breast and Endocrine Surgery, St Mary's Hospital NHS Trust, Praed Street, London W2 1NY, UK
e-mail: [email protected]
Contents

68.1 Breast Cancer
68.2 Diagnosis
68.3 Rationale for Loco-Regional Resectional Surgery in Breast Cancer Patients
68.4 Factors Determining Local Recurrence
68.4.1 Importance of Margins
68.4.2 The Importance of Post-Operative Radiotherapy
68.4.3 Other Pathological Parameters and the Age-Factor
68.5 Ductal Carcinoma in Situ (DCIS)
68.6 Surgery for Breast Cancer
68.6.1 Oncoplastic Resections
68.6.2 Reconstruction after Total Mastectomy
68.6.3 The Management of the Axilla
68.7 Systemic Therapies in Breast Cancer
68.7.1 Hormonal Manipulation
68.7.2 Neo-Adjuvant Chemotherapy
References
Abstract Although the incidence of breast cancer has increased substantially in the last 2 decades, mortality from the disease continues to decline. Possible explanations for this paradox include the fact that a substantial proportion of the increase in incidence is due to earlier diagnosis of asymptomatic patients through screening, together with better and more appropriate treatment. The diagnostic process is widely known as triple assessment and has proven very accurate and sensitive for the detection of breast cancer, providing all three components, i.e. clinical examination, imaging and cytology, are in absolute concordance. The exact role of MRI in the prediction of tumour recurrence through the detection of multi-focal disease remains to be proven. With regard to local recurrence and survival, although in some cases cells from primary breast cancers may disseminate early, often during the pre-symptomatic stage, the potential of the cancer to establish clinical metastases evolves with time in many cases. It is therefore important to try to diagnose and treat tumours as early as possible with treatments that can provide a very low incidence of local recurrence. In this chapter, we also outline the role of systemic therapies such as hormonal manipulation and neo-adjuvant chemotherapy in the treatment of breast cancer.
68.1 Breast Cancer

Breast cancer is the commonest cancer in women in the western world. There are over 42,000 new cases per annum in the UK, with just over 300 in men. Although the incidence of breast cancer has increased substantially in the last 2 decades, mortality from the disease continues to decline. Possible explanations for
this paradox include the following: a substantial proportion of the increase in incidence is due to earlier diagnosis of asymptomatic patients through screening and to better and more appropriate treatment; some cancers may have been induced by the widespread use of HRT, and these tumours could carry a better prognosis; and changes in demographics account for some of these extra cancer diagnoses, which may also carry a different prognosis.
68.2 Diagnosis

For over a decade, the diagnosis of breast cancer in symptomatic patients has depended on the combination of clinical examination, breast imaging (mammography and ultrasound) and fine needle aspiration cytology. This process, widely known as triple assessment, is very accurate and sensitive for the detection of breast cancer, providing all three components, i.e. clinical examination, imaging and cytology, are in absolute concordance. If one component of the triple assessment process is not definitively benign or definitively malignant, sensitivity for cancer becomes unacceptably low and further investigation is necessary. This is most often in the form of core biopsy, but rarely, open surgical biopsy may be necessary. Because of the near 100% sensitivity of triple assessment when all three components are in concordance, other imaging modalities are largely unhelpful for the symptomatic patient. MRI, though helpful in other settings, is so beleaguered by its relatively low specificity that it rarely, if ever, has an indication in the symptomatic undiagnosed patient. Mammography has been the mainstay of initial imaging in population screening of asymptomatic patients. In this setting, mammography is supported by ultrasound and by cytological or histological assessment when necessary. Mammographic screening in women between 40 and 75 years of age has been shown to reduce breast cancer mortality by at least 21% according to the most recent meta-analysis of the Swedish randomised trials [1]. This mortality advantage is likely to be an underestimate of the true benefit because it does not take into account the fact that women who presented symptomatically during the screening trials were more likely to receive systemic therapies than screen-detected patients; this would have
reduced the difference in breast cancer mortality that is attributable to screening alone. In addition, the control groups in these trials were also offered mammographic screening after the first 3–6 years, further reducing the difference in outcome. Despite the unquestionable role of mammography in screening asymptomatic patients, its value has been disputed for younger patients with radiologically denser breasts. Recent trials of breast MRI in young, high-risk women, mostly with a family history of breast cancer, have demonstrated that the combination of MRI and conventional mammography can almost double the yield of asymptomatic cancers diagnosed [2]. However, a significant number of cancers in these young, high-risk women are diagnosed by mammography only and have a normal MRI; therefore, both imaging modalities are necessary to optimise yield. This latter group of women tends to present with areas of DCIS and micro-calcifications and has a high incidence of BRCA-2 mutations. An alternative way of obviating the difficulties of the dense breast in some younger women is tomo-mammography. Tomo-mammography is not as time consuming as contrast-enhanced breast MRI and does not require the administration of intravenous contrast. It allows the radiologist to "see" beyond the dense breast tissue that obscures the detail of the X-ray by focusing on one cross-section of the breast at a time. It may prove to be the most practical method of screening women with dense breasts in the future. A number of trials have also shown that breast MRI in patients already diagnosed with breast cancer often demonstrates multi-focal disease that was previously occult to mammography or ultrasound. Whether this will translate into a lower local recurrence rate, or indeed whether this multi-focality can predict local recurrence, remains to be seen. A randomised multicentre trial addressing the latter question (the COMICE trial) has not yet been reported. Many authors believe that the multi-focality identified by MRI may simply highlight foci that would normally be eliminated by adjuvant radiotherapy. If this is the case, routine pre-operative assessment by MRI could increase the number of mastectomies unnecessarily, without any reduction in the local recurrence rate. Clinicians must exercise caution when interpreting the findings of breast MRI before the outcome of trials such as COMICE is reported.
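To put the screening figures above in perspective, it can help to translate the relative mortality reduction into absolute terms. The short Python sketch below does this; the 21% figure is taken from the Swedish overview [1], but the baseline mortality and all derived numbers are illustrative assumptions, not trial results.

    # Illustrative calculation: relative vs. absolute benefit of screening.
    # The relative risk reduction is the 21% reported in the Swedish
    # overview [1]; the baseline mortality is an assumed round figure.
    baseline_mortality = 0.004        # assumed mortality without screening
    relative_risk_reduction = 0.21    # from the meta-analysis [1]

    absolute_risk_reduction = baseline_mortality * relative_risk_reduction
    number_needed_to_screen = 1 / absolute_risk_reduction

    print(f"Absolute risk reduction: {absolute_risk_reduction:.3%}")          # ~0.084%
    print(f"Women screened per death prevented: {number_needed_to_screen:.0f}")  # ~1190

The same proportional benefit therefore corresponds to very different absolute gains depending on the underlying risk of the screened population.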
68.3 Rationale for Loco-Regional Resectional Surgery in Breast Cancer Patients

Breast cancer differs from other common adenocarcinomas in that the local disease cannot itself cause the demise of the patient. This is because of the position of the breast away from important viscera and from visceral functions essential for the patient's life. As a result, breast cancer can be used to monitor the biology of the cancerous process in general. Over the years, the relationship between loco-regional control and systemic spread of the tumour has been hotly debated. Resection of the primary tumour has been the cornerstone of loco-regional control for breast cancer since the inception of treatment for this disease. Historically, William Halsted and others in the late nineteenth century popularised the en-bloc resection of the whole breast together with pectoralis major and all three levels of axillary lymph nodes. This operation is known as radical mastectomy and continued to be the most popular operation for breast cancer until the early 1970s. Halsted believed that breast cancer spread in an orderly fashion from the primary tumour via the lymphatics towards the regional lymph nodes, and that the regional lymph nodes formed a barrier to further disease spread. This theory failed to explain why up to 30% of patients with documented node-negative disease subsequently die of metastatic deposits, and it was challenged by Bernard Fisher and others when the first trials comparing lumpectomy with mastectomy were analysed in the 1980s [3]. These trials showed that even groups of patients who had inadequate local control had approximately the same survival as those patients who had very good local control. The latter observation led Fisher to propose his theory of "predetermined" prognosis for breast cancer, irrespective of the adequacy of loco-regional control. According to the Fisherian theory, breast cancer metastasises very early, long before clinical presentation, and involvement of regional lymph nodes merely represents a picture of a tissue other than the tumour-bearing breast that happens to receive the total burden of disseminating cells on their way from the primary tumour towards distant tissues. It is likely that many of these cells are not capable of establishing metastases in other tissues, and therefore the presence of lymph node metastases
may simply distinguish tumours with a greater inherent potential to establish metastatic disease. Fisher used his hypothesis to support the view that improvement of prognosis for breast cancer patients could only be achieved with the introduction of effective systemic therapies. Unlike surgery and radiotherapy, systemic treatment modalities like chemotherapy or hormone therapy can potentially have an effect on disseminated cells and alter their capacity to establish clinical metastases. More recently, longer follow-up on trials that included groups with high and low local recurrence rates was reported and supported the view that high local recurrence rates, particularly after mastectomy, are associated with significantly higher mortality, both in terms of breast cancer-specific and overall survival [4]. In the Danish trial [4], the group randomised to receive post-operative radiotherapy, which had the lower local recurrence rate, actually received less chemotherapy than the no-radiotherapy group, further supporting the view that local recurrence can reduce patients' survival and should therefore be avoided at all costs. Most groups with "low" local recurrence rates in these trials received radiotherapy as an adjunct to surgical ablation of the tumour [5]. These results brought Fisher's theory into question, at least for some of the more aggressive and larger breast cancers at the time of diagnosis. In addition, with the advent of sentinel lymph node biopsy, Halsted's ideas of the "barrier-like" function of the lymph nodes have been revived. According to most sentinel lymph node studies, up to 40% of node-positive patients have their nodal disease confined to the first echelon of lymph nodes, something that would not fit with Fisher's theory of random metastases in the axillary lymph nodes. If, according to Fisher's theory, lymph node metastases were a random event, second and third echelon lymph nodes would be involved as frequently as first echelon lymph nodes, which is apparently not the case. Additional evidence from the randomised trials of breast screening also suggests that the prognosis of breast cancer patients is not pre-determined. The latest analysis of the Swedish overview of screening trials [1] showed a 21% reduction in breast cancer-specific mortality. Unlike changes in survival, this reduction in mortality cannot be attributed to stage migration or lead-time bias, and it implies that the probability of developing metastases from a breast cancer is at least partially dependent on the time of diagnosis.
Further support for the view that there is a survival benefit attributable to resection of the primary comes from studies comparing primary tamoxifen treatment with resectional surgery in the elderly. All of these trials showed that survival is equal with surgery alone or tamoxifen alone, implying that there must be a benefit from surgery, since we know that there is a survival benefit from tamoxifen treatment [6, 7, 8]. Trials that compared the combination of surgery and tamoxifen with tamoxifen alone showed a survival benefit in favour of those patients who received the combination [9, 10, 11]. These results also indicate that prognosis is not pre-determined but evolves with the primary tumour and worsens with time. In summary, therefore, although in some cases cells from primary breast cancers may disseminate early, often during the pre-symptomatic stage, the potential of the cancer to establish clinical metastases evolves with time in many cases. It is therefore important to try to diagnose and treat tumours as early as possible, with treatments that can provide a very low incidence of local recurrence.
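The indirect-comparison logic of this paragraph can be made explicit with a minimal numerical sketch. The hazard ratios below are assumed values chosen only to illustrate the argument; they are not taken from the cited trials.

    # Sketch of the indirect comparison: if surgery alone and tamoxifen
    # alone give equal survival, and tamoxifen alone is known to improve
    # survival over no treatment, surgery alone must also improve survival.
    hr_tamoxifen_vs_none = 0.80      # assumed benefit of tamoxifen over no treatment
    hr_surgery_vs_tamoxifen = 1.00   # "equal survival" observed in the elderly trials

    hr_surgery_vs_none = hr_surgery_vs_tamoxifen * hr_tamoxifen_vs_none
    print(hr_surgery_vs_none)        # 0.8: surgery itself confers a survival benefit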
68.4 Factors Determining Local Recurrence
68.4.1 Importance of Margins

According to detailed pathological studies on mastectomy specimens performed by Holland et al. [12], up to 60% of breast cancers are associated with satellite tumour deposits over 2 cm away from the primary lesion within the breast parenchyma. These deposits of tumour cells can be found when the specimen is examined by meticulous serial sectioning, but they are not always obvious on routine pathological assessment of wide local excision and mastectomy specimens. Therefore, what is reported as the margin after routine histopathological assessment of lumpectomy specimens is no more than a surrogate marker of the true margin. The best surrogate marker of the true margin is probably the margin width, as the probability of a complete tumour excision increases with increasing margin width. The first trials of breast conserving surgery demonstrated that quadrantectomy and radiotherapy was very effective as treatment for clinical stage-I breast cancer [13]. Veronesi was the first to establish that margin width was important, with his Milan II trial [14]. In the latter trial, patients were randomised to have tumourectomy or quadrantectomy; patients having tumourectomy received more radiotherapy, as a local boost, to compensate for the narrower margins. Despite the larger radiotherapy dose, the tumourectomy group had significantly higher local recurrence. This was the only time that the importance of margin width was examined in a randomised fashion. Although the trial was criticised for not necessarily obtaining pathologically clear margins in the tumourectomy group, it provides reasonable evidence to support the view that wider margins are associated with lower recurrence rates. Further support for this view comes from the exceptionally low local recurrence rates reported in the quadrantectomy-alone (without radiotherapy) arm of the Milan III trial [5]. In this trial, Veronesi obtained very low local recurrence rates after quadrantectomy alone in older patients, comparable to the local recurrence rates achieved after lumpectomy and radiotherapy. The addition of radiotherapy had no significant impact in these patients, who had quadrantectomies as opposed to wide local excisions.

68.4.2 The Importance of Post-Operative Radiotherapy
A large number of randomised controlled trials have demonstrated that, following breast conserving surgery, radiotherapy is necessary to reduce local recurrence rates to acceptable levels [15]. Tumours treated with breast conserving surgery are a heterogeneous group, and the probability of local recurrence may differ between the various subgroups. The need for radiotherapy for small, low grade, lymph node negative cancers has been questioned by a number of authors. More recently, a trial conducted by the British Association of Surgical Oncology (the BASO-II trial) demonstrated that even these tumours with excellent prognosis are associated with an unacceptably high 1.7% per annum chance of local recurrence if neither radiotherapy nor tamoxifen is given as adjuvant treatment. However, this local recurrence figure is much lower than those obtained in the non-radiotherapy arms of trials of breast conserving surgery, confirming the hypothesis that this is a subgroup of patients with a
relatively low local recurrence rate even without adjuvant radiotherapy. Based on the evidence from the Milan III trial [5], which showed that quadrantectomy could achieve local recurrence rates comparable to those obtainable after wide local excision and radiotherapy, some authors have proposed that intra-operative radiotherapy may be as effective as long post-operative courses. Intra-operative radiotherapy is more convenient for patients, as it avoids prolonged periods of outpatient treatment, but it only irradiates the breast tissue adjacent to the cavity left after resection of the primary tumour. Practical issues regarding re-excision of margins are problematic with this technique, and long-term local recurrence rates are still unknown pending the definitive report of the trial.
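The per-annum recurrence figure quoted above for the BASO-II no-adjuvant-treatment arm is easier to appreciate as a cumulative risk. The sketch below assumes, purely for illustration, a constant annual hazard of 1.7%; real recurrence hazards are not constant over time.

    # Cumulative local recurrence implied by a constant 1.7% annual rate.
    annual_rate = 0.017
    for years in (5, 10, 15):
        cumulative = 1 - (1 - annual_rate) ** years
        print(f"{years} years: {cumulative:.1%}")
    # 5 years: 8.2%; 10 years: 15.8%; 15 years: 22.7%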
68.4.3 Other Pathological Parameters and the Age-Factor

A great number of pathological factors have been repeatedly shown to correlate with the probability of local recurrence, including nodal status, tumour grade and vascular invasion. This is not unexpected. Young age, however, is also a factor that strongly predisposes to local recurrence even if corrected for all other pathological factors [5]. The reason for this is unknown. It is also interesting that despite the clear association between young age and local recurrence after breast conserving surgery, age is not an independent prognostic factor for overall survival when corrected for tumour size, grade and nodal status.
68.5 Ductal Carcinoma in Situ (DCIS)

The management of patients with DCIS is one of the most controversial topics in breast cancer surgery. Not only do these patients have the best prognosis in terms of overall survival, they also have the highest local recurrence rates when their tumours are treated with breast conserving surgery. It therefore sounds paradoxical that a tumour with survival approaching 100% should be associated with high local recurrence rates when treated with breast conserving surgery.
Following the histopathological studies of Holland et al. [16], it became clear that for the complete excision of DCIS, margins in excess of 10 mm are required. Subsequent randomised clinical studies comparing lumpectomy with or without radiotherapy for DCIS showed that, without radiotherapy, recurrence rates were unacceptably high, merely confirming what was expected from Holland's studies, as these clinical trials included patients with a minimum margin width of only 1 mm. Radiotherapy to the ipsilateral breast approximately halves the local recurrence rate after lumpectomy for DCIS with narrow margins [17, 18]. The question of whether radiotherapy can reduce the recurrence rate after excision with wider margins has been answered more recently [19]. Although recurrence rates are greatly reduced by obtaining a minimum 10 mm margin [20], radiotherapy still improves local recurrence rates further when used as adjuvant treatment after wide local excision. Anti-oestrogen adjuvant treatment has also been tried after resection for DCIS. In the NSABP-B24 trial [21], the tamoxifen group had a lower local recurrence rate than those patients who received placebo after surgery. The trial was criticised for including patients with involved margins; therefore, it cannot be extrapolated from this trial alone that tamoxifen will reduce recurrence rates even if DCIS is fully excised. The UK DCIS trial [22], which also tested tamoxifen vs. placebo, showed no difference in local recurrence attributable to tamoxifen. In this trial, most patients were older than in NSABP-B24, more patients had high grade DCIS, which is more often ER negative, and margins were pathologically assessed as negative, albeit many of them narrow. These differences probably account for the discrepancy in the efficacy of tamoxifen between the two trials. Neither study showed any difference in overall survival. Although DCIS is associated with an excellent prognosis, even in mastectomy series there is a 2% mortality from breast cancer among patients treated for DCIS. It is unknown whether this low rate of metastatic disease represents true metastases from in situ disease or whether it is due to minute areas of micro-invasive disease that have escaped even the most meticulous pathological examination. Clearly, DCIS has a very long natural history, and metastatic events are very few in the relatively short follow-up period of the trials of lumpectomy with vs. without radiotherapy. Just under half of DCIS recurrences are invasive breast cancers, and surely not all of these will be cured with further treatment.
Therefore, there are bound to be some deaths caused by recurrent invasive disease, but of course these are likely to occur 10–20 years after the patients were randomised into a DCIS trial. As a result, a small difference in overall survival, of the order of 2–3%, between groups of patients with high local recurrence rates and those with very low local recurrence rates cannot be ruled out, despite the fact that the current trials, with 5–7 years median follow-up, show equal survival.
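The closing caveat, that a 2–3% survival difference cannot be ruled out, is essentially a statement about statistical power. As a rough illustration, the standard two-proportion sample-size formula shows how many patients per arm would be needed to detect such a difference; the survival figures used below are assumptions, not data from the DCIS trials.

    # Patients per arm needed to detect a 2% absolute survival difference
    # (97% vs. 95%) with 80% power at a two-sided alpha of 0.05.
    z_alpha, z_beta = 1.96, 0.84
    p1, p2 = 0.97, 0.95   # assumed survival proportions in the two groups

    n_per_arm = ((z_alpha + z_beta) ** 2
                 * (p1 * (1 - p1) + p2 * (1 - p2))
                 / (p1 - p2) ** 2)
    print(round(n_per_arm))   # ~1500 patients per arm

Few DCIS trials approach this size with sufficiently long follow-up, which is why equal survival at 5–7 years does not exclude a small late difference.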
68.6 Surgery for Breast Cancer

68.6.1 Oncoplastic Resections

Traditionally, breast cancer has been treated either with mastectomy or with wide local excision and radiotherapy. More recently, breast surgeons have developed the principle of partial mastectomy in an attempt to create very wide margins and excise larger tumours that would normally require mastectomy. The resulting volume deficit can be corrected by displacing tissue from the remaining ipsilateral breast using the principles of breast reduction mammoplasty operations. Alternatively, the volume deficit can be replaced by placing a de-epithelialised or a myo-subcutaneous latissimus dorsi flap inside the cavity. Whether this approach is oncologically safe for very large tumours is still uncertain and can only be proved with long-term results, which have not yet been published. However, these techniques achieve very wide margins and excellent cosmetic results for tumours for which standard breast conserving techniques would produce cosmetically unsatisfactory outcomes. In addition, they usually leave a sensate breast, which is their main advantage over mastectomy and total breast reconstruction operations.
68.6.2 Reconstruction after Total Mastectomy

Primary reconstruction after mastectomy has the distinct advantages of retaining the skin envelope of the breast and avoiding a second major operation. In addition, patients spend no time without a breast, which may result in quicker and better psychological recovery.
Unfortunately, there is a suggestion that post-mastectomy radiotherapy can adversely affect the reconstructed breast. The evidence to support this view is derived from retrospective series and non-randomised trials, and it suggests that radiotherapy may increase the rate of capsule formation in implant-based reconstructions, while it can increase the incidence of significant areas of fat necrosis and volume changes in free TRAM and DIEP flaps. Whether it is better to proceed with immediate reconstruction in an individual patient, and risk her developing these complications, or to wait until all oncological treatment is over, is a very difficult decision. Needless to say, the patients who require post-mastectomy radiotherapy have, in general, a poor prognosis, and by deferring their reconstruction they could be condemned to not having a reconstructed breast for much of their remaining life. On the other hand, severe capsule formation can sometimes recur even after capsulotomy and therefore permanently affect the cosmetic outcome, whereas large areas of fat necrosis in TRAM or DIEP free flaps sometimes require revision with a latissimus dorsi flap. A number of breast surgeons attempt to identify those patients who are likely to require post-mastectomy radiotherapy by assessing the lymph node status of the tumour pre-operatively, usually by performing sentinel lymph node biopsy. Patients who are likely to require post-mastectomy radiotherapy can be offered a temporising subcutaneous or sub-pectoral implant, which will preserve some of the skin envelope of the breast for the definitive reconstructive procedure after the end of the patient's oncological treatment.
68.6.3 The Management of the Axilla

Despite the advent of numerous immuno-histochemical and molecular prognostic factors, lymph node status remains the single most important determinant of prognosis in breast cancer. It is for this reason that accurate axillary staging is essential, particularly since there is evidence that the benefit from any systemic treatment, such as chemotherapy or hormone therapy, is proportional to the risk of death from breast cancer. Full axillary clearance is the gold standard of axillary staging and has the additional benefit of treating the regional disease and preventing axillary recurrences [23]. It is associated with significant morbidity, however,
including lymphoedema, shoulder stiffness and skin hypaesthesia. While axillary dissection may be justifiable for large tumours, most of which are node positive, it does not contribute to the treatment of the majority of patients, who present with small tumours, two-thirds of which are associated with node-negative disease. For these patients, there is a clear rationale for a less invasive test that could accurately predict axillary node status, proceeding to further treatment only in those patients who have node-positive disease. In 1985, the Edinburgh breast unit published its pioneering work on axillary sampling of a minimum of four discrete lymph nodes identified by the surgeon in the operating room. This sampling method had 100% sensitivity in identifying a positive axilla in the initial series of 65 patients [24]. Subsequently, the unit conducted two randomised controlled trials comparing axillary clearance with four-node sampling, one in patients undergoing mastectomy and a second in patients undergoing breast-conserving surgery [25, 26]. Only patients with positive sampling underwent adjuvant radiotherapy to the axilla. Long-term results from both trials show that there is no difference in survival or axillary recurrence. The percentage of node-positive patients was the same in both groups, suggesting that the sampling technique can accurately identify those patients who have involved axillary lymph nodes. Ahlgren et al. reported a similar prospective series of 552 patients in 2002 [27]. They sampled up to five lymph nodes from the lower axilla and then performed a back-up axillary clearance to assess the sensitivity of one-, two-, three-, four- and five-node sampling individually, reporting 96% sensitivity for four-node sampling and 97.3% for five-node sampling. With the advent of sentinel lymph node biopsy, many authors reported near 100% sensitivity in their initial small series. However, the larger multicentre trials, which are more likely to reflect clinical practice, suggest that the sensitivity of sentinel lymph node biopsy is 92–93.5% [28, 29]. Direct comparison between sentinel lymph node biopsy using the isotope technique only and four-node sampling showed that four-node sampling may be the better technique under certain circumstances [30]. It is clear from the latter and other studies that sentinel lymph node biopsy using the isotope technique alone cannot compete with four-node sampling performed by experienced surgeons following the principles of the Edinburgh standard technique. It is possible, however, that when sentinel lymph node biopsy is performed
using the combined blue dye/isotope technique, its sensitivity is as good as that of four-node sampling [31]. It is clear that we should be moving towards some form of selective axillary approach, particularly in patients with a clinically and radiologically negative axilla. The evidence is unclear as to whether sentinel node biopsy or four-node sampling is the more sensitive at the present time. A competent breast surgeon should be able to perform both of these procedures, which require very similar skills. Fine needle aspiration cytology under ultrasound guidance is probably the best method of establishing pre-operatively the presence of involved axillary lymph nodes, in which case the surgeon should proceed to a full clearance. For patients with radiologically and cytologically negative lymph nodes, sentinel lymph node biopsy will provide one or two lymph nodes that can be analysed rapidly intra-operatively by imprint cytology, frozen section, immunohistochemistry or PCR. If sentinel node biopsy is unsuccessful, a sample of four to five lymph nodes could serve as a back-up plan for staging purposes. The rationale for performing a full dissection when identification fails in patients initially scheduled for sentinel node biopsy is not sound: if sentinel node biopsy, with its 92–93.5% sensitivity, is deemed appropriate for these patients, then four-node sampling should be the fallback plan. Patients with a positive sentinel node or sampled nodes can be treated by either radiotherapy to the axilla or clearance. However, only those who undergo clearance will have all the prognostic information available from their axillary dissection, and this is therefore the preferred option.
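The sensitivities quoted in this section can be restated as negative predictive values, which is how they are felt clinically. The sketch below assumes a 30% prevalence of node-positive disease and perfect specificity (a node read as positive is taken to be truly positive); both are illustrative assumptions, not figures from the cited trials.

    # Negative predictive value (NPV) for two axillary staging strategies,
    # assuming 30% prevalence of nodal disease and no false positives.
    prevalence = 0.30
    for name, sensitivity in (("sentinel node, isotope only (~92%)", 0.92),
                              ("four-node sampling (~96%)", 0.96)):
        false_negatives = prevalence * (1 - sensitivity)
        true_negatives = 1 - prevalence
        npv = true_negatives / (true_negatives + false_negatives)
        print(f"{name}: NPV = {npv:.1%}")
    # ~96.7% vs. ~98.3%: a small gap in sensitivity still halves the number
    # of node-positive patients who are wrongly staged as node-negative.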
68.7 Systemic Therapies in Breast Cancer

68.7.1 Hormonal Manipulation

Hormonal manipulation for breast cancer was possibly the first systemic treatment employed successfully for any cancer. It was first reported by Sir George Beatson, who described the dramatic regression of a young woman's breast cancer after bilateral oophorectomy in 1896. A large number of hormonal agents have been tried, including anti-oestrogens, androgens and progestogens, as well as oestrogens. Tamoxifen was the most
widely used oestrogen receptor antagonist until recently, on the basis that it was the best tolerated drug rather than significantly more efficacious than the alternatives. Tamoxifen has been tried in many randomised placebo-controlled studies involving thousands of women. Comprehensive meta-analysis of these trials has been published by the EBCTCG, demonstrating a 23% reduction in all-cause mortality for patients with ER-positive tumours who took tamoxifen for 5 years [32]. The proportional reduction in the risk of death appears to be the same for all subgroups of patients, providing they have ER-positive disease. Consequently, high-risk groups, such as those patients with node-positive disease, benefit more than those with a good prognosis. With the advent of third-generation aromatase inhibitors, tamoxifen is no longer the first choice for neo-adjuvant/primary hormone treatment [33, 34, 35]. Letrozole was shown to be superior to tamoxifen in a randomised trial, and combined analysis of randomised trials of anastrozole vs. tamoxifen has suggested greater benefit from anastrozole as well. In the neo-adjuvant setting, the magnitude of response and the response rate are probably the most important parameters by which to judge the efficacy of these drugs. Similarly, time to progression is possibly the most important factor for trials in metastatic breast cancer, where again aromatase inhibitors have shown superiority to tamoxifen. These results, however, do not necessarily translate into superiority in the adjuvant setting, where clearly overall survival and quality of life issues are very important. In the neo-adjuvant and metastatic settings, all patients receiving the drug can potentially benefit from it; in the adjuvant setting, however, only those patients who are not already cured can possibly benefit from any medication, whereas everyone taking the drug is exposed to the associated risks. There is therefore, theoretically, the potential for no improvement in overall survival, particularly in groups of patients with an already excellent prognosis, despite the more potent effect of aromatase inhibitors on the actual tumour cells compared with tamoxifen. The randomised trial of adjuvant anastrozole vs. tamoxifen has not shown any difference in overall survival, despite a significant improvement in disease-free survival. The trial was conducted in a group of patients with a generally good prognosis and few metastatic events, and therefore longer follow-up will be necessary before conclusions are drawn [36]. In a randomised trial of letrozole vs. tamoxifen, overall survival was not statistically improved by letrozole, but there was a significant improvement in
the number of metastatic events, which should translate into a survival advantage given that metastatic breast cancer is not yet curable. A number of trials have examined the role of a "switch strategy", that is, continuing with an aromatase inhibitor after the patient has started or finished a 5-year course of tamoxifen. While it is still the topic of ongoing research whether tamoxifen treatment for more than 5 years can be of any benefit, continuing after 5 years with letrozole significantly improves disease-free survival [37]. In the same trial, overall survival was found to be improved in a high-risk subgroup. Switching to exemestane or anastrozole also appears to significantly improve disease-free survival [38, 39, 40]. An overall survival improvement of borderline significance was reported when patients were switched to exemestane or anastrozole after 2–3 years of tamoxifen [38, 40]. Meta-analysis of trials of patients switching from tamoxifen to anastrozole also showed improved overall survival [41]. In the adjuvant setting, aromatase inhibitors have demonstrated superiority in terms of increased disease-free survival [36]. However, it is still debatable whether overall survival is improved with aromatase inhibitors in some groups of patients with a very low risk of relapse. This could be due to possible protective cardiovascular and bone density effects of tamoxifen rather than to specific adverse actions of aromatase inhibitors. With longer follow-up, it is widely believed that trials of aromatase inhibitors vs. tamoxifen in the adjuvant setting will eventually demonstrate an overall survival benefit in favour of aromatase inhibitors.
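Both the EBCTCG figure quoted earlier in this section and the caveat about low-risk groups rest on the distinction between proportional and absolute benefit, which can be made concrete with assumed baseline risks. The 23% proportional reduction is from the overview [32]; the baseline mortalities below are illustrative only.

    # Identical proportional benefit, very different absolute gains.
    proportional_reduction = 0.23   # all-cause mortality reduction with tamoxifen [32]
    for group, baseline_mortality in (("good prognosis (assumed)", 0.10),
                                      ("node-positive (assumed)", 0.40)):
        absolute_gain = baseline_mortality * proportional_reduction
        print(f"{group}: {absolute_gain:.1%} absolute mortality reduction")
    # 2.3% vs. 9.2%: high-risk patients gain most from the same relative effect.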
68.7.2 Neo-Adjuvant Chemotherapy

The principle of neo-adjuvant chemotherapy evolved on the basis that, theoretically at least, a systemic treatment should be more efficacious while the tumour is still within the breast, i.e. prior to surgery. In addition, micrometastases, which are what eventually cause the patient's demise, are treated at the earliest possible stage, and therefore this treatment modality may improve survival. Administration of neo-adjuvant chemotherapy also enables clinicians to assess and monitor the effect of chemotherapy on the primary tumour of the individual patient, something that is simply not possible in the adjuvant setting. Trials of neo-adjuvant chemotherapy, however, have failed to show a survival advantage [42, 43]. Surgeons appear to be
able to perform breast-conserving procedures slightly more frequently after neo-adjuvant treatment, but there is a statistically non-significant trend towards higher local recurrence rates in the largest of the published studies [42]. With neo-adjuvant chemotherapy, clinicians assume that the patient belongs to the highest-risk group without having access to the full histology of the tumour. There is therefore the potential for "over-staging" a tumour; to avoid this, some authors advocate sentinel lymph node biopsy to confirm node-positive disease prior to neo-adjuvant treatment. Although this knowledge usually does not change the decision to administer neo-adjuvant chemotherapy, it may well influence the definitive treatment of the axilla at the time of surgery, as well as the administration of adjuvant radiotherapy.
References

1. Nystrom L, Andersson I, Bjurstam N et al (2002) Long-term effects of mammography screening: updated overview of the Swedish randomised trials. Lancet 359:909–919
2. Leach MO, Boggis CR, Dixon AK et al (2005) Screening with magnetic resonance imaging and mammography of a UK population at high familial risk of breast cancer: a prospective multicentre cohort study (MARIBS). Lancet 365:1769–1778
3. Fisher B, Anderson S, Redmond CK et al (1995) Reanalysis and results after 12 years of follow-up in a randomized clinical trial comparing total mastectomy with lumpectomy with or without irradiation in the treatment of breast cancer. N Engl J Med 333:1456–1461
4. Overgaard M, Hansen PS, Overgaard J et al (1997) Postoperative radiotherapy in high-risk premenopausal women with breast cancer who receive adjuvant chemotherapy. Danish Breast Cancer Cooperative Group 82b Trial. N Engl J Med 337:949–955
5. Veronesi U, Marubini E, Mariani L et al (2001) Radiotherapy after breast-conserving surgery in small breast carcinoma: long-term results of a randomized trial. Ann Oncol 12:997–1003
6. Fentiman IS, Christiaens MR, Paridaens R et al (2003) Treatment of operable breast cancer in the elderly: a randomised clinical trial EORTC 10851 comparing tamoxifen alone with modified radical mastectomy. Eur J Cancer 39:309–316
7. Gazet JC, Ford HT, Coombes RC et al (1994) Prospective randomized trial of tamoxifen vs surgery in elderly patients with breast cancer. Eur J Surg Oncol 20:207–214
8. Kenny FS, Ellis IO, Elston CW et al (1997) Long term follow-up of elderly patients randomized to primary tamoxifen or wedge mastectomy as initial therapy for operable breast cancer. Breast 6:244
9. Bates T, Fennessy M, Riley DL et al (2001) Breast cancer in the elderly: surgery improves survival. The results of a Cancer Research Campaign Trial. Eur J Cancer 37:7
10. Fennessy M, Bates T, MacRae K et al (2004) Late follow-up of a randomized trial of surgery plus tamoxifen versus tamoxifen alone in women aged over 70 years with operable breast cancer. Br J Surg 91:699–704
11. Mustacchi G, Ceccherini R, Milani S et al (2003) Tamoxifen alone versus adjuvant tamoxifen for operable breast cancer of the elderly: long-term results of the phase III randomized controlled multicenter GRETA trial. Ann Oncol 14:414–420
12. Holland R, Veling SH, Mravunac M et al (1985) Histologic multifocality of Tis, T1-2 breast carcinomas. Implications for clinical trials of breast-conserving surgery. Cancer 56:979–990
13. Veronesi U, Zucali R, Luini A (1986) Local control and survival in early breast cancer: the Milan trial. Int J Radiat Oncol Biol Phys 12:717–720
14. Mariani L, Salvadori B, Marubini E et al (1998) Ten year results of a randomised trial comparing two conservative treatment strategies for small size breast cancer. Eur J Cancer 34:1156–1162
15. Van de Steene J, Soete G, Storme G (2000) Adjuvant radiotherapy for breast cancer significantly improves overall survival: the missing link. Radiother Oncol 55:263–272
16. Holland R, Hendriks JH, Vebeek AL et al (1990) Extent, distribution, and mammographic/histological correlations of breast ductal carcinoma in situ. Lancet 335:519–522
17. Fisher B, Dignam J, Wolmark N et al (1998) Lumpectomy and radiation therapy for the treatment of intraductal breast cancer: findings from National Surgical Adjuvant Breast and Bowel Project B-17. J Clin Oncol 16:441–452
18. Julien JP, Bijker N, Fentiman IS et al (2000) Radiotherapy in breast-conserving treatment for ductal carcinoma in situ: first results of the EORTC randomised phase III trial 10853. EORTC Breast Cancer Cooperative Group and EORTC Radiotherapy Group. Lancet 355:528–533
19. Macdonald HR, Silverstein MJ, Lee LA et al (2006) Margin width as the sole determinant of local recurrence after breast conservation in patients with ductal carcinoma in situ of the breast. Am J Surg 192:420–422
20. MacDonald HR, Silverstein MJ, Mabry H et al (2005) Local control in ductal carcinoma in situ treated by excision alone: incremental benefit of larger margins. Am J Surg 190:521–525
21. Fisher B, Dignam J, Wolmark N et al (1999) Tamoxifen in treatment of intraductal breast cancer: National Surgical Adjuvant Breast and Bowel Project B-24 randomised controlled trial. Lancet 353:1993–2000
22. Houghton J, George WD, Cuzick J et al (2003) Radiotherapy and tamoxifen in women with completely excised ductal carcinoma in situ of the breast in the UK, Australia, and New Zealand: randomised controlled trial. Lancet 362:95–102
23. Hadjiminas DJ, Burke M (1994) Intraoperative assessment of nodal status in the selection of patients with breast cancer for axillary clearance. Br J Surg 81:1615–1616
24. Steele RJ, Forrest AP, Gibson T et al (1985) The efficacy of lower axillary sampling in obtaining lymph node status in breast cancer: a controlled randomized trial. Br J Surg 72:368–369
25. Chetty U, Jack W, Prescott RJ et al (2000) Management of the axilla in operable breast cancer treated by breast conservation: a randomized clinical trial. Edinburgh Breast Unit. Br J Surg 87:163–169
26. Forrest AP, Everington D, McDonald CC et al (1995) The Edinburgh randomized trial of axillary sampling or clearance after mastectomy. Br J Surg 82:1504–1508
27. Ahlgren J, Holmberg L, Bergh J et al (2002) Five-node biopsy of the axilla: an alternative to axillary dissection of levels I-II in operable breast cancer. Eur J Surg Oncol 28:97–102
28. Clarke D, Khonji NI, Mansel RE (2001) Sentinel node biopsy in breast cancer: ALMANAC trial. World J Surg 25:819–822
29. Wong SL, Chao C, Edwards MJ et al (2002) Frequency of sentinel lymph node metastases in patients with favorable breast cancer histologic subtypes. Am J Surg 184:492–498; discussion 498
30. Macmillan RD, Barbera D, Hadjiminas DJ et al (2001) Sentinel node biopsy for breast cancer may have little to offer four-node-samplers. Results of a prospective comparison study. Eur J Cancer 37:1076–1080
31. Agarwal T, Kakkos SK, Cunningham DA et al (2005) Sentinel node biopsy can replace four-node-sampling in staging early breast cancer. Eur J Surg Oncol 31:122–127
32. Early Breast Cancer Trialists' Collaborative Group (2001) Tamoxifen for early breast cancer. Cochrane Database Syst Rev (1):CD000486
33. Ellis MJ, Coop A, Singh B et al (2001) Letrozole is more effective neoadjuvant endocrine therapy than tamoxifen for ErbB-1- and/or ErbB-2-positive, estrogen receptor-positive primary breast cancer: evidence from a phase III randomized trial. J Clin Oncol 19:3808–3816
34. Mouridsen H, Gershanovich M, Sun Y et al (2003) Phase III study of letrozole versus tamoxifen as first-line therapy of advanced breast cancer in postmenopausal women: analysis of survival and update of efficacy from the International Letrozole Breast Cancer Group. J Clin Oncol 21:2101–2109
35. Smith IE (2003) Letrozole versus tamoxifen in the treatment of advanced breast cancer and as neoadjuvant therapy. J Steroid Biochem Mol Biol 86:289–293
36. Howell A, Cuzick J, Baum M et al (2005) Results of the ATAC (Arimidex, Tamoxifen, Alone or in Combination) trial after completion of 5 years' adjuvant treatment for breast cancer. Lancet 365:60–62
37. Ingle JN, Tu D, Pater JL et al (2006) Duration of letrozole treatment and outcomes in the placebo-controlled NCIC CTG MA.17 extended adjuvant therapy trial. Breast Cancer Res Treat 99:295–300
38. Coombes RC, Kilburn LS, Snowdon CF et al (2007) Survival and safety of exemestane versus tamoxifen after 2–3 years' tamoxifen treatment (Intergroup Exemestane Study): a randomised controlled trial. Lancet 369:559–570
39. Jakesz R, Jonat W, Gnant M et al (2005) Switching of postmenopausal women with endocrine-responsive early breast cancer to anastrozole after 2 years' adjuvant tamoxifen: combined results of ABCSG trial 8 and ARNO 95 trial. Lancet 366:455–462
40. Kaufmann M, Jonat W, Hilfrich J et al (2007) Improved overall survival in postmenopausal women with early breast cancer after anastrozole initiated after treatment with tamoxifen compared with continued tamoxifen: the ARNO 95 Study. J Clin Oncol 25:2664–2670
41. Jonat W, Gnant M, Boccardo F et al (2006) Effectiveness of switching from adjuvant tamoxifen to anastrozole in postmenopausal women with hormone-sensitive early-stage breast cancer: a meta-analysis. Lancet Oncol 7:991–996
42. Fisher B, Bryant J, Wolmark N et al (1998) Effect of preoperative chemotherapy on the outcome of women with operable breast cancer. J Clin Oncol 16:2672–2685
43. Makris A, Powles TJ, Ashley SE et al (1998) A reduction in the requirements for mastectomy in a randomized trial of neoadjuvant chemoendocrine therapy in primary breast cancer. Ann Oncol 9:1179–1184
69 Thyroid Surgery: Current Trends and Recent Innovations
Charlie Huins and Neil Samuel Tolley
Contents

69.1 Introduction ............................................................ 905
69.2 Innovation ............................................................... 906
69.2.1 Surgical Robotics ..................................................... 906
69.3 New Surgical Techniques ....................................... 907
69.3.1 Minimally-Invasive Video-Assisted Thyroidectomy (MIVAT) ......................................... 907
69.3.2 Instrumentation: The Harmonic Scalpel .................. 907
69.4 Molecular and Biological Developments .............. 908
69.4.1 RET Proto-Oncogene ............................................... 908
69.4.2 RAF Proteins ............................................................ 908
69.4.3 Multiple Endocrine Neoplasia ................................. 909
69.5 Imaging and Diagnostics ........................................ 909
69.5.1 Ultrasound and the Thyroid ..................................... 909
69.5.2 PET CT Scanning ..................................................... 910
69.6 Training ................................................................... 910
69.7 Future Development and Research Focus ............ 911
References ........................................................................... 911

Abstract The management of thyroid disease has evolved rapidly within the past decade. Laparoscopic, video-assisted and robotic thyroidectomy have been developed in order to reduce the cosmetic impact of surgery in the neck. Apart from surgical techniques, our understanding of the molecular genetics of cancer continues to progress, and with it come new techniques for diagnosis and treatment, such as genetic testing of family members of patients with medullary thyroid carcinoma (MTC). Radiological advances are changing the way in which we manage certain head and neck conditions and continue to aid in the diagnosis and treatment of head and neck cancer. New imaging techniques have been implemented to facilitate the management of thyroid cancer by multidisciplinary teams and cancer networks. The therapeutic potential of stem cells for replacement therapy is in its infancy, but the possibility of transplanting follicular cells as substitute thyroid tissue would have an enormous impact on our current management of thyroid cancer.
69.1 Introduction
C. Huins () Department of Ear, Nose and Throat Surgery, St Mary's Hospital NHS Trust, Praed Street, London W2 1NY, UK e-mail: [email protected]
The management of thyroid disease has evolved rapidly within the past decade. With the development of endoscopes and, in turn, laparoscopic techniques, minimally invasive approaches such as video-assisted and endoscopic thyroidectomy are increasingly being used, principally to reduce the cosmetic impact of surgery in the neck. As surgical technologies continue to progress and develop, robotics has entered the arena of thyroid surgery, with thyroid and parathyroid excisions, together with thoracic thyroid surgery, being performed using robotic technology via remote access sites.
But it is not just in the operating theatre that thyroid management is evolving. Our understanding of the molecular genetics of cancer continues to progress, and with it come new techniques for diagnosis and treatment, such as genetic testing of family members of patients with MTC. Radiological advances are changing the way in which we manage certain head and neck conditions and continue to aid in the diagnosis and treatment of head and neck cancer. Thyroid cancer is now managed in a multidisciplinary team (MDT) setting, bringing together specialists from all aspects of head and neck care, including surgeons, endocrinologists, oncologists, radiologists and histopathologists, to manage this important group of patients as a unit rather than as individual specialties.
69.2 Innovation

69.2.1 Surgical Robotics

Surgical robotics broadly refers to the use of robotic technology in the operating theatre. Robotics as we know it today emerged from research conducted during the mid-to-late 1980s at the National Aeronautics and Space Administration (NASA), where ideas of a surgeon-controlled handpiece were developed, together with the birth of "virtual reality" [1]. This idea of "remote-control" surgery attracted the attention of the US Department of Defense, which envisaged the technology being utilised on the battlefield, bringing the surgeon to the wounded soldier through "telepresence". A robot could conduct potentially life-saving surgery near the front line while being controlled by a surgeon operating from a remote and safe location. The robot in the field hospital would mirror the actions of the surgeon, seated at a virtual reality console with the operating field projected via a 3D imaging system. This not only protected the surgeon from the obvious dangers of the battlefield, but potentially offered the services of all surgical specialties to the soldiers. At the same time, another team comprising an orthopaedic surgeon and a veterinary surgeon was developing a similar system to precisely core out the femur of dogs to exactly match the prosthetic stem in total hip replacement surgery, and the first robotic operation was carried out in the mid-1990s. Laparoscopic surgery was also developing rapidly around this time,
and the two technologies were complementary. The first telerobotic surgery on a human – with the surgeon at a remote console – occurred in 1997, when a team in Brussels successfully performed a telesurgical laparoscopic cholecystectomy [2]. The vision of the truly remote surgeon had already been demonstrated in 1993 by an Italian team, who successfully performed a transatlantic telepresence liver biopsy on a pig, with the surgeon's station in NASA's laboratory in California and the pig in a laboratory in Milan. More recently, a laparoscopic cholecystectomy was performed on a patient in Strasbourg by a surgeon seated 3,800 miles away in New York [3]. Since this time, robotics has progressed considerably and has been used successfully in many surgical specialties – cardiac surgery, orthopaedics, general surgery, urology and gynaecology – for procedures including gastric bypass, colectomy, laparoscopic cholecystectomy, laparoscopic radical prostatectomy, nephrectomy and bladder suspension. The application of robotic technology to Otolaryngology has been slower than in other surgical specialties, in large part due to the relative ease of open access to most structures of the head and neck through conventional approaches. A number of limitations have also been encountered, such as the occasional interference of the surgical arms and camera due to the small size of the operative field, lack of tactile feedback [4], exhaustive system setup, cost and a formidable learning curve [5]. Despite these limitations, telerobotic surgery conveys the advantages of minimal access surgery, such as reduced scarring and blood loss with improved functional capability [6], and its use has been demonstrated experimentally in thymectomy, parotidectomy, submandibular gland excision and selective neck dissection [4]. One major advantage of surgical robotic technology is the ability of the system to filter out regular oscillations, such as resting hand tremor, which confers a huge advantage in delicate anatomical regions, such as those encountered in Otolaryngology, specifically otology and neurotology. Indeed, a robot has been shown experimentally to be a true, programmable system for drilling cadaveric human temporal bones in preparation for cochlear implants [7]. More recently, robotic surgery has been employed successfully in head and neck surgery. Excision of an ectopic benign retrosternal goitre – one that resides within the chest behind the sternum, separate from the rest of the thyroid tissue in its usual position in the neck – was first reported in 2004. A staged approach was employed
in a 72-year-old patient, excising the cervical thyroid first through a standard approach and, following full recovery from this, successfully performing intrathoracic robotic excision of the secondary mediastinal goitre via a lateral approach through the right posterior axillary line with single left-lung ventilation [8]. Robotic-assisted techniques have also been employed in the anterior mediastinum for complete thymectomy in a patient with known papillary thyroid carcinoma [5], excision of an ectopic mediastinal parathyroid and resection of lymph node metastases of thyroid carcinoma [9]. Further reports have shown successful excision of an intrathoracic parathyroid adenoma, a retrosternal goitre extending to the aorta [10] and a hemithyroidectomy [11]. This latter case was performed for a common patient request – to avoid a cervical incision for cosmetic reasons. An approach from the mid-axillary line was employed, and the tunnel created using CO2 insufflation – a technique taken from endoscopic surgery. Following this, the two lateral 5 mm endoscopic trocars were replaced with the robotic trocars, and a 12 mm 0° dual-channel telescope provided the surgeon with 3D vision. The clavicular head of the sternocleidomastoid was partially divided for access, and a robotic harmonic scalpel (discussed further in the next section) was used to dissect out the lobe of the thyroid. Subsequent histological examination confirmed a benign thyroid nodule. Operating time was 4.5 h – considerably longer than a standard hemithyroidectomy, which normally takes around 84 min [12]. The arena of robotic-assisted head and neck surgery continues to evolve and, as described above, has many potential applications and benefits in many areas of Otolaryngology. As technology continues to miniaturise, we may well see the emergence of robotic otology and rhinology, including pituitary and skull base surgery – regions that benefit from a particularly steady "hand".
69.3 New Surgical Techniques

69.3.1 Minimally-Invasive Video-Assisted Thyroidectomy (MIVAT)

Since the 1980s, endoscopic surgery has become commonplace in specialties such as general surgery, orthopaedics, thoracic surgery, urology and gynaecology – regions of the body that have natural cavities into which endoscopes can be passed. Common general surgical laparoscopic procedures include cholecystectomy, appendicectomy, inguinal hernia repair and staging of malignant disease [13]. The development of specific technologies such as balloon dissectors, external lift devices and ultrasonic coagulators has enhanced the potential of endoscopic surgery in virtual spaces such as the neck [14]. The first reports of endoscopic approaches to the thyroid and parathyroid appeared in 1996 [15], with animal and cadaveric studies following in 1997 [14]. Clinical studies soon afterwards showed endoscopic thyroid and parathyroid surgery to be successful and safe using low-pressure CO2 insufflation via small access points in the neck [16, 17], with a 5-year series showing success rates comparable to open surgery [18]. Other techniques employ central access and external retraction without gas insufflation; 5–6-year series show good success rates with minimal complications, and an operative time comparable to that of conventional surgery with the additional benefit of improved cosmesis [19, 20]. MIVAT has even been used successfully to treat micropapillary carcinoma of the thyroid, achieving the oncological clearance of a hemithyroidectomy with ipsilateral lymph node dissection via a 3 cm incision under endoscopic vision [21]. The constant goal of a superior cosmetic outcome has led to other approaches to the thyroid being attempted. Endoscopic approaches via the anterior chest wall [12, 22], axillo-breast [23] and axillary [24, 25] routes have all been described, using either gasless lifting techniques or carbon dioxide insufflation at low pressures of 4 mmHg to minimise potential complications such as hypercapnia, respiratory acidosis, subcutaneous emphysema and air embolism. Whilst approaches from the axilla are scored extremely favourably by patients from a cosmetic viewpoint, the disadvantages are the time required – double that of the conventional open approach – and the techniques' invasiveness, which can be more uncomfortable postoperatively [12].

69.3.2 Instrumentation: The Harmonic Scalpel
The Harmonic Scalpel uses high-frequency mechanical energy to cut and coagulate tissues and vessels at the same time. First reported in thyroid surgery in 1998
[26], the system consists of a generator producing a natural harmonic frequency of 55,000 Hz [27]. When the active blade of the handpiece contacts tissue, the transmitted acoustic wave causes cavitational fragmentation and cutting rather than electrical or thermal coagulation, as with standard cautery. Less heat is generated than with unipolar or bipolar cautery, and therefore far less thermal energy is transmitted to the surrounding structures, reducing the chance of thermal injury. The handpiece of the Harmonic Scalpel consists of a long tip and shaft, giving it the advantage of accessing tight spaces, such as the superior pole of the thyroid and its blood supply. It reliably seals the feeding vessels, being able to seal and cut vessels up to 3 mm in diameter, hence enabling smaller incisions to be used. Other handpieces, specially designed for thyroid surgery with a smaller size and a shape better suited to the neck, will soon be available. Given the numerous small vessels that are encountered during thyroidectomy, the ability to divide and seal these using the Harmonic Scalpel has been shown to significantly reduce operating times [28]. Another advantage stems from the fact that comparatively little heat is transmitted to the surrounding structures, making the Harmonic Scalpel safer to use than conventional diathermy technology.
69.4 Molecular and Biological Developments

The development and application of molecular technologies over the past two decades has shed considerable light on the genetic abnormalities associated with the major thyroid tumour types [29]. Approximately 90% of thyroid tumours are derived from follicular cells, with papillary thyroid cancer (PTC) being the most common thyroid tumour. Although thyroid tumours are uncommon in childhood, PTCs represent the most common paediatric thyroid malignancy. Numerous variants of PTC are recognised including, amongst others, follicular, oncocytic, clear cell and solid; the solid variant comprises approximately 8% of sporadic PTCs and is relatively common in children following radiation exposure. This variant is associated with a slightly higher frequency of distant metastases and a less favourable prognosis than conventional PTC.
69.4.1 RET Proto-Oncogene

Oncogenes and other proteins have been identified as playing a significant role in the development of different forms of thyroid cancer. A variety of genetic alterations, including rearrangements and point mutations, have been implicated in the development of PTC. Targets of these genetic events include RET and TRK (rearrangements) and BRAF and RAS (point mutations). Generally speaking, rearrangements have been linked with radiation exposure, whilst the origin of point mutations remains unknown. All of these alterations affect signalling along the mitogen-activated protein kinase (MAPK) pathway, which transduces signals from a variety of growth factors and cell surface receptors. Mutations or rearrangements of these genes are present in approximately 70% of PTCs, and they rarely overlap in the same tumour [29]. The most notable is the RET proto-oncogene: fusion of the RET tyrosine kinase (TK) domain with the terminal region of CCDC6 generates a chimeric gene denominated RET/PTC. The inappropriate expression of RET/PTC in thyroid follicular cells, the mislocalisation of the RET TK from the membrane to the cytoplasm, the absence of the extracellular regulatory region of RET, and the presence of certain domains in the RET-partner coding sequences that favour the dimerisation process account for the oncogenic activity of the chimeric RET/PTC gene [30]. The role of RET/PTC in the development of PTC has been demonstrated convincingly in transgenic mice. The prevalence of RET/PTC is quoted in different studies as 0–87%; RT-PCR and Southern blot have recently been shown to be the most reliable techniques for detecting RET/PTC activation, giving a RET/PTC-positive rate among PTC cases of around 20% [31]. The most common rearranged forms of RET are RET/PTC1 and RET/PTC3, with the former more common in classic PTCs, papillary microcarcinomas and the diffuse sclerosing variant than in other subtypes. RET/PTC3, on the other hand, has been associated with the solid variant, a strong correlation demonstrated after the Chernobyl power plant disaster.
69.4.2 RAF Proteins

The RAF proteins are serine/threonine protein kinases that play critical roles in cell proliferation, differentiation
and apoptosis by signalling through the mitogen-activated protein kinase (MAPK) pathway. BRAF is the predominant isoform in thyroid follicular cells and the most potent activator of the MAPK pathway. BRAF mutations have been demonstrated in 40–70% of papillary carcinomas, being associated with conventional PTC and microcarcinomas but uncommonly with the follicular variant. There is a substantial body of evidence to indicate that BRAF mutational status is a significant predictor of clinical outcome [29].
69.4.3 Multiple Endocrine Neoplasia

MTC comprises 10–15% of all thyroid malignancies and arises from the parafollicular or C-cells of neuroendocrine origin, which make up 1% of thyroid cells. Up to 25% of cases are heritable, occurring in association with multiple endocrine neoplasia type 2A (MEN2A) or type 2B (MEN2B), or as isolated heritable tumours in the familial medullary thyroid carcinoma (FMTC) syndrome [32]. MEN2A is the most common syndrome, being present in up to 80% of hereditary cases, and is characterised by multifocal, bilateral MTC with nearly 100% penetrance. MEN2B is also characterised by MTC with 100% penetrance. These syndromes result from mutations in the RET proto-oncogene, inherited in an autosomal dominant fashion, and thus confer a 50% risk of transmission to children of MEN2 gene carriers. New cases of MTC should therefore undergo genetic testing to evaluate for a hereditary syndrome. MEN2A-associated tumours usually present in early childhood or late adolescence, whereas MEN2B-associated tumours may present in infancy or early childhood. In patients at risk, identification of a mutation allows diagnosis and treatment prior to the C-cell proliferation necessary to elevate calcitonin levels. Therefore, in patients at risk for MEN2B, genetic testing is performed immediately after birth, since MTC is usually already established by that time and early total thyroidectomy is necessary to increase the chance of cure. In patients at risk for FMTC or MEN2A, screening should take place in early childhood, before the age of 5 or 6, to allow planning of preventative thyroidectomy. Surgery is the most effective primary treatment for MTC, as the C-cells do not concentrate radioactive iodine, and the role of external beam radiation therapy remains limited. Timing of surgery is also essential, as the frequency of nodal metastases has been reported to be over 50%, and survival of patients with MTC is strongly correlated with stage [33].
69.5 Imaging and Diagnostics

69.5.1 Ultrasound and the Thyroid

Ultrasound has assumed a primary role in the management of thyroid cancer, being able to provide essential information on the characteristics of nodules, such as their echogenicity, margins, vascularity, size, cystic vs. solid content, and the presence of calcifications. Ultrasound-guided fine-needle aspiration (FNA) of nodules for cytological examination is a routine extension of the procedure, and the use of ultrasound in the out-patient setting is becoming increasingly commonplace [34]. Thyroid nodules are common in adults, with a prevalence of 40–50%. In children, nodules are uncommon, with an incidence of 0.2–1.5%; of these, 85% are benign and 15% malignant. FNA of paediatric nodules has a diagnostic accuracy of 90–95% [35]. The development of high-resolution ultrasound with Doppler capability has greatly enhanced the diagnostic ability of sonography. A 7.5 MHz transducer is universal for sonographic thyroid imaging, since it provides good resolution and satisfactory penetration. More superficial lesions with involvement of the strap muscles can be optimally imaged using 10–15 MHz transducers, which offer greater resolution. Lower frequencies, such as 5 MHz, allow better penetration at the expense of resolution, as required for large goitres [36]. The frequencies of sound waves reflected from a moving target are altered depending upon whether the target is moving toward or away from the transducer. Colour flow Doppler detects this change in frequency and can determine the direction and velocity of blood flow. Power Doppler detects the power of the shifted signal, or the peak systolic flow, and is therefore more sensitive to small-amplitude flow, such as that in the tissues of the neck. Power Doppler ultrasound can differentiate benign from malignant nodules with a sensitivity of 89% and a specificity of 75–82% [37]. Benign thyroid nodules demonstrate blood flow in the perinodular area, whereas malignant nodules are more likely to have intranodular blood flow.
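The trade-offs just described follow from two standard relations; the worked numbers below are illustrative values chosen for this sketch, not figures from the cited studies. Taking the speed of sound in soft tissue as c ≈ 1,540 m/s, the wavelength at a transmitted frequency f_0 is

\[ \lambda = \frac{c}{f_0} \approx \frac{1540\ \mathrm{m/s}}{7.5\ \mathrm{MHz}} \approx 0.21\ \mathrm{mm}, \]

which is why higher-frequency transducers resolve finer detail but, because attenuation rises with frequency, penetrate less deeply. The frequency change detected by colour flow Doppler is

\[ \Delta f = \frac{2 f_0 v \cos\theta}{c}, \]

where v is the blood velocity and θ the angle of insonation. With f_0 = 7.5 MHz, v = 10 cm/s and θ = 60°, for example, Δf ≈ (2 × 7.5 × 10^6 × 0.1 × 0.5)/1540 ≈ 490 Hz – a small shift, which illustrates why low-velocity intranodular flow calls for the greater sensitivity of power Doppler.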
In the last decade, the development of new diagnostic tools has aided the early detection of recurrent thyroid cancer. These include high-resolution ultrasound for early lymph node recurrence, recombinant thyroid stimulating hormone, which allows scanning and thyroglobulin (Tg) stimulation without the need for thyroid hormone withdrawal, and sensitive and reliable Tg assays to detect the earliest sign of recurrence. These have had a great impact on the surveillance of this group of patients, since physical examination of the post-thyroidectomy neck is seldom helpful in the early detection of recurrence, and 90% of recurrent thyroid cancer is in the neck [37].
69.5.2 PET CT Scanning

Positron emission tomography (PET) is a molecular imaging technique that provides images of physiological processes. The radiotracer emits positrons whose annihilation produces two 511 keV photons travelling in nearly opposite directions; by detecting a coincidence – the simultaneous detection of two photons by any two detectors in the ring of the scanner – the distribution of radiotracer can be reconstructed. PET imaging is the most sensitive way to detect even small concentrations of tracer, with typical detected tumours containing approximately picomolar concentrations of PET pharmaceuticals [38]. The standard method for staging and follow-up of patients with well-differentiated thyroid neoplasms is total body radioiodine scanning and measurement of serum thyroglobulin (Tg) levels. However, patients with an elevated Tg but a negative radioiodine scintigraphy scan can present a diagnostic dilemma. In this setting, PET has been shown to be useful for the detection of disease, being 95% sensitive and 88% accurate for identification of disease sites [1]. In the detection of an unknown primary causing lymph node metastasis, FDG-PET has a high sensitivity of 86% but a low specificity of 69% and a positive predictive value of 60%. The negative predictive value is high at 90%, and the result can lead to a therapeutic change in 25% of patients [39]. FDG-PET, however, provides only limited anatomical information. Combining it with computed tomography (CT) to create dual-modality FDG-PET/CT yields an accurate map of tracer activity, and hence of potential tumour sites. Using this technology, sensitivities of 99% and positive predictive
value of 98% have been achieved for the re-staging of differentiated thyroid cancer [40].
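Predictive values such as these depend on disease prevalence as well as on sensitivity and specificity, and the figures quoted above can be reconciled with a short Bayes'-rule calculation; the pre-test probability of roughly 35% used below is inferred purely for illustration and is not a figure reported in the cited studies.

\[ \mathrm{PPV} = \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1-\mathrm{spec})(1-p)}, \qquad \mathrm{NPV} = \frac{\mathrm{spec}\cdot(1-p)}{\mathrm{spec}\cdot(1-p) + (1-\mathrm{sens})\cdot p} \]

With sens = 0.86, spec = 0.69 and p = 0.35 (as for FDG-PET in the unknown-primary setting above), PPV = 0.301/(0.301 + 0.202) ≈ 0.60 and NPV = 0.449/(0.449 + 0.049) ≈ 0.90, consistent with the quoted values. At a lower prevalence the PPV would fall and the NPV rise, which is why predictive values should always be read against the population studied.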
69.6 Training

Current training for ENT Specialist Registrars in the UK includes exposure to all aspects of Otolaryngology, including thyroid surgery, over a 6-year period. Over this time, trainees gain experience of working in an MDT setting. Indeed, in accordance with the British Thyroid Association guidelines on the management of thyroid cancer [41], the management of patients with thyroid cancer should be the responsibility of a specialist MDT. This team would normally comprise a surgeon, endocrinologist and oncologist (or nuclear medicine physician), with support from a pathologist, medical physicist, biochemist, radiologist and specialist nurse, all with expertise and interest in the management of thyroid cancers. The patient would be seen by one or more members of the team in a combined clinic within the Department of Health cancer waiting target of 2 weeks from the time of referral, and operated on within 62 days of referral, if appropriate. Criteria for urgent target referral include a solitary nodule increasing in size, a family history of an endocrine tumour, unexplained hoarseness or voice changes, and very young (pre-pubertal) patients or those aged 65 years and older [42]. In 2007, the British Association of Endocrine Surgeons (BAES, renamed following this report the British Association of Endocrine and Thyroid Surgeons – BAETS) published its second audit of thyroid procedures performed countrywide over the preceding 2-year period [43]. The BAETS guidelines state that an approved training unit would normally have an annual operative workload of greater than 50 cases performed by the trainer, and 20 by any trainee. This is in contrast to the American Association of Endocrine Surgeons, whose guideline is 25 thyroid operations per year per trainer. The audit demonstrated that only 14% of consultants met this standard, with the mean number of operations performed per annum being approximately 24 (not including paediatric work). The majority of surgeons were performing under ten operations per year. The Department of Health has published guidelines with regard to networks for cancer management, with regional MDTs centralising the care of this group of patients to specialist centres with all members of the
MDT under one roof. This will inevitably benefit the patient, who is managed in such a setting by specialists with appropriate expertise, and will also benefit the specialists and trainees themselves in terms of exposure and experience. However, for the majority of specialists not in those establishments, it will reduce trainees' exposure to head and neck cancer, since fewer positions covering such work will be available, to the detriment of their training.
69.7 Future Development and Research Focus

Professor Lord Ara Darzi's vision of the future of the National Health Service, as health minister at the Department of Health, is for centralisation of cancer management to specialist centres. Fewer but larger tertiary hospitals would be established, providing all facets of the care of thyroid cancer patients under one roof. This would not only benefit the patient, who would have access to complete MDTs with appropriate experience, but would also enhance the opportunity for research in terms of the numbers treated and hence the volume of material available. Focus continues on clinical material for translational research – translating findings from laboratory studies into clinical practice. Translational research has proven to be a powerful process that drives the clinical research engine, important in all branches of medicine. Finally, the potential of stem cells in regenerative medicine has been presented in an ever-increasing number of publications. Since thyroid follicular cells arise from endoderm, similar strategies could be employed to stimulate their differentiation from human embryonic stem cells [44]. Thyroid cancer is the most common endocrine malignancy, with approximately 1,757 new cases diagnosed every year in the UK [45]. If cancer stem cells were confirmed to exist in certain thyroid cancers, this would have significant clinical implications for the way in which we diagnose and treat this group of patients. The therapeutic potential of stem cells for replacement therapy is in its infancy, but the possibility of transplanting follicular cells as substitute thyroid tissue would have an enormous impact on our current management of thyroid cancer.
References

1. Satava RM (2002) Surgical robotics: the early chronicles: a personal historical perspective. Surg Laparosc Endosc Percutan Tech 12:6–16
2. Himpens J, Leman G, Cadiere GB (1998) Telesurgical laparoscopic cholecystectomy. Surg Endosc 12:1091
3. Marescaux J, Leroy J, Gagner M et al (2001) Transatlantic robot-assisted telesurgery. Nature 413:379–380
4. Haus BM, Kambham N, Le D et al (2003) Surgical robotic applications in otolaryngology. Laryngoscope 113:1139–1144
5. Savitt MA, Gao G, Furnary AP et al (2005) Application of robotic-assisted techniques to the surgical evaluation and treatment of the anterior mediastinum. Ann Thorac Surg 79:450–455
6. Gourin CG, Terris DJ (2004) Surgical robotics in otolaryngology: expanding the technology envelope. Curr Opin Otolaryngol Head Neck Surg 12:204–208
7. Federspil PA, Geisthoff UW, Henrich D et al (2003) Development of the first force-controlled robot for otoneurosurgery. Laryngoscope 113:465–471
8. Bodner J, Fish J, Lottersberger AC et al (2005) Robotic resection of an ectopic goiter in the mediastinum. Surg Laparosc Endosc Percutan Tech 15:249–251
9. Augustin F, Schmid T, Bodner J (2006) The robotic approach for mediastinal lesions. Int J Med Robot 2:262–270
10. Tanna N, Joshi AS, Glade RS et al (2006) Da Vinci robot-assisted endocrine surgery: novel applications in otolaryngology. Otolaryngol Head Neck Surg 135:633–635
11. Lobe TE, Wright SK, Irish MS (2005) Novel uses of surgical robotics in head and neck surgery. J Laparoendosc Adv Surg Tech A 15:647–652
12. Ikeda Y, Takami H, Sasaki Y et al (2002) Comparative study of thyroidectomies. Endoscopic surgery versus conventional open surgery. Surg Endosc 16:1741–1745
13. Soper NJ, Brunt LM, Kerbl K (1994) Laparoscopic general surgery. N Engl J Med 330:409–419
14. Brunt LM, Jones DB, Wu JS et al (1997) Experimental development of an endoscopic approach to neck exploration and parathyroidectomy. Surgery 122:893–901
15. Gagner M (1996) Endoscopic parathyroidectomy. Br J Surg 83:87
16. Hüscher CS, Chiodini S, Napolitano C et al (1997) Endoscopic right thyroid lobectomy. Surg Endosc 11:877
17. Yeung HC, Ng WT, Kong CK (1997) Endoscopic thyroid and parathyroid surgery. Surg Endosc 11:1135
18. Henry JF, Sebag F, Tamagnini P et al (2004) Endoscopic parathyroid surgery: results of 365 consecutive procedures. World J Surg 28:1219–1223
19. Miccoli P, Berti P, Materazzi G et al (2004) Minimally invasive video-assisted thyroidectomy: five years of experience. J Am Coll Surg 199:243–248
20. Miccoli P, Berti P, Materazzi G et al (2004) Results of video-assisted parathyroidectomy: single institution's six-year experience. World J Surg 28:1216–1218
21. Ikeda Y, Takami H, Sasaki Y et al (2002) Minimally invasive video-assisted thyroidectomy and lymphadenectomy for micropapillary carcinoma of the thyroid. J Surg Oncol 80:218–221
22. Takami HE, Ikeda Y (2006) Minimally invasive thyroidectomy. Curr Opin Oncol 18:43–47
23. Choe JH, Kim SW, Chung KW et al (2007) Endoscopic thyroidectomy using a new bilateral axillo-breast approach. World J Surg 31:601–606
24. Ikeda Y, Takami H, Niimi M et al (2002) Endoscopic thyroidectomy and parathyroidectomy by the axillary approach. A preliminary report. Surg Endosc 16:92–95
25. Duncan TD, Rashid Q, Speights F et al (2007) Endoscopic transaxillary approach to the thyroid gland: our early experience. Surg Endosc 21:2166–2171
26. Voutilainen PE, Haapiainen RK, Haglund CH (1998) Ultrasonically activated shears in thyroid surgery. Am J Surg 175:491–493
27. Shemen L (2002) Thyroidectomy using the harmonic scalpel: analysis of 105 consecutive cases. Otolaryngol Head Neck Surg 127:284–288
28. Koutsoumanis K, Koutras AS, Drimousis PG et al (2007) The use of a harmonic scalpel in thyroid surgery: report of a 3-year experience. Am J Surg 193:693–696
29. Delellis RA (2006) Pathology and genetics of thyroid carcinoma. J Surg Oncol 94:662–669
30. Fusco A, Santoro M (2007) 20 years of RET/PTC in thyroid cancer: clinico-pathological correlations. Arq Bras Endocrinol Metabol 51:731–735
31. Zhu Z, Ciampi R, Nikiforova MN et al (2006) Prevalence of RET/PTC rearrangements in thyroid papillary carcinomas: effects of the detection methods and genetic heterogeneity. J Clin Endocrinol Metab 91:3603–3610
32. Fialkowski EA, Moley JF (2006) Current approaches to medullary thyroid carcinoma, sporadic and familial. J Surg Oncol 94:737–747
33. Moley JF, Fialkowski EA (2007) Evidence-based approach to the management of sporadic medullary thyroid carcinoma. World J Surg 31:946–956
34. Charous SJ (2004) An overview of office-based ultrasonography: new versions of an old technology. Otolaryngol Head Neck Surg 131:1001–1003
35. Babcock DS (2006) Thyroid disease in the pediatric patient: emphasizing imaging with sonography. Pediatr Radiol 36:299–308
36. Senchenkov A, Staren ED (2004) Ultrasound in head and neck surgery: thyroid, parathyroid, and cervical lymph nodes. Surg Clin North Am 84:973–1000
37. Baskin HJ (2004) New applications of thyroid and parathyroid ultrasound. Minerva Endocrinol 29:195–206
38. Rohren EM, Turkington TG, Coleman RE (2004) Clinical applications of PET in oncology. Radiology 231:305–332
39. Johansen J, Buus S, Loft A et al (2008) Prospective study of 18FDG-PET in the detection and management of patients with lymph node metastases to the neck from an unknown primary tumor. Results from the DAHANCA-13 study. Head Neck 30:471–478
40. Freudenberg LS, Frilling A, Kühl H et al (2007) Dual-modality FDG-PET/CT in follow-up of patients with recurrent iodine-negative differentiated thyroid cancer. Eur Radiol 17:3139–3147
41. British Thyroid Association (2007) Report of the thyroid cancer guidelines update group. In: Perros P (ed) Guidelines for the management of thyroid cancer, 2nd edn. Royal College of Physicians, London. Available at: http://www.british-thyroid-association.org/news/Docs/Thyroid_cancer_guidelines_2007.pdf
42. National Institute for Health and Clinical Excellence (2005) Referral for suspected cancer. A clinical practice guideline. Available at: http://www.nice.org.uk:80/nicemedia/pdf/cg027niceguideline.pdf
43. British Association of Endocrine and Thyroid Surgeons (2007) Second national audit report. Dendrite Clinical Systems, UK
44. Lin RY (2007) New insights into thyroid stem cells. Thyroid 17:1019–1023
45. Cancer Research UK (2005) UK thyroid cancer statistics. Available at: http://info.cancerresearchuk.org/cancerstats/types/thyroid
70 Orthopaedic Surgery: Current Trends and Recent Innovations
Andrew Carr and Stephen Gwilym
Contents

70.1 Introduction ............................................................ 913
70.2 Innovation Within Specialty ................................... 914
70.2.1 Pre-Disease: Genetic Markers ................................. 914
70.2.2 Epidemiology ........................................................... 914
70.2.3 Late Disease ............................................................. 915
70.2.4 Delivery of Treatment .............................................. 916
70.3 New Surgical Techniques Within Specialty ........... 916
70.4 Molecular and Biological Developments Within Specialty ...................................................... 916
70.5 Imaging and Diagnostics Within Specialty ............ 918
70.6 Imaging Modalities Utilised in Early Disease Monitoring .................................................. 918
70.7 Biochemical Imaging .............................................. 919
70.8 Biomarkers in Early Osteoarthritis ........................ 919
70.9 Utilising Imaging Technologies to Understand the Link Between Pathology and Pain .................... 919
70.10 Training Within Specialty ....................................... 920
70.10.1 OCAP and the Improved Objective Assessment of Surgical Skills .............................. 920
70.11 Future Developments and Research Focus ............ 921
References ........................................................................... 922
A. Carr () Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Windmill Road, Oxford, OX3 7LD, UK e-mail: [email protected]
Abstract The orthopaedic surgeon is well supported in the development of surgical practice with key scientific collaborators across areas of biochemistry, genetics, bioengineering and epidemiology. This chapter presents the current trends of orthopaedic research, which may now be considered to span areas of disease prevention, early disease strategies based on tissue repair and late disease treatments based on palliation of symptoms and restoration of function. In addition to these areas, there is active interest in improving treatment delivery and surgical training/assessment.
70.1 Introduction

Musculoskeletal disorders are the leading cause of disability in the western world and account for more than one-half of all chronic conditions in people over 50 years of age in developed countries. The economic impact of these conditions is also staggering: in 2004, the sum of the direct expenditures in health care costs and the indirect expenditures in lost wages was estimated to be $849 billion in the United States alone. These figures, in association with projections that by the year 2030 a quarter of the adult population will experience a painful musculoskeletal condition, present a compelling argument for greater understanding and expanded research. Orthopaedic research is addressing the issues raised by these figures. Clinical researchers are increasingly applying research methodology to improve clinical practice. A review of contemporary orthopaedic literature shows an explosion in translational research, bringing bench-side developments to bed-side medicine. This increase in recent research output is due to the wide range of interest from basic
science research groups into problems managed by the orthopaedic surgeon. These scientific interests are now being applied to orthopaedic pathology, with resultant improvements in patient care. Modern orthopaedics continues to build on its strengths in the fields of joint replacement, optimising fracture management and soft tissue reconstruction. In addition, there is increasing activity in the areas of disease epidemiology and prevention, as well as in exploring tissue repair technologies as an alternative to tissue replacement. Finally, there has been greater awareness of the importance of accurate pain assessment, and a shift in outcome assessment methods from surgeon-based technical assessments to more patient-centred tools. The orthopaedic surgeon is well supported in the development of surgical practice; key scientific collaborators span areas of biochemistry, genetics, bioengineering and epidemiology. Orthopaedic research may now be considered to span areas of disease prevention, early disease strategies based on tissue repair and late disease treatments based on palliation of symptoms and restoration of function. In addition to these areas, there is active interest in improving treatment delivery and surgical training/assessment.
70.2 Innovation Within Specialty

70.2.1 Pre-Disease: Genetic Markers

Significant progress has been made in recent years in the field of genetic susceptibility to musculoskeletal diseases such as osteoarthritis (OA) [1, 2] and rheumatoid arthritis (RA) [3, 4]. This has largely been achieved through two investigative strategies: genome-wide linkage analysis in families with a history of OA, and candidate gene association studies in unrelated individuals. Genes that predispose to OA have been identified, the majority of which encode regulatory rather than structural proteins. The increased risk of OA of the hip and knee in siblings of patients with the disease is well established, and this risk is also passed on to their offspring. A few studies have highlighted a familial link not only in the development of OA but, perhaps more importantly, in disease progression. Once a gene has been identified, its function and interactions (functional genomics) may then be explored, leading to improved understanding of OA pathology at
the molecular level, and also to the identification of potential therapeutic targets. Additionally, the validation of genetic markers as prognostic biomarkers provides the opportunity to screen an individual genetically and determine their probability of developing progressive disease. Such genetic markers have been established for RA but have yet to be defined for OA.
70.2.2 Epidemiology

Areas of current research in orthopaedic epidemiology focus on the aetiology, natural history and prevention of osteoporosis [5] and OA. Programmes of research into the epidemiology of osteoporosis (the commonest metabolic bone disorder worldwide) and OA (the commonest joint disorder worldwide) are extensive, with many multi-national cohort groups. Translational extension of these programmes is critical to drive the research. Recent significant advances include the understanding that environmental influences acting during critical periods of intrauterine and early postnatal life alter the later risk of osteoporosis [6, 7]: low birth size [8] and poor childhood growth have been shown to predict low adult bone mineral content and risk of hip fracture, and maternal nutrition (particularly maternal vitamin D insufficiency) is associated with reduced intrauterine and childhood bone mineral accrual in the offspring. These observations will be extended in two future directions: (1) whether maternal vitamin supplementation in pregnancy improves bone mineral accrual in the offspring, and (2) how interactions between the early environment and adult vitamin status act as determinants of bone structure and fracture risk.

70.2.2.1 The Epidemiology of Osteoarthritis

Whilst population-based epidemiological studies are able to provide incidence and prevalence information, investigation of the aetiological components of disease development has recently taken a more focussed approach, identifying subjects at high risk of developing arthritis by virtue of their family history. These patients have recently been used to form cohorts under longitudinal study, such as the Oxford Family study (a cohort of 3,000 cases of severe joint disease (hip, knee and shoulder) who have undergone surgery, of whom 800 have a sibling who has also undergone surgery). In an
attempt to understand progression in OA, the disease is being followed in these large family studies using magnetic resonance imaging (MRI), high-resolution ultrasound, DEXA scanning and genetic data. These data are also being used to test a set of biomarkers as part of a European Union-funded collaboration (TREAT-OA). Combining risk factors and searching for important interactions aims to produce a risk assessment model identifying patients at high risk of developing progressive and severe disease. In the future, this and other similar programmes may produce a clinical tool to identify subjects at high risk of developing OA. This will also enable the selection of high-risk populations and the best biomarkers of outcome for future randomised clinical trials and experimental medicine studies, which may aim to halt or reverse the disease process.
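As a concrete sketch of what such a combined risk model might look like computationally, the example below fits a logistic regression with pairwise interaction terms over a handful of hypothetical risk factors. All feature names, data and coefficients are invented for demonstration and are not drawn from the Oxford Family study or TREAT-OA.

# Illustrative only: a logistic-regression risk model combining hypothetical
# risk factors with pairwise interaction terms, of the kind described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(60, 10, n),   # age in years (synthetic)
    rng.normal(27, 4, n),    # body mass index (synthetic)
    rng.integers(0, 2, n),   # sibling with severe joint disease, 0/1 (synthetic)
    rng.normal(1.0, 0.3, n), # candidate serum biomarker, arbitrary units
])
# Synthetic outcome: progression to severe OA, generated only for the demo
logit = -8 + 0.08 * X[:, 0] + 0.1 * X[:, 1] + 1.2 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# interaction_only=True adds pairwise risk-factor interactions, mirroring the
# search for "important interactions" described in the text
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print("Predicted progression risk for first 3 subjects:",
      model.predict_proba(X[:3])[:, 1].round(2))

In practice, validated clinical, imaging and biomarker variables, together with proper out-of-sample validation, would be essential before any such model could serve as the clinical screening tool envisaged above.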
70.2.3 Late Disease

As stated in the introduction to this chapter, orthopaedics has a long history of bio-engineering and implant design, with total hip and total knee replacements having transformed the treatment of millions of patients worldwide. More recent research has investigated two variations of these well-established orthopaedic prostheses: the hip resurfacing prosthesis and the unicompartmental knee replacement.

70.2.3.1 Hip Resurfacing

The idea of resurfacing, rather than replacing, the hip joint was proposed in the 1950s by Sir John Charnley. The recent revival of this approach follows the application of modern bio-engineering technologies to circumvent the historical problems of this treatment [9]. Previously, the procedure was beset by problems associated with the load-bearing surfaces, which needed to be thin, hard and resistant to fatigue. The development of cobalt-chrome alloys, with improved manufacturing techniques to prevent shape and surface irregularities, has resulted in impressive functional results in the young active patient. Future research will focus on modified designs to address concerns of bone/implant impingement and on attempts to reduce metal ion release from the articulating surfaces and its subsequent systemic distribution.

70.2.3.2 The Unicompartmental Knee

The rationale behind the development of a prosthesis designed to replace only one of the three articulations in the knee joint was the observation that nearly a third of patients with knee arthritis have disease limited to the medial compartment. Long-term results are now available for designs such as the "medial Oxford unicompartmental knee arthroplasty" (Biomet, UK) [10, 11], and the use of these prostheses has increased with the support of research outcome measures [12]. Future research in the use of these prostheses will focus on extended indications, such as the anterior cruciate ligament-deficient knee, and on replacement of the lateral side only [13, 14].

70.2.3.3 Tendon Repair

Finally, there is increasing interest in the development of biologic augmentation of tendon tears, specifically in the rotator cuff of the shoulder. Shoulder pain is both common and debilitating, with between 30 and 70% of shoulder pain reported as due to disorders of the rotator cuff [15]. The standard treatment for painful and disabling disorders of the rotator cuff is glucocorticoid injection and, if symptoms fail to resolve or pain and weakness are severe, surgical repair. While surgical treatment is often effective at improving pain, up to 60% of repaired tendons re-rupture within 6 months of surgery, leaving these patients with persistent weakness and disability. The failure of surgical repair to maintain integrity is due to the chronic tendinopathic processes that most often predispose to the tear. As rotator cuff tendons degenerate, the cell population changes, as does the density of blood vessels. Damaged tissue displays a profound loss of tenocytes that progresses with disease severity, and the cause of this loss is unknown. Research has shown that reactive oxygen species, hypoxia, hyperoxia and matrix degradation are all possible causal factors [16]. Current and future research will focus on novel materials/matrices to support tissue apposition while repair takes place. Repair may be encouraged through the local release of growth factors to induce tendon regeneration and enhance integration of host functional tissue. These local factors should also keep inflammatory responses below a level that might cause failure of the implant.
The challenge remains to find a matrix material that is strong yet bio-absorbable during the normal tissue regeneration process. Over the past few years, many biologic patches have been developed from either allograft or xenograft [17]. One emerging candidate may be synthesised spider silk, which has been shown to have promising properties. Research to find a suitable matrix that can support and promote pro-repair cellular factors, together with a satisfactory method of attaching this matrix to the tendon/bone interface, offers an exciting avenue of progress in the treatment of this common condition.
70.2.4 Delivery of Treatment

Over the last decade, there has been a shift in emphasis when assessing the outcome of orthopaedic surgery, moving the focus from the surgeon's perception of outcome to the patient's perception. While this may seem a logical form of assessment, the paradigm shift necessitated the development of a new type of assessment tool: patient-reported and joint-specific. Scores such as the Oxford hip, knee and shoulder scores were devised as joint-specific instruments aimed at minimising the influence of co-morbidity [18]. They underwent rigorous assessment of reliability, validity and responsiveness in prospective studies and are now in widespread use. The use of these scores aims to make longitudinal studies of disease and intervention more quantitative and, as such, more comparable between institutions or implants. Research continues in the development of scores for the other joints affected by arthritis, as well as scores specific to traumatic injury.
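To make the mechanics of such joint-specific, patient-reported instruments concrete, the sketch below aggregates a 12-item questionnaire of the kind described; the published Oxford scores use 12 items, but the responses, cohort and scoring direction here are invented for illustration.

# Illustrative only: aggregating a 12-item, joint-specific patient-reported
# outcome score (each item 0 = worst to 4 = best, total 0-48) and comparing
# pre- to post-operative change across a small invented cohort.
from statistics import mean

def joint_score(item_responses):
    """Sum 12 item responses, each scored 0 (worst) to 4 (best)."""
    assert len(item_responses) == 12
    assert all(0 <= r <= 4 for r in item_responses)
    return sum(item_responses)

preop = [[2, 1, 2, 3, 1, 2, 2, 1, 3, 2, 1, 2],
         [1, 1, 0, 2, 1, 1, 2, 0, 1, 1, 2, 1]]
postop = [[4, 3, 4, 4, 3, 4, 3, 3, 4, 4, 3, 4],
          [3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2]]

changes = [joint_score(after) - joint_score(before)
           for before, after in zip(preop, postop)]
print("Mean improvement:", mean(changes))

Because each patient contributes a single bounded summary number, change from before to after surgery can be compared across institutions or implant designs – exactly the comparability the text highlights.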
70.3 New Surgical Techniques Within Specialty

Computer-assisted orthopaedic surgery is a field of research and development that covers a range of devices, including three-dimensional navigation systems and robotic assistance tools. Navigation systems are designed to guide the surgeon's freehand cutting, while robotic systems both navigate and cut [19]. The drive behind such developments is to improve the
precision and accuracy of surgical techniques, with the ultimate aim of improving outcomes. Robots in surgery were first developed in the 1980s to aid stereotaxic neurosurgery, with the first orthopaedic robot, ROBODOC (Integrated Surgical Systems, Davis, CA), developed to assist in hip replacement. Since their introduction, robotic technologies have been the subject of near-exponential research interest, with the number of MEDLINE articles addressing this area having increased 10-fold over the last 20 years. There is increasing uptake of contemporary navigation systems, and a wide range of process and outcome research has consequently focussed on these devices. Research suggests that navigation systems increase the accuracy of bone cuts at the expense of operative time. The ultimate effect on patient outcome of these two conflicting findings has yet to be clarified and will form a focus of future work in kinematics and patient-centred outcome measures [20]. Robotic devices such as the Acrobot (Active Constraint Robot; Acrobot Company Limited, London, UK) continue to be developed, and future research will be needed to assess the validity of these developments, as well as the relative merits and pitfalls of each.
70.4 Molecular and Biological Developments Within Specialty

Areas of active research interest include the physiology and pathophysiology of human tendons, the biochemistry and regenerative capacity of human cartilage and the potential augmentation of fracture healing. Degenerative and traumatic disorders of human tendons are a source of extensive musculoskeletal morbidity. Research in this area extends from the increasingly sophisticated investigation of the "normal tendon" in terms of its structure, biomechanics and biochemistry, through attempts to understand the degeneration process, and on to options for both tendon regeneration and repair. The primary role of tendons is the transfer of force from the contracting muscle to the target joint. As such, the tendon structure is subjected to high-demand stress/strain environments, and our current understanding of tendon structure reflects these environments; tendon consists of a highly ordered hierarchy of successively
larger structural units. While this organisational structure is well established, what remains to be identified are the relative levels of gene expression in tendon for each of the contributory cellular, structural and biochemical components, and the relationship these have to the biomechanical integrity of the tissue. Current research into tendon structure and function aims to look closer at both the composition and the biomechanical behaviour of tendons, using techniques such as:

1) Immunohistochemistry – using PCR and specific assays for gene markers (such as BMP-2 and SMAD-8) and for markers of collagen I breakdown products
2) Microscopy – investigating structure, including techniques such as confocal and fluorescence microscopy to assess 3D structure
3) Biomechanical testing – allowing measurement of displacement (extension) and force, from which stress, strain, stiffness and modulus are derived, together with the effects of temperature
4) Specialist imaging techniques – such as Fourier transform infrared imaging and Raman spectroscopy, used to investigate the chemical composition and orientation of tendon

Parallel work on normal and abnormal tendon aims to identify points of biochemical intervention that may halt or reverse the degeneration process, offer insights into optimal tissue repair or guide the development of tendon synthetics with structural properties similar to endogenous tissue.

While animal models of OA have been useful in outlining some of the matrix and cellular changes that occur during the clinical stages of OA progression, human articular cartilage has, to date, been less informative. Most tissue samples have been retrieved from patients undergoing joint replacement at the end stage of disease and, while this has offered insight into the late matrix and cellular changes seen in OA, there is currently a relative paucity of knowledge about the process of cartilage degeneration. What is known is that there is a gradual loss of proteoglycan, matched by a relative increase in water content. Chondrocyte apoptosis occurs, and the ratio of collagen I to collagen II increases, reflecting a late reparative response of the remaining chondrocytes to matrix degeneration. Investigating the longitudinal process of cartilage degeneration in humans offers the opportunity to identify areas of potential intervention, and hence to halt progression of the disease.
Research currently aims to circumvent the ethical considerations of removing "normal" tissue for investigation by utilising tissue obtained as a byproduct of resection during procedures such as unicompartmental knee replacement, and from limb amputations for traumatic indications, in order to establish a bank of tissue representing the spectrum of phenotypes from truly non-arthritic, through non-symptomatic degenerative change, to frank OA. In parallel to this research into the process of degeneration, research continues into methods of inducing cartilage repair by the application of chondrocytes/matrix sourced endogenously but potentially expanded in vitro. This research is driven by the presence of localised defects of the articular cartilage of the knee and other joints. As there is almost no potential for spontaneous repair of cartilage, defects reaching the subchondral bone are, in effect, final and are a source of pain. In an attempt to reverse this process, two techniques – autologous chondrocyte implantation (ACI) and matrix-induced autologous chondrocyte implantation (MACI) – have been developed and are the subject of ongoing research [21, 22]. In the future, chondrocyte colonies and matrix scaffold "implants" may offer a method of true tissue repair (Fig. 70.1). Since the ground-breaking descriptions by Marshall R Urist in 1965, this area of research has continued to offer valuable clinical application in the study of inducing new bone formation with the aim of improved
Fig. 70.1 Intra-operative photograph of an ACI procedure, in which harvested chondrocytes are placed into a cartilage defect and protected by a collagen membrane. Future research will further assess this technology and compare it with emerging techniques such as MACI
Since the ground-breaking descriptions by Marshall R. Urist in 1965, this area of research has continued to offer valuable clinical applications in inducing new bone formation with the aim of improving fracture healing. The field spans biomechanics and biochemistry as it attempts to develop the optimal combination of osteoconduction and osteoinduction [23]. Current approaches to promote bone regeneration include osteogenic methods (autologous or allogenic bone grafts), osteoinductive methods (BMPs and other growth factors) and osteoconductive methods (calcium-based grafts and bioactive ceramics) [23]. Bone morphogenetic proteins (BMPs) are glycoprotein growth factors belonging to the transforming growth factor-beta (TGF-b) superfamily. In vivo, BMPs are produced by a number of cell types and, as with all growth factors, act as cell-to-cell messenger molecules via cell surface receptors. Extracting useful amounts of BMP from human donors is impractical, so the majority of the substrate used in clinical trials is derived from recombinant gene technology; these proteins are given the prefix “rh” (recombinant human). Many BMPs have been used in clinical trials in attempts to accelerate fracture healing, treat established non-union and treat large bone-loss defects. Currently, BMP-2 and BMP-7 have the most research directed towards their clinical application, with evidence suggesting that BMP-induced bone formation is at least equivalent to autologous bone grafting in treating tibial non-union and encouraging spinal fusion, and that these benefits are achieved without the morbidity of graft harvest [24]. The future of this area lies both in refining the biomechanical and biochemical processes of BMP recombination and in replicating observations from animal fracture healing in the clinical context.
70.5 Imaging and Diagnostics Within Specialty
Treatment strategies for OA most commonly involve removal or replacement of damaged joint tissue. Currently, relatively few treatments attempt to arrest, slow down, or reverse the disease process; this remains a goal for the future. In order to develop this line of research, it is necessary to develop methods of tracking and quantifying the progression of OA, which has led to recent developments in imaging and diagnostics within orthopaedic research, specifically research into OA. Diagnostic techniques that sensitively detect early OA and reliably monitor its progression would identify
patients who may benefit from joint-preserving intervention for entry into clinical trials, with the ultimate aim of reducing the number of patients in whom arthroplasty is required. Identifying subjects who are likely to progress rapidly would be particularly useful for trial design. Radiography is currently the recommended imaging modality for assessment of disease progression. Well-established systems grade disease severity based on joint space narrowing, osteophyte formation, subchondral sclerosis and cyst formation. For the hip and knee, progression of disease is assessed by measurement of the minimum joint space width. With respect to early disease, these radiographic methods are flawed: they rely on an indirect assessment of the status of articular cartilage, which is prone to error, and joint space narrowing only becomes apparent once disease has advanced to the erosive stage. A more sensitive method is necessary to monitor changes in early disease. Alternative imaging techniques provide the opportunity to examine the structure (morphology) of cartilage in detail, to assess other relevant structures (e.g. synovium, meniscus), and to image the function or biochemistry of cartilage prior to erosive changes.
70.6 Imaging Modalities Utilised in Early Disease Monitoring
There has been increasing interest in improving the resolution and dimensional assessment of joints that are, or may be, predisposed to degeneration. To address these demands, both CT and MRI technologies have been used, with and without contrast agents. These modalities have led to fascinating discoveries about the altered morphology of diseased joints and, more recently, to methods of assessing cartilage before macroscopic changes occur. MRI has the advantage over CT in that it can image all tissues within the joint and can use specific cartilage sequences to provide a detailed and quantitative assessment of cartilage morphology. Quantitative analysis of cartilage requires detailed spatial resolution (<1.5 mm slice thickness) and a scanner with sufficient field strength (at least 1.5 T). However, while quantitative MRI is more sensitive to change than standardised radiography, the detailed information provided estimates the annual loss of cartilage volume in knee OA to be around 5%, indicating that assessment of disease progression using such
techniques will be slow. Furthermore, early intervention will probably need to be prior to any significant structural cartilage loss. For these reasons, there is great interest in the application of MRI to assess change in the biochemical composition of cartilage before structural damage has occurred.
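The practical consequence can be illustrated with a back-of-envelope calculation. The sketch below estimates how long a patient must be followed before true cartilage loss exceeds the smallest change the measurement can reliably detect; only the ~5% annual loss comes from the text, while the 2% test-retest precision figure is an assumption chosen purely for illustration.

```python
import math

annual_loss = 0.05   # ~5% annual cartilage volume loss in knee OA (from the text)
precision_cv = 0.02  # assumed test-retest precision error (2% CV) -- illustrative

# Smallest detectable change between two measurements at 95% confidence:
# 1.96 * sqrt(2) * precision error
sdc = 1.96 * math.sqrt(2) * precision_cv

# Years until cumulative loss, 1 - (1 - annual_loss)**t, first exceeds the SDC
years = math.log(1 - sdc) / math.log(1 - annual_loss)
print(f"Smallest detectable change: {sdc:.1%} of baseline volume")
print(f"Follow-up needed in an individual patient: ~{years:.1f} years")
```

Even under these favourable assumptions, an individual patient must be followed for over a year before true change outgrows measurement noise, which is why purely structural end points make for slow trials.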
70.7 Biochemical Imaging
The biochemistry of the extracellular matrix may be assessed using a number of specific MR techniques. T2 relaxation time mapping demonstrates collagen fibril orientation, quantity and molecular structure (but is also sensitive to hydration), while sodium MRI, T1rho (T1 in the rotating frame) and dGEMRIC (delayed gadolinium-enhanced MRI of cartilage) assess glycosaminoglycan (GAG) content. dGEMRIC involves the intravenous administration of the routinely used ionic contrast agent Gd(DTPA)2−, which diffuses into articular cartilage through the subchondral bone. The distribution of Gd(DTPA)2− inversely reflects the concentration of negatively charged GAGs within the cartilage matrix. Recent research has investigated the potential of dGEMRIC to diagnose early OA prior to structural deterioration, which is of great interest for the assessment of treatments in early disease. This area of diagnostic research will undoubtedly expand as our knowledge of cartilage pathophysiology increases and as more powerful MRI magnets become routine in clinical practice rather than purely research tools.
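The quantitative step behind dGEMRIC can be sketched as follows; all numerical values below are illustrative assumptions, not figures from this chapter. The gadolinium concentration in cartilage is conventionally estimated from T1 measured before and after contrast using the standard linear relaxation relationship, and a higher local concentration implies lower GAG content.

```python
# Estimate Gd(DTPA)2- concentration in cartilage from T1 mapping.
# Standard linear relaxation model: 1/T1_post = 1/T1_pre + r1 * [Gd]
r1 = 4.1        # relaxivity of the contrast agent, s^-1 mM^-1 (illustrative)
t1_pre = 1.00   # T1 before contrast, in seconds (illustrative)
t1_post = 0.45  # T1 after contrast has equilibrated, in seconds (illustrative)

gd_mM = (1.0 / t1_post - 1.0 / t1_pre) / r1
print(f"Estimated [Gd] in cartilage: {gd_mM:.2f} mM")
# Regions with higher [Gd] carry less fixed negative charge, i.e. less GAG --
# the inverse relationship described in the text.
```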
70.8 Biomarkers in Early Osteoarthritis
A biomarker is “a structural or physical measure of cellular, molecular, or genetic change in a biologic process that can be identified and monitored, with resulting diagnostic or prognostic utility”. Validating a biomarker as a surrogate outcome measure would be extremely useful for the evaluation of new treatments, and is of particular interest in early disease because of the potential to provide immediate information about cartilage metabolism. There are a number of requirements for the validation of a biomarker. The biology of the marker must be understood in terms of its origin, tissue specificity, spatial tissue distribution (e.g. zone of cartilage),
metabolism, release from the tissue and clearance. Its assay should measure serum or urine levels (as synovial fluid is less practical) and be robust, reproducible, readily available, convenient and cheap. Finally, the biomarker must be statistically robust, as assessed by its sensitivity, specificity, positive predictive value, likelihood ratios, and relative risks or odds ratios. For a list of researched biomarkers, the reader is referred to reviews by Garnero and Lohmander. OA affects the metabolism of bone, cartilage and synovium; potential candidates therefore include the matrix components, their propeptides, which are cleaved off during conversion to the functioning protein (providing a marker of synthesis), their breakdown products (degradation markers), and the cytokines and proteases of these tissues.
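These statistical requirements are easily made concrete. The short sketch below computes the quantities listed above from a hypothetical 2×2 validation table comparing a candidate serum marker against a reference diagnosis; all counts are invented purely for illustration.

```python
# Hypothetical validation study of a serum biomarker against a reference
# diagnosis of early OA (counts are illustrative, not real data).
tp, fp = 80, 30   # biomarker positive: with disease / without disease
fn, tn = 20, 170  # biomarker negative: with disease / without disease

sensitivity = tp / (tp + fn)                 # P(test+ | disease)
specificity = tn / (tn + fp)                 # P(test- | no disease)
ppv = tp / (tp + fp)                         # P(disease | test+)
npv = tn / (tn + fn)                         # P(no disease | test-)
lr_pos = sensitivity / (1 - specificity)     # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity     # negative likelihood ratio
odds_ratio = (tp * tn) / (fp * fn)           # cross-product odds ratio

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, OR {odds_ratio:.1f}")
```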
70.9 Utilising Imaging Technologies to Understand the Link Between Pathology and Pain
The majority of patients who present to orthopaedic surgeons do so with pain and an associated loss of function. It is the perception of pain that troubles the patient and is the principal reason for seeking redress [25]. Musculoskeletal pain represents the primary cause of chronic pain worldwide. As far back as 1952, Kellgren and Lawrence quantified the poor relationship between disease (radiographic OA) and pain in a cohort of coal miners: only 24% of those with radiological OA of the knee had pain, while 8% of “normal” knees were painful. This poor correlation between radiiologically determined OA and pain has subsequently been highlighted by a number of other authors. The reasons for it are multi-factorial, involving the sensitivity of radiographs in quantifying the disease, the heterogeneity of the disease process, and an individual’s interpretation of, and behaviour towards, a potentially painful stimulus. This heterogeneity of pain response has historically been perceived by clinicians as a nuisance, difficult to assess and quantify. What is clear is that the amount of “disease” a patient has does not relate in a linear manner to the amount of pain they feel. These observations call into question the illness–disease syllogism on which much of orthopaedic practice is based. This concept was articulated in the 1700s by Sydenham, who defined illness in terms of
Fig. 70.2 Multiple sites of potential pain modulation from periphery to perception
symptoms and signs, which in turn are symbolic of an underlying pathoanatomical disorder: the disease. This leads to the view that disease causes illness and pain, and that surgically removing the disease or tissue will therefore remove the illness and pain. Research is now refining this view. Current and future research will explore the multiple sites of potential pain modulation from periphery to perception (Fig. 70.2), and will involve collaborations with the social sciences and research groups specialising in pain research. Exciting new methods of exploring pain are increasingly being utilised, including imaging technologies such as functional MRI of the brain [25]. This research will allow clinicians to make a more informed enquiry into a patient’s pain, its facets and the sites that may be amenable to intervention. From this assessment, a biomedical, psychosocial or combined treatment regime may offer the greatest chance of success whilst minimising potential morbidity and mortality.
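The Kellgren and Lawrence percentages quoted above make the disease–pain mismatch easy to quantify. A short calculation (using only the two figures given in the text) shows that radiographic OA raises the odds of knee pain roughly 3.6-fold, yet leaves about three-quarters of radiographically arthritic knees pain-free:

```python
# Pain prevalence conditional on radiographic status (Kellgren & Lawrence,
# 1952, as quoted in the text): 24% of radiographic OA knees painful,
# 8% of "normal" knees painful.
p_pain_oa, p_pain_normal = 0.24, 0.08

odds = lambda p: p / (1 - p)
odds_ratio = odds(p_pain_oa) / odds(p_pain_normal)
print(f"Odds ratio for pain given radiographic OA: {odds_ratio:.1f}")   # ~3.6
print(f"Radiographic OA knees that are pain-free: {1 - p_pain_oa:.0%}") # 76%
```

A statistically real association can therefore coexist with very poor individual-level prediction, which is precisely the point the text makes.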
70.10 Training Within Specialty
70.10.1 OCAP and the Improved Objective Assessment of Surgical Skills
The Orthopaedic Competence Assessment Project (OCAP), a combined initiative of the Education Committee of the British Orthopaedic Association and the
Specialist Advisory Committee (SAC) in Trauma & Orthopaedics, was developed with the aim of adding objective assessment to the perceived subjective nature of trainee monitoring [26]. It was given the remit to improve training through a competency-based portfolio of coaching and assessment tools. It was the first specialty-wide project of its sort and, since its inception, has provided a blueprint for other surgical specialties developing similar curricula. In addition to the development of a knowledge-based syllabus, OCAP has been strongly focused on the development of structured skills training and objective surgical skills assessment. The complex relationship between formalising the process of teaching surgical skills and the inevitable development of methods to assess skill acquisition is an ongoing process. Training programmes no longer lend themselves so easily to the passing down of skills by apprenticeship, with “assessment” by a so-called master surgeon who would delay or restrict practice until perceived competency had been reached. Surgical practice combines cognitive skill (knowledge), dexterity and personality. One active area of research is the assessment of dexterity using validated skills models. This process is not new: Apley arranged a craft course for internal fixation of bones as early as 1978, and attendance at a basic surgical skills course is now mandatory before surgical college membership examinations may be sat. Recent orthopaedic research in skills training and assessment has focussed on the acquisition and retention of arthroscopic skills [27]. Arthroscopy has become an irreplaceable diagnostic and interventional tool in orthopaedics and its breadth of use continues to increase. The need for specific arthroscopic training for orthopaedic trainees has been recognised for some years. Complications during arthroscopy are more common among junior trainees in the early part of their learning curve, often secondary to problems with adequate joint visualisation and triangulation. Studies of arthroscopic training have used psychomotor assessments, dry and wet box trainers and various virtual reality simulators. Box trainers provide a simple, cost-effective and accessible form of training, which has been adopted by many arthroscopic skills training courses internationally. They allow for the development of visualisation, triangulation and tactile feedback skills; however, they are less adequate for positioning, and they require supervision and guidance
Fig. 70.3 An arthroscopic skills simulator being used to both teach, and assess surgeon’s technical skills
by a senior surgeon. Virtual reality simulators have been developed for a variety of arthroscopic procedures; some have been validated as effective training tools, provide good anatomic reproduction and can teach visualisation, triangulation and positioning, and arthroscopic skills simulators are already being used both to teach and to assess surgeons’ technical skills (Fig. 70.3). Haptic feedback systems continue to evolve to provide a degree of tactile feedback. One area of promise in this field lies in motion analysis, which assesses surgical skill in terms of precision and economy of movement. Recent studies have assessed the construct validity of motion analysis as a means of assessing the psychomotor skills necessary for arthroscopic surgery, and future research will investigate the impact of simulated training on operative performance in theatre [27].
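To give a flavour of the metrics involved, the sketch below reduces a tracked instrument-tip trajectory to path length, task time and number of discrete movements, generic economy-of-movement measures of the kind used in motion analysis. The function, the 5 cm/s threshold and the toy data are illustrative assumptions, not the specific outputs of the cited study [27].

```python
import math

def economy_metrics(positions, times):
    """Summarise a tracked instrument-tip path: total path length (m),
    task duration (s) and number of discrete movements above a speed threshold."""
    path_length, movements, moving = 0.0, 0, False
    for p0, p1, t0, t1 in zip(positions, positions[1:], times, times[1:]):
        step = math.dist(p0, p1)          # Euclidean distance between samples
        path_length += step
        fast = step / (t1 - t0) > 0.05    # 5 cm/s threshold (illustrative)
        if fast and not moving:
            movements += 1                # count the onset of each movement
        moving = fast
    return path_length, times[-1] - times[0], movements

# Toy trajectory (metres, seconds): experts typically show shorter paths,
# shorter times and fewer discrete movements than novices.
positions = [(0.00, 0.00, 0.00), (0.02, 0.00, 0.00),
             (0.02, 0.03, 0.00), (0.02, 0.03, 0.01)]
times = [0.0, 0.5, 1.0, 1.5]
print(economy_metrics(positions, times))  # (0.06, 1.5, 1)
```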
70.11 Future Developments and Research Focus
In the future, orthopaedic research will focus on six key areas:
• Prevention of disease – incorporating genetic, anatomical and environmental influences/manipulation
• Repair of biological tissues – utilising autograft or biologically active allograft
• Analgesia – through a more thorough understanding of the causes of, and variation seen in, musculoskeletal pain
• Replacement – new designs to improve kinematics, assessment of robotic surgical aids and attempts to improve implant longevity
• Outcome assessment – ensuring sensitive, specific outcome tools are applied to interventions used in treating musculoskeletal disease
• Assessment of surgical training and competency
This increased focus on translational research will expand the role of orthopaedic clinical scientists in both the development and the application of these research areas. Key themes in orthopaedic research are presented in Fig. 70.4.
Fig. 70.4 Key themes in orthopaedic research. The figure maps research themes across the stages of disease (pre-disease, early disease, late disease) and the delivery of treatment, covering epidemiology and genetics of disease, disease prevention, understanding pain, tissue regeneration and biological therapy, prophylactic surgery, patient cohorts, histopathology, joint replacement development and evaluation, imaging/bioengineering, and surgical technology and skill
References
1. Kraus VB, Jordan JM, Doherty M et al (2007) The Genetics of Generalized Osteoarthritis (GOGO) study: study design and evaluation of osteoarthritis phenotypes. Osteoarthritis Cartilage 15:120–127
2. Loughlin J (2003) Genetics of osteoarthritis and potential for drug development. Curr Opin Pharmacol 3:295–299
3. Deighton C, Criswell LA (2006) Recent advances in the genetics of rheumatoid arthritis. Curr Rheumatol Rep 8:394–400
4. Turesson C, Matteson EL (2006) Genetics of rheumatoid arthritis. Mayo Clin Proc 81:94–101
5. Dennison E, Mohamed MA, Cooper C (2006) Epidemiology of osteoporosis. Rheum Dis Clin North Am 32:617–629
6. Cooper C, Walker-Bone K, Arden N et al (2000) Novel insights into the pathogenesis of osteoporosis: the role of intrauterine programming. Rheumatology (Oxford) 39:1312–1315
7. Cooper C, Javaid MK, Taylor P et al (2002) The fetal origins of osteoporotic fracture. Calcif Tissue Int 70:391–394
8. Javaid MK, Arden N, Cooper C (2004) Association of birth weight with osteoporosis and osteoarthritis in adult twins. Rheumatology (Oxford) 43:401
9. Schmalzried TP (2007) Why total hip resurfacing. J Arthroplasty 22:57–60
10. Goodfellow J, O’Connor J, Murray DW (2002) The Oxford meniscal unicompartmental knee. J Knee Surg 15:240–246
11. Price AJ, Waite JC, Svard U (2005) Long-term clinical results of the medial Oxford unicompartmental knee arthroplasty. Clin Orthop Relat Res 435:171–180
12. Murray DW (2007) Mobile bearing unicompartmental knee replacement. Orthopedics 30:768–769
13. Pandit H, Van Duren BH, Gallagher JA et al (2008) Combined anterior cruciate reconstruction and Oxford unicompartmental knee arthroplasty: in vivo kinematics. Knee 15:101–106
14. Price AJ, O’Connor JJ, Murray DW et al (2007) A history of Oxford unicompartmental knee arthroplasty. Orthopedics 30:7–10
15. Mitchell C, Adebajo A, Hay E et al (2005) Shoulder pain: diagnosis and management in primary care. BMJ 331:1124–1128
16. Matthews TJ, Hand GC, Rees JL et al (2006) Pathology of the torn rotator cuff tendon. Reduction in potential for repair as tear size increases. J Bone Joint Surg Br 88:489–495
17. Coons DA, Alan Barber F (2006) Tendon graft substitutes – rotator cuff patches. Sports Med Arthrosc 14:185–190
18. Murray DW, Fitzpatrick R, Rogers K et al (2007) The use of the Oxford hip and knee scores. J Bone Joint Surg Br 89:1010–1014
19. DiGioia AM III (2007) C.T. Brighton/ABJS Workshop on Computer-assisted Orthopaedic Surgery. Clin Orthop Relat Res 463:2
20. Bargar WL (2007) Robots in orthopaedic surgery: past, present, and future. Clin Orthop Relat Res 463:31–36
21. Bartlett W, Skinner JA, Gooding CR et al (2005) Autologous chondrocyte implantation versus matrix-induced autologous chondrocyte implantation for osteochondral defects of the knee: a prospective, randomised study. J Bone Joint Surg Br 87:640–645
22. Gooding CR, Bartlett W, Bentley G et al (2006) A prospective, randomised study comparing two techniques of autologous chondrocyte implantation for osteochondral defects in the knee: periosteum covered versus type I/III collagen covered. Knee 13:203–210
23. Vaibhav B, Nilesh P, Vikram S et al (2007) Bone morphogenic protein and its application in trauma cases: a current concept update. Injury 38:1227–1235
24. Gautschi OP, Frey SP, Zellweger R (2007) Bone morphogenetic proteins in clinical applications. ANZ J Surg 77:626–631
25. Gwilym SE, Pollard TC, Carr AJ (2008) Understanding pain in osteoarthritis. J Bone Joint Surg Br 90:280–287
26. Pitts D, Rowley DI, Sher JL (2005) Assessment of performance in orthopaedic training. J Bone Joint Surg Br 87:1187–1191
27. Howells NR, Brinsden MD, Gill RS et al (2008) Motion analysis: a validated method for showing skill levels in arthroscopy. Arthroscopy 24:335–342
71 Plastic, Reconstructive and Aesthetic Surgery: Current Trends and Recent Innovations
Marios Nicolaou, Matthew D. Gardiner, and Jagdeep Nanchahal
Contents
71.1 Innovations in Plastic and Reconstructive Surgery 923
71.2 New Surgical Techniques Within the Specialty 924
71.2.1 Reconstructive Techniques 924
71.2.2 Wound Management 926
71.2.3 Composite Tissue Allotransplantation 927
71.2.4 Minimally Invasive Surgery 928
71.2.5 Aesthetic Surgery 928
71.2.6 Obesity Surgery 929
71.2.7 The Multidisciplinary Team 930
71.2.8 Cleft Lip and Palate Service 930
71.3 Molecular and Biological Developments Within the Specialty 930
71.3.1 Wound Repair and Scar-Free Healing 930
71.3.2 Tissue Engineering and Regenerative Medicine 932
71.3.3 Gene Therapy 935
71.4 Imaging and Diagnostics Within the Specialty 936
71.4.1 Imaging 936
71.4.2 Free Flap Monitoring 937
71.4.3 Sentinel Node Biopsy for Malignant Melanoma 937
71.5 Future Developments and Research Focus 938
References 939
M. Nicolaou () Imperial College London, Queen Elizabeth the Queen Mother (QEQM) Building, Imperial College Healthcare NHS Trust, St Mary’s Hospital Campus, Praed Street, London W2 1NY, UK e-mail: [email protected]
Abstract The principles and surgical techniques on which plastic surgery is based have evolved over hundreds of years. However, modern day plastic and reconstructive surgery emerged as a specialty during the two World Wars. As a surgical specialty, it is seldom limited by anatomy or disease and so it remains one of the few interface specialties where surgeons work closely with colleagues across a wide range of fields. Plastic surgery continues to benefit from technical advancement. However, biomedical research is likely to yield the next breakthroughs in the treatment of plastic surgery patients.
71.1 Innovations in Plastic and Reconstructive Surgery Although plastic and reconstructive surgery is a relatively modern entity, many of the surgical techniques in everyday use have been developed over many years. Innovation has been driven by the need to restore form and function following injury and disease. For instance, in recent history, the two World Wars led to major advances in facial reconstructive surgery and the management of burn wounds. Since then, the pace of innovation has accelerated. Microvascular surgery has enabled free tissue transfer; modern burn surgery enables some patients to survive 100% burns, and biomedical science has made scar-free wound healing a real possibility. Plastic surgery is seldom limited by anatomy or disease and so it remains one of the few interface specialties where surgeons work closely with other specialties across a wide range of fields, including orthopaedics, trauma, breast, head and neck cancer and gynaecology. This multidisciplinary approach, alongside technical innovations, has greatly improved
patient outcomes. Finally, the rapid growth of aesthetic surgery cannot be ignored as it has become the public face of plastic surgery. This chapter aims to give an overview of recent innovations in plastic surgery and an insight into how current research is likely to translate into future advances in patient care.
71.2 New Surgical Techniques Within the Specialty
71.2.1 Reconstructive Techniques
71.2.1.1 Microvascular Surgery
The advent of microvascular surgery in the 1950s was an important landmark for plastic surgery as it enabled free tissue transfer (Fig. 71.1). It was made possible by advances in microscope technology, suture material and instrumentation. Further technical innovations and improved understanding of microvascular anatomy led to increasingly sophisticated application of the technique.
However, the onus is now on reducing flap failure rates and minimising donor site morbidity, as well as exploiting modern biomedical advances, such as tissue engineering, to develop or improve flaps in the laboratory. Technological advances have included stereoscopic microscopes that can be head-mounted (Varioscope®) and anastomotic devices (couplers), which aim to reduce the anastomosis time while improving patency rates. Venous anastomotic devices now have patency rates similar to hand-sewn anastomoses; arterial couplers, however, are less successful due to calcification at the site of anastomosis. Another approach has been to combine fibrin glue with conventional suturing, which shortens the anastomotic repair time by reducing the number of sutures placed while achieving a patency rate equal to conventional techniques [3]. These advances have pushed down the size limit of what can be achieved with microsurgery: Koshima has introduced the principle of supermicrosurgery by anastomosing vessels of less than 0.5 mm in diameter [4]. A major advance in microsurgery was the development of perforator flaps in the late 1980s. These flaps are supplied by blood vessels that arise from named, axial vessels and perforate through or around overlying
Fig. 71.1 Timeline outlining key landmarks in the history of microvascular surgery. Adapted from [1, 2]
muscles via septa to vascularise the overlying skin and fat. During flap harvesting, these vessels are meticulously dissected from the surrounding muscle, which is left behind. Perforator flaps are now routinely used for breast reconstruction. These include flaps based on the deep inferior epigastric perforator artery (DIEP), superficial inferior epigastric perforator (SIEP) and gluteal artery perforator (GAP). The use of perforator flaps has resulted in reduced donor site morbidity (e.g. abdominal wall weakness) and postoperative pain. However, they are technically demanding, with a steep learning curve of 50–100 procedures [5]. Reconstructive surgery is often limited by the type and quality of tissue available. Engineering composite tissues “to order” is still experimental. However, flap prefabrication is a technique that bridges the gap between conventional surgery and tissue engineering [6]. Different tissues such as bone, cartilage, muscle or skin are combined to create a vascularised composite that can be designed to fit any defect. An example of this is burying a piece of cartilage beneath the forearm skin and later harvesting it as a free flap to reconstruct a nasal defect. Creating a vascularised graft remains one of the key challenges for both flap prefabrication and tissue-engineered flaps, and is discussed in more detail in Sect. 71.3.2.5.
71.2.1.2 Tissue Expansion
Tissue expansion is a mechanical process that increases the surface area of soft tissue overlying an artificially expanding device. Most expanders are silicone balloons that are inserted under the skin and inflated via a filling port, usually situated away from the main device. They are most effective when used in an area where the soft tissue overlies a solid buttress (e.g. chest wall, skull). As a consequence, they are commonly used in breast reconstructive and head surgery. Tissue expansion can also be used before or after the transfer of a free flap to increase its size. If done before transfer, it can make the flap more robust, as tissue expansion promotes vascular proliferation. In 1982, Austad proposed a self-inflating osmotic tissue expander consisting of a silicone membrane containing a hyperosmolar solution of sodium chloride. It had limited application due to the uncontrolled rate of expansion. In 1993, Wiese developed an osmotically active expander based on a hygroscopic polymer
Fig. 71.2 Unidirectional tissue expanders [Images courtesy of Mr Marc Swan]
[7]. This provided uniform expansion in a more controlled manner and proved to be invaluable in orbital reconstruction. More recently, a self-inflating tissue expander has been developed, with controlled expansion in a single plane (Fig. 71.2). Potential applications of this technology include cleft palate repair and nasal tip reconstruction [8].
71.2.1.3 Tissue Distraction 71.2.1.2 Tissue Expansion Tissue expansion is a mechanical process, which increases the surface area of soft tissue overlying an artificially expanding device. Most expanders are silicone balloons that are inserted under the skin and inflated via a filling port, usually situated away from the main device. They are most effective when used in an area where the soft tissue overlies a solid buttress (e.g. chest wall, skull). As a consequence, they are commonly used in breast reconstructive and head surgery. Tissue expansion can also be used before or after the transfer of a free flap to increase its size. If done before transfer, it can make the flap more robust as tissue expansion promotes vascular proliferation. In 1982, Austed proposed a self-inflating osmotic tissue expander consisting of a silicone membrane containing a hyperosmolar solution of sodium chloride. It had limited application due to the uncontrolled rate of expansion. In 1993, Wiese developed an osmotically active expander based on a hygroscopic polymer
In 1905, Codivilla reported lengthening of a femur by axial forces, but it was not until the 1950s that distraction osteogenesis became a reality, when Ilizarov developed the technique of treating traumatic limb injuries using circular external fixation frames connected with rods. When a patient accidentally lengthened the rods, Ilizarov noted that callus had formed in the gap. The principle of gradual distraction of living tissues causing stresses that stimulate and maintain active regeneration of certain structures came to be known as the tension-stress effect. Ilizarov also noted that osteotomised bone segments could be moved axially in a soft tissue envelope to close a bone gap, leaving a trail of newly generated bone in their path (bone transport) [9]. The stretching effect, in addition to allowing bony lengthening, causes an expansion of the surrounding soft tissues (distraction histogenesis). In plastic surgery, this technique is mainly used in craniofacial reconstruction, such as for mandibular or maxillary lengthening and widening or symmetrisation of hemifacial microsomia [10]. It has
also been used successfully to treat congenital hand deformities that result in bone shortening, such as symbrachydactyly and hypoplastic thumb [11], as well as radial longitudinal deficiency [12]. Recently, a distraction device (Digit Widget®) has been described to treat interphalangeal joint contractures by applying a continuous torque on the contracted tissue [13].
71.2.2 Wound Management
71.2.2.1 Wound Debridement
Debridement of contaminated or necrotic tissue is essential for successful wound healing. Techniques commonly used include surgical debridement, autolytic dressings and larval therapy. Surgical debridement of large wounds is a particular challenge: the procedure may be complicated by significant blood loss, and in debilitated patients, multiple procedures under general anaesthesia are suboptimal. Larval therapy has found a place for selective debridement of relatively small, non-healing wounds. Enzymatic debridement and hydrotherapy are two other techniques that have great clinical potential, especially in the management of large burn wounds.
Enzymes used in wound debridement include collagenases, papain, sutilains and fibrinolysin. All are sensitive to changes in pH, temperature and other wound bed conditions; their efficacy is therefore dependent on the wound environment in which they are used. The standard products are applied topically and changed on a regular basis over a number of weeks. They are not widely used, due to the long treatment period and equivocal results. More recently, continuous streaming of proteolytic enzymes across a wound bed has been described in a porcine model, reducing the treatment time for small wounds to less than a day. VERSAJET® is a hydrotherapy device that has been successfully used to manage sub-acute and chronic wounds (Fig. 71.3). The system includes a source of sterile saline, a console for pressurisation of the saline, a hand piece and a collection system. Pressurised saline flows to the hand piece, where it is forced through a tiny nozzle across the operating window into the collecting system. The high-pressure saline stream (up to 15,000 psi) passes parallel to the wound and creates a Venturi effect, which cuts and aspirates nonviable tissue and debris. The pressure and the angle at which the device is used control the relative strength of debridement and aspiration. Benefits of the device include quick and easy use, control of waste without vaporisation and an improved ability to debride contoured or
Fig. 71.3 (a) Cross-sectional diagram of the VERSAJET® hand piece. (b) A diagram of how VERSAJET works on a wound and (c, d) in action on a leg and hand wounds [Images courtesy of Smith & Nephew]
irregularly shaped tissue. One of the drawbacks is the cost of the disposable hand piece, although this may be offset by the reduction in healthcare costs associated with faster wound healing.
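The cutting power of such a jet can be appreciated from an idealised Bernoulli estimate (a back-of-envelope sketch that neglects friction and nozzle losses; only the 15,000 psi figure comes from the text):

```python
import math

pressure_psi = 15_000                  # maximum driving pressure quoted in the text
pressure_pa = pressure_psi * 6_894.76  # convert psi to pascals
rho_saline = 1_000.0                   # density of saline, ~1,000 kg/m^3

# Ideal (lossless) jet velocity from Bernoulli: v = sqrt(2 * dP / rho)
v = math.sqrt(2 * pressure_pa / rho_saline)
print(f"Ideal jet velocity: ~{v:.0f} m/s")  # several hundred m/s
```

A jet travelling at several hundred metres per second, confined to a nozzle-width stream, is what allows the device to cut soft tissue while the Venturi effect aspirates the debris.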
71.2.2.2 Negative Pressure Therapy
Negative pressure therapy is the application of controlled sub-atmospheric pressure to a wound. An open-cell foam dressing is applied to the wound, which is then sealed with an occlusive dressing and attached to a pump via a drain tube. The exact mechanism by which sub-atmospheric pressure improves wound healing remains unclear. It may be through the removal of tissue fluid and reduction in tissue oedema, although wounds with minimal exudate also benefit. The cyclical negative pressure may also promote tissue growth through the release of molecular mediators, alterations in gene expression, an increase in the mitotic rate of stretched cells, and angiogenesis [14]. Negative pressure therapy is now routinely used in a diverse group of acute and chronic wounds and as a bolster dressing for split-thickness skin grafts [15]. It may also have a role in reducing burn wound progression. The indications for use have to be carefully considered, as conventional wound management with standard dressings may be equally efficacious and more cost-effective.
71.2.3 Composite Tissue Allotransplantation
Composite tissue allotransplantation (CTA) is the transplantation of a cadaveric graft made up of multiple tissues. It has developed over the last 45 years out of the necessity for new sources of tissue for reconstructive surgery [16]. The major barrier to allotransplantation has been the need for immunosuppression. The first hand transplant was performed in 1963 but, unfortunately, was rejected 3 weeks later. Further allotransplantation was not attempted again until the early 1990s, by which time immunosuppressive therapy had greatly improved. More recently, the prospect of facial allotransplantation has re-ignited the debate surrounding this form of reconstructive surgery [17].
71.2.3.1 Hand Allotransplantation
In 1998, the first hand transplant of the modern era took place. The immunosuppressive regime (tacrolimus/mycophenolate mofetil/corticosteroid) was based on the protocols used for solid organ transplantation. Since then, over twenty hand transplants have been performed and have continued to add to the lively debate surrounding allotransplantation. Although the majority of transplants have survived, patients have experienced episodes of acute rejection, immunological complications (e.g. infection and malignancies) and non-immunological complications such as poor hand function [18].
71.2.3.2 Facial Allotransplantation
Facial disfigurement associated with loss of form and function is a common surgical problem. Standard reconstructive techniques are successful in reconstructing different units of the face but still fail to give an adequate aesthetic or functional outcome. In 2005, the first partial facial transplant was performed in France. This has been followed by a number of more extensive facial transplants at centres in France and the United States. The public and medical communities have reacted positively, and it is now only a matter of time before full facial allotransplantation becomes more widely available. A Royal College of Surgeons of England Working Party has laid out the minimal requirements for teams undertaking this surgery in the United Kingdom [19]. The requisite surgical techniques are well established in animal models of facial tissue transplantation, and there have been two successful replants of face and scalp tissue in humans. Studies of facial vascular anatomy suggest that bilateral anastomosis of the facial artery and vein would be sufficient to perfuse the entire flap, although additional veins would make it more robust. The final appearance of the transplant is difficult to predict. Cadaveric studies and computer modelling suggest that the face will look like neither the donor nor the recipient but somewhere in between [20]. However, donor–recipient matching with regard to factors such as age, colour and sex is important. The improvement in immunosuppressive therapy has been the key to the progression of allotransplantation. Allografts raise an immune response in the host mediated by cells and antibodies. Skin is one of the most immunogenic tissues in the body and so presents
a particular challenge. The recipient will need lifelong immunosuppressive therapy and will have to cope with the associated risks. Fortunately, monitoring for rejection is straightforward, as the tissue is visible. Episodes of acute rejection are inevitable and can be treated with increases in therapy. The risk of chronic rejection has been estimated to be in the region of 30–50% over 5 years, and presents a greater challenge. The ability to induce and maintain transplant tolerance has been achieved in some animal models and remains the holy grail of transplant immunology. As with all reconstructive procedures, the surgeon must have a rescue strategy. A repeat allograft would be the ideal replacement. Other options include skin substitutes, as either a temporising measure before repeat allograft or permanent resurfacing, or more standard reconstructive techniques such as skin grafting and free tissue transfer. Psychological profiling of the patient is crucial to the long-term success of transplant surgery and will be even more pertinent to facial transplantation. The face is central to a person’s identity and contributes considerably to communication and social interaction. Although psychological stress is not predicted by the level of disfigurement, there will still be significant psychological morbidity. Suitable candidates will have to demonstrate motivation, appropriate behavioural characteristics and good cognitive function, and have a social and family support network. Strategies have been developed for both the postoperative course in hospital and the longer-term rehabilitation at home. The well-being of the donor family will also need to be addressed. Consent for donation of the face is likely to go beyond that which is legally required, and the family will have to be more involved. The ethical debate surrounding facial transplantation has centred on the risk–benefit ratio for the patient, in addition to concerns over identity. It is argued that the procedure and its aftermath carry unjustifiably high risks without reciprocal benefit. Informed consent for facial transplantation is difficult and can only be based on previous experience of transplantation and research. Legally, it has to be fully informed and non-coerced, and the patient must be competent. Also, the risk of the surgery should be acceptable to the surgeon, who has a duty to protect patients as well as respect their autonomy. A dependent relationship may develop between the patient and the surgeon, as a patient with severe facial disfigurement may be keen to proceed “whatever the risk”. Ethical approval for recipient selection has been granted in the UK.
71.2.4 Minimally Invasive Surgery
Minimally invasive surgery (MIS) involves the use of small skin incisions and specialist equipment to access the operative field. The benefits include reduced operative morbidity, minimal scarring and a shortened recovery time. It was first developed for gynaecological surgery during the 1970s and, since then, most surgical specialties have adopted the techniques involved. The main tools of MIS are endoscopes. Plastic surgeons have been slow to adopt them, as relatively little of their work is performed in areas where there is sufficient space for instrumentation. However, the technique is now used commonly for carpal tunnel decompression and brow lifts. Table 71.1 outlines MIS applications in plastic surgery [21].
71.2.5 Aesthetic Surgery
71.2.5.1 Face and Scalp
Endoscopic brow lift has been a very successful application of MIS. It greatly reduces the length of incisions required, improves visualisation of the forehead nerves and is associated with less postoperative oedema and discomfort. Once the brow has been lifted, it is held in place by resorbable plates or screws, the Endotine device, or the cortical-tunnel technique.
Table 71.1 MIS applications in plastic surgery
Reconstructive surgery: insertion of tissue expanders; flap harvest; retrieval of veins, tendons and nerve grafts
Congenital surgery: torticollis correction
Trauma: fixation of facial fractures
Cosmetic surgery: brow lift; breast surgery
Other: carpal tunnel decompression
Variations of the endoscopic brow lift can also be used to address mid-face problems. The soft tissue envelope of the face changes with ageing and with a number of medical conditions (e.g. antiretroviral-associated lipodystrophy syndrome). In the case of lipodystrophy syndrome, there is loss of submalar soft tissue, a deepening of the nasolabial fold and an exaggerated prominence of the zygoma. There are a number of approaches to restoration of the soft tissue contour. Autologous (Coleman) fat transfers in non-immunosuppressed patients have good short-term results, although 1-year graft survival is 40–60%, so repeat injections are often required. Rhytidectomy ameliorates the disfigurement due to loss of facial volume by tightening the skin envelope. In extreme cases, the use of rotational flaps or free flaps has been described in an attempt to contour or fill large defects. Finally, a variety of soft tissue fillers have also been described. Temporary fillers, such as collagen or those based on hyaluronic acid, do improve the facial contour, but their effects are usually short-lived. Semi-permanent fillers, such as polyalkylimide gels, have also been used successfully to address contour defects, but complications such as infection, capsule formation and migration limit their use. Silicone implants have also been described with good results, but the reduced immunity of these patients and the risk of peri-implant infection are major concerns [22]. Male pattern baldness affects 50% of men over 50 years old. Hair restoration surgery has become increasingly popular in recent years. Follicular transplantation involves harvesting hair follicles from a hair-rich region and individually transplanting them to the affected areas. Up to 1,250 hair grafts are typically transplanted in a single session, and 3–4 sessions may be required to recreate a frontal forelock [23]. Recent studies in adult mice show that follicular stem cells can be induced to create new hair follicles. This raises the prospect of using gene therapy to treat baldness and other related conditions.
71.2.5.2 Breast
Breast augmentation is the most common aesthetic procedure in Europe and North America, with over 350,000 performed annually in the United States alone. In the UK, the licensed implants contain either silicone
gel (semi-liquid or cohesive) or normal saline inside a silicone elastomer shell. The main concern with breast implants has been their potential to cause cancer, and indeed silicone implants were removed from the US market for 14 years before being re-instated for breast reconstruction in 2006. Following a significant amount of research and a number of independent reviews, the conclusion is that silicone gel breast implants are not associated with greater health risks than other surgical implants and elicit a biological rather than a toxic reaction. Implant manufacturers continue to strive to design implants that have a natural shape and feel, maintain their size for longer and have a low risk of capsular contracture and rupture. Textured implants, saline-filled implants and those coated in polyurethane foam may reduce capsular contracture. Finally, breast implants can interfere with screening mammography. This problem has been resolved with an alteration in screening technique, which enables adequate examination of the whole breast.
71.2.6 Obesity Surgery
Obesity is a growing public health problem, which places a significant socio-economic burden on society. Bariatric surgical procedures, such as gastric banding and gastric bypass surgery, are well established in the treatment of morbidly obese patients. When successful, they result in massive weight loss, leaving patients with areas of lax skin, especially around the abdomen and back. Both surgical excision and liposuction can be used to manage the functional and aesthetic complications of bariatric surgery. Total abdominal liposuction before pannus resection allows selective upper flap undermining with preservation of vessels and nerves, reducing the need for extensive surgical undermining. This technique appears to reduce morbidity and speed recovery [24]. Other procedures, such as abdominal panniculectomy, can help relieve urinary incontinence, and liposuction combined with release of the suspensory ligament can help improve buried penis syndromes. Overall, the changes in self-esteem, self-confidence, social interaction, emotional stability and sexual performance can be immense following this type of surgery [25].
71.2.7 The Multidisciplinary Team
The UK Department of Health defines a multidisciplinary team (MDT) as a “group of people of different health-care disciplines, which meets together at a given time (whether physically in one place, or by video or teleconferencing) to discuss a given patient and who are each able to contribute independently to the diagnostic and treatment decisions about the patient”. In the UK, MDTs were developed for the provision of cancer services with the aim of delivering high-quality diagnosis, evidence-based decision making and coordinated planning and delivery of care. Plastic surgeons are involved in many different MDTs, ranging from cancer services (e.g. skin, head and neck, breast cancer) to specialist reconstructive services (e.g. cleft lip and palate (CLP), craniofacial, lower limb trauma, burns). For many of the specialist reconstructive services, MDTs are a reflection of a broader organisational structure in which care is provided by nationally designated centres. In some teams, the relative contribution of plastic surgeons is diminishing as surgeons from other specialties gain the skills required for tissue reconstruction; these include head and neck cancer surgery, breast cancer surgery and CLP surgery. The benefits and barriers of MDTs are summarised in Table 71.2.
Table 71.2 Potential benefits and barriers for MDTs
Benefits: improved patient outcomes; enhanced referral system for specialist opinion; improved data collection for clinical audit and clinical governance; informed “corporate” decision making; improved inter-specialty communication and working environment; improved recruitment into clinical trials
Barriers: dysfunctional team working; poor attendance; poor administrative support; poor funding and bureaucracy
71.2.8 Cleft Lip and Palate Service
CLP has an incidence of approximately 1 in 700 live births in the UK. The prevalence varies with sex,
ethnicity, geography and socio-economic status. CLP patients have a challenging spectrum of problems that are difficult to manage. The MDT for CLP is large and includes cleft surgeons, cleft specialist nurses, speech and language therapists, orthodontists, audiologists, paediatricians, geneticists, ENT surgeons, oral and maxillofacial surgeons, paediatric dentists, dental hygienists, paediatric radiographers, psychologists and administrative staff (including audit and research). The input of this MDT is needed from before birth through to adulthood, often stretching over 20 years. In the early 1990s, concerns were raised regarding the quality of UK cleft care compared to European standards [26]. The MDTs were often poorly defined and spread over large geographical areas. The large number of teams resulted in low-volume surgeons delivering care, often only performing a handful of primary repairs each year. This also had an impact on training, record keeping and clinical audit. A national study was commissioned, which resulted in the Clinical Standards Advisory Group report of 2001 [27]. The reconfiguration led to centralisation of services, which reduced the number of centres from 57 to below 20. The MDTs are now well defined and act in high-volume centres that are well funded and can support high-quality provision of patient care and healthcare training. Record keeping is now standardised, enabling improved research and clinical audit. It remains difficult to provide good evidence for the effectiveness of MDTs. The data from Europe suggest that high-volume, centralised cleft units have the best outcomes. However, there are many confounding factors, such as differing surgical technique, improved facilities and standardised care protocols, that make it difficult to quantify the contribution of MDTs. Further studies are underway to improve the evidence base for MDTs.
71.3 Molecular and Biological Developments Within the Specialty
71.3.1 Wound Repair and Scar-Free Healing
71.3.1.1 Normal Tissue Repair and Scarring
Scar formation is the usual end point of normal mammalian tissue repair. This is likely to result from natural selection favouring the survival benefits of rapid wound
Fig. 71.4 A schematic representation of skin, showing the epidermis, dermis and hypodermis, with hair follicles, adipose tissue, nerves, blood vessels and sweat glands [28]
healing over slower, more aesthetically and functionally acceptable results. Wound healing is a complex process involving the interplay of many different cell types and bioactive molecules. Epithelial injury alone usually results in scar-free regeneration, but injury to both the epidermis and dermis leaves an abnormal dermis with loss of epidermal-associated skin appendages, such as hair follicles (Fig. 71.4). Cutaneous wound healing can be artificially divided into a number of phases. Initial injury usually results in bleeding and subsequent fibrin–fibronectin clot formation, which acts to stop bleeding, cover the wound and provide a reservoir of growth factors and cytokines. Inflammatory cells are then rapidly recruited to the wound by numerous chemotactic signals such as tumour necrosis factor alpha (TNF-a), platelet-derived growth factor (PDGF), transforming growth factor beta (TGF-b) and basic fibroblast growth factor (bFGF). Re-epithelialisation occurs in parallel and is stimulated by other cytokines such as epidermal growth factor (EGF), keratinocyte growth factor (KGF) and insulin-like growth factor (IGF-1). Resident fibroblasts also respond to these growth factors and start to proliferate and synthesise new extracellular matrix. In addition to the neo-matrix, angiogenesis is promoted by various angiogenic growth factors, including vascular endothelial growth factor (VEGF), bFGF, PDGF and TGF-b. Some of the fibroblasts change phenotype to become contractile myofibroblasts, which help close the wound. Over the following
weeks and months, the wound undergoes remodelling and the resulting scar matures. Hypertrophic and keloid scarring represent the worst outcomes on the spectrum of scar formation [29].
71.3.1.2 Modulation of Wound Repair and Scar-Free Wound Healing
Conventional approaches to modulating wound healing concentrate on managing established scarring. They include the application of pressure, topical silicone, local corticosteroid injection, surgical excision and radiotherapy. However, to achieve improved wound healing, interventions have to be made early in the healing process. Research into fetal tissue repair has led to major advances in this field. Early fetal healing is essentially a regenerative process, resulting in scar-free wound healing. The main differences from adult wound healing are a lack of platelet degranulation and clot formation, a shorter and less aggressive inflammatory response and a higher level of bioactive molecules involved in skin development and growth. Various molecules that can modulate wound healing are undergoing clinical trials, including PDGF, TGF-b3, IL-10, mannose-6-phosphate and 17-beta-estradiol. The addition of recombinant growth factors has highlighted a number of practical problems, including delivery, dosing and duration of action. Tissue engineering and gene therapy may offer
solutions to some of these problems and are discussed in more detail in the next section. Recombinant PDGF (Regranex®) is the only growth factor currently licensed for use in wound healing, specifically in diabetic ulcers of the lower limb. PDGF is released by a variety of cells and is most potent as the homodimer PDGF-BB. In wound healing, it is thought to act as a chemoattractant: it promotes cell recruitment and the further release of growth factors, resulting in extracellular matrix deposition, angiogenesis and promotion of re-epithelialisation. However, the most studied growth factor in relation to wound healing is TGF-b, which has three isoforms (TGF-b1, 2, 3). In fetal wounds, TGF-b1 and TGF-b2 concentrations are low, whereas TGF-b3 is high. This is reversed in adult wounds, where an increase in TGF-b1 and TGF-b2 is associated with platelet degranulation and clot formation. Modulation of this ratio in adults appears to be beneficial and can be achieved by reducing TGF-b1 and TGF-b2 or increasing TGF-b3. Recombinant TGF-b3, marketed as Juvista®, is now in advanced clinical trials.
71.3.2 Tissue Engineering and Regenerative Medicine
71.3.2.1 Introduction
Langer and Vacanti [30] defined tissue engineering as “an interdisciplinary field that applies the principles of engineering and life sciences towards the development of biological substitutes that restore, maintain or improve tissue function or a whole organ”. The term regenerative medicine is often used synonymously but generally refers to the use of stem cell technology. There is a growing need for autologous tissue to reconstruct defects caused by disease, trauma or congenital anomaly. Tissue engineering may provide a means to overcome this lack of autologous tissue and improve the overall functional and aesthetic outcome of reconstructions. The three major components of tissue engineering are cells, a biocompatible and mechanically suitable scaffold or matrix, and appropriate bioactive molecules. Normal primary cells in tissue culture tend to divide a certain number of times and then become senescent. This problem was overcome by the development of immortalised cell lines and, more recently, the discovery of stem cells. These cells can undergo an unlimited number of cell divisions and, in the case of stem cells, have the potential to differentiate into many different cell types. For instance, adult mesenchymal stem cells can give rise to skin, bone, cartilage, muscle and fat. Embryonic stem cells are pluripotent, but their use is associated with greater technical and ethical barriers. An ideal matrix scaffold should not only physically retain the cells but also allow cell migration and the diffusion of nutrients and bioactive molecules. The matrix should also be non-immunogenic and biodegradable, and have mechanical and biological properties closely matched to the natural tissue. Biomaterials used to generate matrix scaffolds can be natural (e.g. collagen, fibronectin, alginate and hyaluronan) or synthetic (e.g. polyglycolic acid, polylactic acid and polyester). Bioactive molecules may be added to the construct or secreted by the cells within it. The use of stem cell technology and gene therapy may improve the controlled delivery of these substances [31]. Finally, vascularisation of tissue-engineered constructs has become an important field of research as the size and complexity of constructs increase.
The need for skin substitutes to treat acute and chronic wounds is considerable. The major challenge in skin tissue engineering is to replicate the full structure and function of mature skin, including the associated adnexal structures such as hair follicles and sebaceous glands (Fig. 71.5). There are a number of skin substitutes currently used in clinical practice (Table 71.3) (Fig. 71.6). They face similar challenges encountered by all tissue engineered constructs, including mechanical and degradation deficiencies, lack of cell–matrix interaction and mechanotransduction of signals. During wound healing, the interaction between cells, matrix and bioactive molecules fluctuates in both a temporal and spatial fashion. Integration of some of the key bioactive molecules, such as PDGF, TGF, FGF, EGF and VEGF, would greatly enhance the ability of a construct to contribute to skin regeneration. An alternative to adding bioactive molecules to the construct is to attempt to stimulate their release through embryonic-like regeneration. This might be achieved by enabling epithelial cell–stem cell interactions that are responsible for initiating the appropriate signalling cascades. Skin stem cells are found on the basement membrane and in the hair follicle bulge. The later compartment is likely to be the key to the
Fig. 71.5 Components required for a fully functional skin substitute: keratinocytes plus growth factors and cytokines important for regeneration, fibroblasts/fibrocytes, endothelial cells/pericytes, preadipocytes, progenitor cells and hair follicle cells, with or without a natural/synthetic matrix [32]
Fig. 71.6 Sheet of autologous skin substitute based on a benzyl ester of esterified hyaluronic acid (HYAFF 11) scaffold. In this instance, it is being used to cover a wound following excision of a giant congenital naevus [23]
The latter compartment is likely to be the key to the regeneration of skin adnexae. Other sources of stem cells include those derived from bone, adipose tissue and the circulation (fibrocytes). Although autologous stem cells would be ideal, the need for off-the-shelf products (e.g. Dermagraft®) means that allogeneic cells may have a role (Table 71.3).
71.3.2.3 Musculoskeletal and Soft Tissue

Tissue engineering of tissues other than skin is also well advanced. Reconstructive surgery often demands multiple tissue types or more autologous tissue than is available.
Adipose tissue is abundant and accessible and is frequently lost through injury or resection. Transfer of vascularised local or distant tissue can often provide an acceptable reconstruction. However, autologous fat grafts have been less successful due to progressive absorption of the tissue with time. Recently, it has been discovered that adipose tissue-derived stem cells can differentiate towards adipogenic, osteogenic, chondrogenic, myogenic and neurogenic lineages. Adipose tissue can be easily harvested with minimal morbidity and so represents an excellent source of autologous stem cells for tissue engineering and regenerative medicine.

Loss of skeletal muscle poses a significant problem. Transfer of muscle from local or distant sites is possible but may be associated with significant donor site morbidity. Skeletal muscle is composed of strictly orientated, dense, multinucleated muscle cells tightly packed in an extracellular matrix. Scattered along the basement membrane is a myoblast subpopulation called satellite cells. These cells have the potential to regenerate and can be harvested and then expanded in vitro. The myoblast cell cultures are then returned to the defect in a transport matrix or seeded onto a 3D scaffold to generate muscle tissue in vitro before delayed transplantation. No clinical applications have been developed yet and muscle engineering remains a challenge.

Finally, cartilage and bone have been engineered both separately and together. They have different biological and physical requirements for their scaffolds. In craniofacial surgery, the temporomandibular joint is a particular target for osteochondral tissue engineering.
Table 71.3 Examples of epidermal, dermal and combined skin substitutes. Adapted from Metcalfe [28]

Epidermal substitutes:
- Epicel®. Epidermis: cultured epidermal autograft from a patient skin biopsy. Dermis: none. Notes: takes at least 3 weeks to achieve confluent sheets ready for transplantation.
- Cellspray®. Epidermis: suspension of cultured autologous keratinocytes from a patient skin biopsy. Dermis: none. Notes: takes 5 days before the cells can be sprayed on to the wound.
- Laserskin®. Epidermis: cultured autologous keratinocytes from a skin biopsy in a perforated hyaluronic acid membrane. Dermis: none. Notes: more stable than Epicel but still takes 3 weeks to grow.
- EpiDex®. Epidermis: cultured autologous outer root sheath hair follicle cells. Dermis: none. Notes: fragile product, which takes up to 6 weeks to manufacture.
- Myskin®. Epidermis: cultured autologous keratinocytes on a PVC polymer coated with a plasma-polymerised surface. Dermis: under development. Notes: PVC appears to be a fairly stable delivery platform; multiple applications needed to improve outcome.

Dermal substitutes:
- Alloderm®. Epidermis: none. Dermis: cadaveric allograft skin. Notes: potential disease transmission and transplant rejection.
- Dermagraft®. Epidermis: none. Dermis: allogeneic neonatal fibroblasts on a biosorbable scaffold. Notes: potential disease transmission and transplant rejection.
- Integra®. Epidermis: silicone. Dermis: bovine type I collagen and shark chondroitin sulphate. Notes: potential disease transmission and transplant rejection; the epidermal analogue is replaced with an autologous skin graft after 2 weeks.
- Transcyte®. Epidermis: silicone. Dermis: nylon mesh coated with porcine dermal collagen and bonded to a polymer membrane (silicone) seeded with neonatal human fibroblasts. Notes: potential disease transmission and transplant rejection; the nylon mesh does not degrade.
- Permacol®. Epidermis: none. Dermis: porcine-derived acellular dermal matrix. Notes: nonimmunogenic; supports some revascularisation.

Composite substitutes:
- Apligraf®. Epidermis: human allogeneic neonatal keratinocytes. Dermis: human allogeneic neonatal foreskin fibroblasts in bovine type I collagen, extracellular matrix proteins and cytokines. Notes: repeat applications needed to improve outcome; risk of disease transmission and transplant rejection.
- OrCel®. Epidermis: human allogeneic neonatal keratinocytes. Dermis: human allogeneic neonatal foreskin fibroblasts in a bovine collagen sponge. Notes: not strictly a skin substitute but a biological dressing.
71.3.2.4 Nerve

A normal peripheral nerve has an axon surrounded by a myelin sheath, which is maintained by Schwann cells.
Following nerve injury, they initiate an inflammatory response to clear up the axonal and myelin debris before reverting to the secretion of growth factors to provide a suitable environment for nerve regeneration. They also rapidly proliferate to create a supportive framework along which the regenerating nerve can grow.

Nerve gaps are usually reconstructed with an autologous nerve graft (e.g. sural nerve). There are few options for autologous nerve grafts in the human body. Harvesting of the nerve and subsequent transplantation is associated with significant morbidity as a result of neuroma formation and altered sensation. Tissue engineering and stem cell research are beginning to show some promise in developing engineered nerve conduits for nerve reconstruction. Primary autologous Schwann cells are difficult to harvest and grow. However, pluripotent neural crest stem cells have recently been discovered in hair follicles. These are readily accessible and can differentiate into Schwann cells. The cells can then be used to coat the inside of an artificial nerve conduit, which is used to bridge a large peripheral nerve gap. Several biodegradable scaffolds have been used, including polylactic acid and polylactic-co-glycolic acid (PLGA), although the optimal physical properties of a conduit are yet to be resolved. Nerve growth factor (NGF) appears to be the most active molecule for the survival and maintenance of peripheral nerves, and this and other factors can also be incorporated into a nerve conduit or may be delivered using gene therapy.
71.3.2.5 Vascularisation

An effective blood supply is fundamental to the success of most reconstructive procedures, and this principle also applies to engineered tissue. Vascularisation is not only key to the survival of larger constructs but also allows the engineered tissue to integrate with the host through induction of ingrowing vessels (angioinduction) as well as outgrowth of its own microvasculature. Although this can be enhanced by adding recombinant angiogenic factors, gene therapy offers the potential for targeted, site-specific and long-term delivery. Currently, tissues engineered in vitro have to be so thin that they can gain a nutrient supply initially by diffusion in order to survive in vivo. The properties of blood vessels change as they move from the high-flow arterial system to the low-flow capillary system within the vascular tree. However, there are some common properties, such as compliance and being nonthrombogenic. Vascular bypass procedures requiring arterial conduits are commonly performed to treat peripheral vascular disease and coronary artery disease. The gold standard remains the autologous vein graft, although prosthetic conduits such as expanded polytetrafluoroethylene (ePTFE) and polyethylene terephthalate (Dacron®) have been used with success. Synthetic grafts are not suitable at lower flow rates due to their increased thrombogenicity and lack of compliance, which result in intimal hyperplasia and reduced patency rates. Newer materials such as poly(carbonate-urea)urethane (CPU) are being used in vessels of smaller diameter due to their improved haemodynamics. These grafts can be further enhanced by lining them with antiplatelet agents or cells (e.g. endothelial progenitor cells).

In addition to microvessels and a suitable matrix, engineered tissue needs to encourage inosculation and angioinduction. For inosculation to occur, the engineered tissue needs to have endothelial cells, smooth muscle cells and fibroblasts available in the matrix. In addition, this process can be augmented by endothelial progenitor cells that are either added to the matrix or cultured ex vivo and then introduced into the circulation, where they find their way to sites of neovascularisation. Initial angiogenesis is mediated by TGF-β. Further maturation of vessels is encouraged by PDGF pathways and angiopoietin-1 and -2. VEGF, bFGF and nitric oxide also play important roles in vasculogenesis and are discussed in relation to gene therapy below.

71.3.3 Gene Therapy

71.3.3.1 Introduction

Gene therapy is the introduction of genetic material into cells to alter their protein synthesis. There has been significant progress over the last 20 years since its development as a means to treat inherited genetic disorders. The main gene delivery techniques are broadly divided into viral or nonviral (Table 71.4) and can be performed in vivo or ex vivo. The key parameters that need to be considered in relation to the different techniques include transfection efficiency, specificity, toxicity, transfection technique, DNA insert size, type of target cell (dividing/nondividing), immune response and the length of transfection.

Table 71.4 Viral and nonviral gene delivery techniques
Viral: adenoviruses; adeno-associated viruses; retroviruses; herpes simplex virus
Nonviral: direct injection; liposomes; electroporation; particle bombardment (“gene gun”); antisense oligonucleotides
71.3.3.2 Tissue Healing and Flap Survival

The skin is a good target for gene therapy. It is easily accessible and has many potential applications, from enhancing wound healing to delivering therapeutic gene products for treating systemic disease. As discussed above, wound healing can be improved by the delivery of key growth factors such as EGF, IGF-1, TGF-β3 and PDGF-B. Gene therapy approaches include vector-mediated overexpression of the growth factors or their receptors and implantation of ex vivo transfected cells. Gene therapy also has the potential to improve graft success rates [33]. Angiogenic factors that are being investigated include vascular endothelial growth factor (VEGF), FGF and nitric oxide. VEGF has already been used clinically in the management of coronary artery disease and peripheral vascular disease. Plastic surgery applications include improving angiogenesis during tissue expansion, which enables fast, high-volume expansion. In animal models, delivery of VEGF via adenovirus and liposome vectors induced a significant increase in angiogenesis and improved flap survival. Controlled release of thrombolytic agents may also improve microvascular blood flow across anastomoses and perfusion of free tissue transfers.
71.3.3.3 Skin Cancer

Cutaneous malignant melanoma has been one of a number of cancers targeted for gene therapy. Passive immunotherapy approaches, such as cytokine therapy (e.g. interferon-α), have had limited success, and so active approaches through immunisation or adoptive transfer of a cellular immune response are being developed. Cancer gene therapy has yet to deliver clinical benefit and has been beset with problems, including low gene transfer rates, poor vector technology and limited selectivity.

There is good evidence that melanoma is an “immunogenic” tumour, and cell-mediated immunity appears to be critical to disease progression. Melanoma cells have a number of mechanisms for avoiding detection by T cells, including loss of expression of antigen or human leukocyte antigen (HLA) molecules. One strategy is to genetically modify the tumour cells to increase their immunogenicity. This is achieved by making them secrete immunologically relevant cytokines or express new or increased levels of cell-membrane molecules. Another immune-related strategy is the adoptive transfer of ex vivo expanded melanoma-specific effector T cells.

Tumour suppressor genes are often missing or damaged. Reintroduction of these genes, such as p53, aims to restore or enhance apoptosis of tumour cells. Alternatively, oncogenes have been targeted by blocking the translation of their mRNA. This can be achieved by active cleavage of the mRNA, oncogene antisense oligonucleotides and introduction of a gene encoding an antibody to the oncogene product. Suicide gene therapy (or gene-directed enzyme prodrug therapy) is the introduction of a gene into tumour cells that can convert a nontoxic prodrug into a toxin. An example is the herpes simplex virus thymidine kinase (HSVtk) gene. Ganciclovir is given to the patient and is subsequently phosphorylated by HSVtk to a molecule that stops elongation of DNA during the S phase of the cell cycle. Viruses themselves can also be used as therapeutic agents (oncolytic viruses). Some are naturally oncolytic and selective for tumours, although direct injection is most successful. Their action may be enhanced by the addition of transgenes as described above.
71.4 Imaging and Diagnostics Within the Specialty

71.4.1 Imaging

Advances in computer technology and electronics have enabled rapid, high-resolution CT, MRI and ultrasound scanning. These mainstream imaging modalities are routinely used in diagnosis and operative planning, particularly for craniofacial procedures [34]. Imaging techniques that document the external three-dimensional structure of soft tissue, such as 3D stereoscopic photography [35] and 3D LASER scanning, have also found a role in both research and clinical practice. Applications include assessment and planning before maxillofacial surgery, measurement of breast volumes and assessment of facial volume changes in lipoatrophic HIV patients following injection of fillers [36].
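Volume measurements of this kind are typically derived from the closed triangle mesh produced by the scanner. A minimal sketch of the standard signed-volume computation (summing signed tetrahedra over the faces, an application of the divergence theorem) is shown below; the toy cube mesh is purely illustrative, and the method assumes a closed, consistently oriented surface.

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Volume enclosed by a closed, consistently oriented triangle mesh.

    Each face forms a tetrahedron with the origin; the signed volumes of
    these tetrahedra sum to the enclosed volume (divergence theorem).
    """
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    a, b, c = v[f[:, 0]], v[f[:, 1]], v[f[:, 2]]
    signed = np.einsum("ij,ij->i", a, np.cross(b, c)) / 6.0
    return abs(signed.sum())

# Toy example: a unit cube triangulated into 12 outward-facing triangles
verts = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                 dtype=float)
faces = [(0, 1, 3), (0, 3, 2),   # x = 0 face
         (4, 6, 7), (4, 7, 5),   # x = 1 face
         (0, 4, 5), (0, 5, 1),   # y = 0 face
         (2, 3, 7), (2, 7, 6),   # y = 1 face
         (0, 2, 6), (0, 6, 4),   # z = 0 face
         (1, 5, 7), (1, 7, 3)]   # z = 1 face
print(mesh_volume(verts, faces))  # -> 1.0
```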
71.4.2 Free Flap Monitoring

Advances in flap design and surgical technique have reduced failure rates for free tissue transfer to as low as 5% [37]. Anastomotic problems are the commonest cause of flap failure and need to be identified early and re-explored if salvage is to be successful. There are a number of approaches to flap monitoring in addition to clinical observation; over 90% of microsurgeons routinely use an adjuvant method [38]. The ideal monitoring system should be noninvasive, accurate, continuous, instantaneous, inexpensive and easy to use.

External handheld Doppler is often used for flap monitoring, although it can be difficult to differentiate between flap recipient vessels and the vascular pedicle. Internal Doppler probes have also been described but may lead to a high rate of false positive re-explorations. More sophisticated LASER Doppler probes measure flap perfusion, but the variability in tissue perfusion dynamics and differences between patients mean that users have to be highly trained. Other devices include the LICOX tissue oxygenation microprobe, which measures the flap’s partial pressure of oxygen; however, it has to be very accurately positioned to be effective. Similarly, microdialysis catheter implantation allows the detection of flap ischaemia (glucose falls while glycerol and lactate concentrations increase), but this technique may detect local tissue ischaemia rather than global flap ischaemia.

A flap monitoring system based on near-infrared spectroscopy (NIRS) has recently been proposed. It uses a probe attached to the flap and works principally by delivering light (700–1,100 nm) into the biological tissue and detecting the amount of absorption by oxygen-dependent chromophores, such as haemoglobin. Computer algorithms analyse the degree of attenuation, and the percentages of oxygenated and deoxygenated haemoglobin are calculated. A recent study by Repez et al. showed the efficacy of NIRS for detecting early flap failure in 50 flaps before any clinical signs [39].
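The attenuation analysis mentioned above can be sketched with a two-wavelength modified Beer-Lambert model: the measured attenuation at each wavelength is treated as a weighted sum of the deoxy- and oxyhaemoglobin concentrations, giving a small linear system to solve. The extinction coefficients, optical pathlength and readings below are placeholder values for illustration only; real devices use calibrated spectra and correct for scattering and differential pathlength.

```python
import numpy as np

# Illustrative extinction coefficients (rows: two wavelengths, e.g. 760 and
# 850 nm; columns: Hb, HbO2). Placeholder numbers, not calibrated values.
E = np.array([[1.6, 0.6],
              [0.8, 1.1]])

def tissue_oxygen_saturation(attenuation, pathlength):
    """Solve A = (E * L) @ c for the two chromophore concentrations, then
    return the oxygenated fraction StO2 = HbO2 / (Hb + HbO2)."""
    c = np.linalg.solve(E * pathlength, np.asarray(attenuation, dtype=float))
    hb, hbo2 = c
    return hbo2 / (hb + hbo2)

# Hypothetical attenuation readings at the two wavelengths
print(tissue_oxygen_saturation([0.9, 1.0], pathlength=5.0))  # ~0.69
```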
Fig. 71.7 The O2C system, which uses simultaneous LASER–Doppler flowmetry and tissue spectrophotometry [Image from LEA Medizintechnik, Giessen]
Another noninvasive system that uses simultaneous LASER–Doppler flowmetry and tissue spectrophotometry is O2C (Oxygen-to-see, LEA-Medizintechnik GmbH, Gießen, Germany) (Fig. 71.7). It is capable of measuring blood flow/oxygenation at depths of 2 and 8 mm simultaneously, and has been shown to predict both anastomotic venous congestion and arterial occlusion before these problems are clinically apparent [40].
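Whatever the probe technology, these monitors reduce to a stream of numeric readings from which alarms must be raised without over-triggering on noise. The snippet below is one simple, hypothetical persistence rule; the window length and drop fraction are illustrative parameters, not validated clinical thresholds.

```python
def sustained_drop(readings, baseline, drop_fraction=0.3, window=5):
    """True when the most recent `window` readings all sit more than
    `drop_fraction` below the patient's own baseline. Requiring a run of
    consecutive low values filters out single noisy samples."""
    if len(readings) < window:
        return False
    threshold = baseline * (1.0 - drop_fraction)
    return all(r < threshold for r in readings[-window:])

# Hypothetical flow trace settling well below a baseline of 100 units
trace = [98, 102, 95, 60, 58, 55, 52, 50]
print(sustained_drop(trace, baseline=100))  # -> True
```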
71.4.3 Sentinel Node Biopsy for Malignant Melanoma

The incidence of malignant melanoma is increasing. Surgical excision is the mainstay of therapy, as melanoma is relatively resistant to chemotherapy and radiotherapy. Recent developments in the surgical management of melanoma include further proof of the significance of sentinel lymph node biopsy (SLNB) when planning the treatment of intermediate thickness melanoma (1.2–3.5 mm Breslow thickness). SLNB is based on the hypothesis that cutaneous melanoma will first metastasise via the lymphatics to one or more lymph nodes in the regional lymph basin serving the primary site. If the sentinel node is free of disease, the rest of the nodes are also expected to be clear. A blue dye containing radioisotope is injected at the site of the primary lesion; the sentinel lymph node is then identified using a Geiger counter and excised. A large randomised controlled trial by Morton et al. demonstrated that SLNB can be a prognostic indicator for melanoma-specific mortality (26.2% if SLNB positive vs. 9.7% if SLNB negative) and that disease-free survival may be prolonged owing to a lower burden of disease following the procedure. Although this remains a controversial subject, many experts have adopted SLNB for the intermediate thickness group [41]. More work is currently being done to investigate the benefits of lymph node dissection after a positive SLNB (the MSLT-II trial). The results of chemotherapy for metastatic melanoma continue to be disappointing, with generally poor response rates. Immunotherapy (e.g. interferon or anti-CTLA-4) has produced more encouraging results, and research is underway to further elucidate its role in metastatic disease [42].
71.5 Future Developments and Research Focus
Fig. 71.8 The da Vinci robot in a laboratory at Imperial College London. Note the master-slave console
Innovations in surgical technique and instrumentation have led to the major advances in plastic surgery over the last 50 years, but biomedical science is likely to provide the greatest innovations in the future. Microsurgery and free tissue transfer will continue to develop, albeit at a slower pace as many flaps and their reconstructive uses have been described. Further improvements in performance will come through better understanding of tissue physiology and perioperative management of patients undergoing flap surgery. In the longer term, tissue engineering and gene therapy may improve flap survival as well as potentially become therapeutic tools themselves. Robotic surgery may also lead to technical improvements in plastic surgery (Fig. 71.8). It offers a number of benefits, including fine motor control of surgical instruments, improved precision, elimination of tremor and 3D visualisation of the operative field. There is some evidence to support its use in microvascular surgery [43] but mainstream use will only come with miniaturisation of the instruments and reduced cost. The fields of tissue engineering, regenerative medicine and gene therapy have shown considerable promise but have delivered limited clinical applications for plastic surgery. A product aimed at inducing scar-free wound healing through the local delivery of growth factors is likely to be first to market. Skin substitutes continue to improve but still fall short of providing a real alternative to autologous skin grafting. The major challenges for tissue engineering include gaining angiogenic control, stem cell science, cell sourcing, and improved manufacture and commercialisation of
products [44]. Technologies such as micro-contact printing are likely to enable the manufacture of custom constructs with the careful integration of the cells and bioactive molecules within the tissue scaffold. Current attempts at tissue engineering often focus on in vitro manufacture of the construct, although with improved knowledge of stem cell biology and the processes involved in regeneration, this may well shift towards using the body as a “bioreactor” for generation of new tissue. Gene therapy will broaden its applications through continued advances in vector transfer, cell targeting and regulation of gene expression. It is likely that cutaneous delivery of gene products for both local and systemic diseases will become a clinical reality. For plastic surgery, this may translate to improved skin cancer treatment, modulation of the immune system for allotransplantation and improved wound healing and flap survival. Finally, improved understanding of the complex mechanisms controlling cell behaviour may lead to the development of biological therapies. This approach has been successfully applied to medical conditions, for example anti-TNF-α therapy for rheumatoid disease [45], and the equivalent approach may be productive for surgery. Plastic surgery is an exciting field, which continues to benefit from the cutting edge of scientific innovation. Biomedical research is finally delivering clinical applications, and this is likely to accelerate over the coming years. However, public perception of plastic surgery remains centred on aesthetic surgery and more needs to be done to improve the understanding of the broad range of work that plastic surgeons perform.
References

1. Evans BC, Evans GR (2007) Microvascular surgery. Plast Reconstr Surg 119:18e–30e
2. Mathes SJ, Hentz VR (2005) Plastic surgery, 2nd edn. Saunders, Philadelphia, PA
3. Cho AB, Junior RM (2008) Application of fibrin glue in microvascular anastomoses: comparative analysis with the conventional suture technique using a free flap model. Microsurgery 28:367–374
4. Koshima I (2008) Atypical arteriole anastomoses for fingertip replantations under digital block. J Plast Reconstr Aesthet Surg 61:84–87
5. Granzow JW, Levine JL, Chiu ES et al (2007) Breast reconstruction with perforator flaps. Plast Reconstr Surg 120:1–12
6. Tan BK, Chen HC, He TM et al (2004) Flap prefabrication – the bridge between conventional flaps and tissue-engineered flaps. Ann Acad Med Singapore 33:662–666
7. Wiese KG (1993) Osmotically induced tissue expansion with hydrogels: a new dimension in tissue expansion? A preliminary report. J Craniomaxillofac Surg 21:309–313
8. Swan MC, Goodacre TE, Czernuszka JT et al (2008) Cleft palate repair with the use of osmotic expanders: a response. J Plast Reconstr Aesthet Surg 61:220–221
9. Saleh M, Yang L, Sims M (1999) Limb reconstruction after high energy trauma. Br Med Bull 55:870–884
10. Swennen G, Schliephake H, Dempf R et al (2001) Craniofacial distraction osteogenesis: a review of the literature: part 1: clinical studies. Int J Oral Maxillofac Surg 30:89–103
11. Matsuno T, Ishida O, Sunagawa T et al (2004) Bone lengthening for congenital differences of the hands and digits in children. J Hand Surg Am 29:712–719
12. Nanchahal J, Tonkin MA (1996) Pre-operative distraction lengthening for radial longitudinal deficiency. J Hand Surg Br 21:103–107
13. Slater RR, Agee JM, Goss BC (2002) Dynamic extension torque for the reversal of PIP contractures. Oper Tech Plast Reconstruct Surg 9:161–168
14. Morykwas MJ, Simpson J, Punger K et al (2006) Vacuum-assisted closure: state of basic research and physiologic foundation. Plast Reconstr Surg 117:121S–126S
15. Korber A, Franckson T, Grabbe S et al (2008) Vacuum assisted closure device improves the take of mesh grafts in chronic leg ulcer patients. Dermatology 216:250–256
16. Hettiaratchy S, Randolph MA, Petit F et al (2004) Composite tissue allotransplantation – a new era in plastic surgery? Br J Plast Surg 57:381–391
17. Hettiaratchy S, Butler PE (2002) Face transplantation – fantasy or the future? Lancet 360:5–6
18. Whitaker IS, Duggan EM, Alloway RR et al (2008) Composite tissue allotransplantation: a review of relevant immunological issues for plastic surgeons. J Plast Reconstr Aesthet Surg 61:481–492
19. Morris PJ, Bradley JA, Doyal L et al (2004) Facial transplantation: a working party report from the Royal College of Surgeons of England. Transplantation 77:330–338
20. Siemionow M, Agaoglu G, Unal S (2006) A cadaver study in preparation for facial allograft transplantation in humans: part II. Mock facial transplantation. Plast Reconstr Surg 117:876–885; discussion 886–888
21. Hallock GG (2008) A brief history of minimally invasive plastic surgery. Semin Plast Surg 22:5–7
22. Nelson L, Stewart KJ (2008) Plastic surgical options for HIV-associated lipodystrophy. J Plast Reconstr Aesthet Surg 61:359–365
23. Eisenmann-Klein M, Neuhann-Lorenz C (2007) Innovations in plastic and aesthetic surgery. Springer, Berlin
24. Espinosa-de-los-Monteros A, de la Torre JI, Rosenberg LZ et al (2006) Abdominoplasty with total abdominal liposuction for patients with massive weight loss. Aesthetic Plast Surg 30:42–46
25. Datta G, Cravero L, Margara A et al (2006) The plastic surgeon in the treatment of obesity. Obes Surg 16:5–11
26. Shaw WC, Dahl E, Asher-McDade C et al (1992) A six-center international study of treatment outcome in patients with clefts of the lip and palate: part 5. General discussion and conclusions. Cleft Palate Craniofac J 29:413–418
27. Bearn D, Mildinhall S, Murphy T et al (2001) Cleft lip and palate care in the United Kingdom – the Clinical Standards Advisory Group (CSAG) Study. Part 4: outcome comparisons, training, and conclusions. Cleft Palate Craniofac J 38:38–43
28. Metcalfe AD, Ferguson MW (2007) Bioengineering skin using mechanisms of regeneration and repair. Biomaterials 28:5100–5113
29. Miller MC, Nanchahal J (2005) Advances in the modulation of cutaneous wound healing and scarring. BioDrugs 19:363–381
30. Langer R, Vacanti JP (1993) Tissue engineering. Science 260:920–926
31. Andreadis ST (2007) Gene-modified tissue-engineered skin: the next generation of skin substitutes. Adv Biochem Eng Biotechnol 103:241–274
32. Metcalfe AD, Ferguson MW (2007) Tissue engineering of replacement skin: the crossroads of biomaterials, wound healing, embryonic development, stem cells and regeneration. J R Soc Interface 4:413–437
33. Tepper OM, Galiano RD, Kalka C et al (2003) Endothelial progenitor cells: the promise of vascular stem cells for plastic surgery. Plast Reconstr Surg 111:846–854
34. Hassfeld S, Brief J, Raczkowsky J et al (2003) Computer-based approaches for maxillofacial interventions. Minim Invasive Ther Allied Technol 12:25–35
35. Lee S (2004) Three-dimensional photography and its application to facial plastic surgery. Arch Facial Plast Surg 6:410–414
36. Paton NI, Yang Y, Sitoh YY et al (2007) Validation of three-dimensional laser scanning for the assessment of facial fat changes. HIV Med 8:498–503
37. Kroll SS, Schusterman MA, Reece GP et al (1996) Choice of flap and incidence of free flap success. Plast Reconstr Surg 98:459–463
38. Hirigoyen MB, Urken ML, Weinberg H (1995) Free flap monitoring: a review of current practice. Microsurgery 16:723–726; discussion 727
39. Repez A, Oroszy D, Arnez ZM (2008) Continuous postoperative monitoring of cutaneous free flaps using near infrared spectroscopy. J Plast Reconstr Aesthet Surg 61:71–77
40. Holzle F, Loeffelbein DJ, Nolte D et al (2006) Free flap monitoring using simultaneous non-invasive laser Doppler flowmetry and tissue spectrophotometry. J Craniomaxillofac Surg 34:25–33
41. Morton DL, Thompson JF, Cochran AJ et al (2006) Sentinel-node biopsy or nodal observation in melanoma. N Engl J Med 355:1307–1317
42. Wolchok JD, Saenger YM (2007) Current topics in melanoma. Curr Opin Oncol 19:116–120
43. van der Hulst R, Sawor J, Bouvy N (2007) Microvascular anastomosis: is there a role for robotic surgery? J Plast Reconstr Aesthet Surg 60:101–102
44. Johnson PC, Mikos AG, Fisher JP et al (2007) Strategic directions in tissue engineering. Tissue Eng 13:2827–2837
45. Feldmann M, Maini RN (2003) Lasker Clinical Medical Research Award. TNF defined as a therapeutic target for rheumatoid arthritis and other autoimmune diseases. Nat Med 9:1245–1250
72 Neurosurgery: Current Trends and Recent Innovations

David G.T. Thomas and Laurence Watkins
Contents

72.1 Introduction
72.2 Neurotrauma
72.3 Brain and Spinal Tumours
72.4 Neurovascular
72.5 Functional Neurosurgery
72.6 Spine
72.7 The Future
References
D. G. T. Thomas, The National Hospital for Neurology and Neurosurgery, Institute of Neurology, Queen Square, London, WC1N 3BG, UK e-mail: [email protected]
Abstract There is no doubt that neurosurgery is an evolving, technology-related speciality. Neurotrauma, brain and spinal tumours and cerebrovascular disease remain important clinical problems throughout the world. Neurosurgery continues both to contribute standard management and to make research advances in these areas. In this chapter, we outline new developments in neurovascular treatment and in the fields of functional neurosurgery and spinal surgery.
72.1 Introduction

Neurosurgery deals with the management of a wide range of conditions affecting the brain, spine and peripheral nerves, in both children and adults. Trauma to the head or spine frequently requires urgent neurosurgical intervention to evacuate extradural or subdural haematomas that are causing progressive deterioration following the primary injury. Increasingly, severe primary brain injuries are being managed aggressively with intracranial pressure (ICP) monitoring coupled with non-operative and operative procedures to control ICP. The metabolic changes that occur in the injured brain are becoming better understood, with the possibility of new therapeutic approaches to reducing progressive secondary brain damage.

There are many well-established open neurosurgical operations for lesions in the brain or spine, including those to treat primary or secondary tumours, as well as cerebral aneurysms and arteriovenous malformations [1]. There has been a trend towards minimally invasive neurosurgery employing novel routes of access, as well as routinely using CT or MRI scans for accurate neuronavigation [2, 3]. Using focussed radiation, it has become possible to perform radiosurgery on brain
lesions through the intact scalp and skull. A parallel trend has been the growth of interventional radiology, so that many vascular lesions can be treated by endovascular methods without requiring open surgery.

Functional neurosurgery is, at present, a rapidly expanding field. This is because of technical developments of implantable electrodes, which can be left, for the long term, deep in the brain or on the surface of the spinal cord in order to change neurological function in a favourable way. Unlike earlier lesional methods, deep brain stimulation (DBS) is reversible and generally safe. The indications for DBS include movement disorders, including Parkinson’s disease and dystonia, as well as intractable pain, drug resistant epilepsy and severe obsessional and affective psychoses.

In the operative treatment of spinal conditions and peripheral nerve problems, there is considerable overlap between neurosurgery and orthopaedic surgery. Both types of surgeons may specialise in complex spinal problems, where major bone work with specialised instrumentation for fixation of the spine may often be required. This latter field is also currently expanding.

Many advances in neurosurgery have depended on developments in radiology. Thus, the invention of CT brain scanning in the 1970s and MRI scanning in the 1980s revolutionised neurosurgical practice. Since then, neurosurgeons have been among the front runners in developing image-directed minimally invasive technology. This technology is now applied as a routine in elective craniotomies, or brain biopsy, in cases of intracranial tumours as well as in endoscopic surgery of the ventricular system.
72.2 Neurotrauma

Injury to the head and spine due to trauma remains a leading cause of long-term neurological disability, particularly in young adults. About one million patients attend hospital in the UK each year following a head injury. A significant number of these patients are under the age of 16 years. Head injuries lead to about 9 deaths per 100,000 population per year.

Future developments in neurosurgery and intensive care for severe head injury are currently focussed on improving invasive monitoring to predict deterioration. In addition, therapies to improve survival and recovery are being developed and then subjected to
multicentre trials. For example, decompressive craniectomy to reduce pressure on the swollen, injured brain is already widely practised and is currently the subject of a multicentre trial.

In the field of monitoring, ICP monitoring is already well established. This technique allows continuous readings from a transducer implanted into the brain parenchyma, so that pressure can be measured continuously and other parameters modified to optimise cerebral blood flow. Other monitoring techniques are being developed to enable measurement of regional blood flow at the bedside in an intensive care unit. Such techniques are currently based either on laser Doppler or on infrared absorption. A further monitoring technique, which is increasingly used in head injury, is microdialysis. This in vivo method is already one of the most important sampling techniques in physiological and pharmacological research. A probe with a semi-permeable membrane is suffused with a dialysis solution as close as possible to the composition of cerebrospinal fluid. As this dialysis solution passes through the probe, it absorbs molecules from the extracellular space of the tissue, allowing these to be recovered and changes in concentration to be monitored. Microdialysis is already being used in some neurosurgical units for monitoring the head-injured patient. The usual focus is on the lactate:pyruvate ratio as a measure of tissue ischaemia, as well as glycerol and glutamate as indicators of cellular damage. Microdialysis can also be used to monitor tissue penetration of therapeutic agents.

As monitoring techniques improve our ability to measure progressive secondary brain damage and even to predict deterioration, there will, in parallel, be a need for improved therapy to intervene in the ongoing process of damage. Decompressive craniectomy was mentioned above. In addition, there has been a long history of a largely fruitless search for neuroprotective drugs. Steroids were used for many years and it took a very large multicentre trial (the CRASH trial) to show that this approach was not helpful. Trials will no doubt continue in the search for possible neuroprotective agents, following the catalogue of previous agents, such as NMDA blockers, magnesium, progesterone, nimodipine and dexanabinol, all of which have proved disappointing in multicentre trials.

Induced hypothermia appears to be a very promising technique to reduce neuronal damage. It is one of the most reproducible techniques for limiting damage in animal models. In the 1980s, moderate hypothermia of a few degrees below normal was shown to produce significant improvement in outcome in models of both ischaemia and head injury. Over the last decade, numerous clinical trials have been conducted in an attempt to translate laboratory findings into clinical techniques, but in general, benefit to the brain appears to be outweighed by increased cardiac and infective complications. The current hope is that techniques can be developed for regional cooling of the brain, preferentially protecting the brain while minimising cooling of the rest of the body, to reduce the general complications. This approach has shown some promising results in premature neonates, where a cooling cap can reduce brain temperature through their relatively thin scalp and skull. However, this may not be possible in the adult or older child, and so development is proceeding with techniques such as intravascular cooling coils.

A further theme in head injury research is based on the realisation that prompt treatment of cases with intracranial haematoma can improve outcome. Some progress in improving the speed of treatment has been made by the adoption of national guidelines and algorithms to streamline assessment of head-injured patients in the accident and emergency department. However, even if that were optimised, there remains a problem in that the specialist skills of a neuro-intensive care unit and the neurosurgical team are concentrated in relatively few large hospitals. Thus, most injured patients arrive first at a district hospital without specialist neuroscience facilities and then require transfer before definitive treatment can be performed. In future, increasing use of telemedicine will enable better information transfer between the initial assessment hospital and the neurosurgical unit. At a simple level, this includes transfer of scan data via a computer image link. In the future, there may be greater interaction via telemedicine audio and visual links, soon after the patient arrives in hospital, or perhaps even at the scene of injury.
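Two of the quantities described above reduce to simple bedside arithmetic, sketched below. The cerebral perfusion pressure formula (CPP = MAP - ICP) is the standard derived parameter used to titrate therapy against the ICP reading; the lactate:pyruvate threshold of 25 is a commonly quoted guide, used here purely for illustration, and the sample values are hypothetical.

```python
def cerebral_perfusion_pressure(map_mmhg, icp_mmhg):
    """CPP = mean arterial pressure - intracranial pressure (mmHg)."""
    return map_mmhg - icp_mmhg

def lactate_pyruvate_flag(lactate_mmol_l, pyruvate_umol_l, threshold=25.0):
    """Lactate:pyruvate ratio from a microdialysis sample; dimensionless
    once both values are in the same units. A threshold around 25 is often
    quoted for ischaemia but is used here only for illustration."""
    ratio = (lactate_mmol_l * 1000.0) / pyruvate_umol_l  # mmol/l -> umol/l
    return ratio, ratio > threshold

print(cerebral_perfusion_pressure(90, 20))                  # -> 70 mmHg
print(lactate_pyruvate_flag(lactate_mmol_l=3.0,
                            pyruvate_umol_l=80.0))          # -> (37.5, True)
```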
72.3 Brain and Spinal Tumours

Primary brain tumours are the most common solid tumours occurring in children. They arise commonly in the cerebellum or brain stem, and the majority are
malignant (medulloblastoma, ependymoma and brain stem glioma), although some are benign (cerebellar astrocytoma). Primary malignant brain tumours, which are most commonly high-grade gliomas in adults, are the 10th most common tumour in males, with a peak incidence at 45 years (in females, some other tumours are more common). The majority of these tumours occur in the supratentorial cerebrum, most commonly in the frontal, temporal or parietal regions. Metastatic brain tumours are also not uncommon and may be single or multiple. Primary benign brain tumours, that is, meningiomas, pituitary adenomas and acoustic schwannomas, are also important tumour types in adults.

The technology for treating brain tumours has improved with the use of peroperative “neuronavigation” (Fig. 72.1). Thus, the surgeon has more confidence, and the immediate results of surgical excision, in terms of fewer postoperative neurological deficits and a shorter length of time in ITU, are improved [4, 5]. Unfortunately, with malignant tumours, in children and in adults, the longer-term results can be poor. For example, with the highest grade 4 gliomas (also known as glioblastoma), the median survival, even with surgery and adjuvant chemotherapy, is just over a year. However, there are indications that novel techniques using fluorescent substances, which can help the surgeon when the operating microscope is fitted with an ultraviolet light source, hence maximising resection, may increase survival by a small extent.

Fig. 72.1 Neuronavigation. Set-up of the system prior to applying surgical drapes. The essential components for 3D digitisation are: (a) hand-held LED transmitter; (b) LED camera for digitisation and registration of the head relative to the pseudo-3D structure of the scan shown on the monitor screen (c). Close-up of the screen display in axial, sagittal and coronal planes (d)

Image guidance technology, used to carry out brain biopsy with great precision where resection is not indicated because of the site and the likely hazards, has both increased the success rate of brain biopsy in achieving a definite histological diagnosis (about 95%) and reduced the mortality (about 0.3%) (Fig. 72.2) [6, 7]. In spite of refinements in brain imaging, a histological diagnosis is usually necessary to plan the optimum further management (up to 10% of suspected cases of brain tumour turn out, on biopsy, not to be tumours of the expected type or not to be tumours at all).

Fig. 72.2 Stereotactic brain biopsy. Neurosurgeon inserting a 1.2 mm biopsy cannula, guided by a stereotactic frame, through a burrhole into the brain. A small core of tissue (1 × 10 mm) is recovered. Typically, four cores are taken from each of 2–4 target sites within a lesion
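The registration step illustrated in Fig. 72.1, matching landmarks digitised on the patient's head to the same points in the scan, is at heart a least-squares rigid-body fit. A minimal sketch of the standard SVD-based (Kabsch) solution is given below, using hypothetical fiducial coordinates; clinical systems add error metrics and surface matching on top of this.

```python
import numpy as np

def rigid_registration(scan_pts, patient_pts):
    """Least-squares rotation R and translation t mapping scan-space
    fiducials onto the same landmarks digitised on the patient
    (Kabsch/Procrustes method)."""
    P, Q = np.asarray(scan_pts, float), np.asarray(patient_pts, float)
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t

# Hypothetical fiducials: patient points are the scan points rotated 90
# degrees about z and shifted, so the fit should recover exactly that.
scan = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
patient = scan @ Rz.T + np.array([5.0, 2.0, 1.0])
R, t = rigid_registration(scan, patient)
print(np.allclose(R, Rz), np.round(t, 3))  # -> True [5. 2. 1.]
```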
In the case of lower grade gliomas (grade 2), which present typically with epilepsy in young adults, there have been recent changes in management. Previously, if there were no other symptoms apart from epilepsy, with scan findings of a homogeneous low density lesion in the brain, a conservative “wait and see” approach was usually recommended, with surgery performed only many years later when the tumour had expanded or progressed to a higher grade. Recent experience with neuronavigation-controlled extensive accurate resection of these tumours, even when they are close to eloquent brain areas, shows good results both in epilepsy control and in increasing progression-free survival.

Neurosurgical approaches to meningiomas, pituitary adenomas and acoustic schwannomas have also shown a trend towards less invasive procedures. Thus, while relatively easily accessible vault meningiomas are removed by craniotomy, many of those involving the skull base are inaccessible and their surgical removal may involve important side effects, for example, injury to the third or sixth cranial nerves, causing diplopia. It is therefore increasingly common to treat such tumours with focussed radiation using the gamma knife unit, which is a purpose-built multi-beamed device for radiosurgery. Alternatively, modified linear accelerators, where the single beam is controlled by computers to produce high-dose, accurate fields of radiation extending only up to the tumour margin, can be used. Tumours up to about 2.5 cm in diameter can be treated; in some cases, the superficial part of the meningioma is debulked to reduce the tumour below this size before radiosurgery. The effect of radiosurgery is to cause tumour cell death. Initially, the dead cells are swollen and larger than the living ones. Thus, the whole tumour commonly increases in size before, after a few months, shrinking and remaining static indefinitely. As a consequence, follow-up brain scans are usually done in the first few years to assess tumour control (Fig. 72.3).

Fig. 72.3 Gamma knife radiosurgery. The Gamma knife is a hemispherical device containing 201 cobalt sources focussed at one point. The beams are collimated by internal helmets, which can collimate the beams down to 4–8 mm diameter. Snapshot showing a left posterior fossa tumour; “wire frame” depiction of skull geometry; dose–volume histogram showing the sharp fall-off of radiation dose at the tumour edge; quantitative data on tumour size at time zero of treatment and at 3, 12 and 24 months

Radiosurgery is becoming the method of choice to treat acoustic schwannomas up to 2.5 cm in diameter. These benign tumours usually grow from one of the two vestibular nerves, which are in close relation to both the acoustic nerve and the facial nerve. Deafness is common prior to surgery, and facial nerve weakness is a serious and not infrequent complication of microsurgical dissection and removal of the tumour. Patient choice, when offered either open surgery or radiosurgery where there is equipoise between the two methods, is increasingly to opt for the less invasive method. Currently, about 70% of acoustic neuromas are treated by radiosurgery.

Spine tumours involve a different range of challenges for the neurosurgeon. The primary spinal tumours include intramedullary astrocytomas and ependymomas, while meningiomas arising intradurally or schwannomas arising from the spinal nerve roots are also encountered. There are several technical adjuncts which are helpful, including the universal use of the operating microscope as well as bipolar diathermy and often the ultrasonic surgical aspirator. Where appropriate, the approach to the vertebral canal may be through the oro-pharynx, the chest or the abdomen rather than a posterior or postero-lateral route. In these cases, the neurosurgeon can benefit from the assistance of other surgeons who are more familiar with operating in the oro-pharynx, chest or abdomen. Largely for technical reasons, the use of neuronavigation has been introduced more slowly in the spinal field than has been the case in the head. One reason for this is that the head, based on the skull, is more easily placed in a fixed position, while the vertebral column is mobile. However, neuronavigation systems have begun to overcome these problems and it is now possible to use both neuronavigation and focussed radiation for spinal tumours. Metastatic spinal tumours are more common than primary ones. There is no single agreed treatment for these. If the spine is stable and the source of the metastasis is radiosensitive or chemosensitive, non-operative adjuvant treatment may be best. However, in other cases, multiple-level vertebral surgery with instrumentation for fixation may be indicated.
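The dose-volume histogram referred to in Fig. 72.3 is itself a simple computation: for each dose level, the fraction of the voxels of interest that receive at least that dose. A minimal sketch on a hypothetical, radially falling dose distribution is shown below; the Gaussian fall-off and all the numbers are illustrative, not a model of a real treatment plan.

```python
import numpy as np

def cumulative_dvh(voxel_doses, dose_levels):
    """Fraction of voxels receiving at least each dose level (Gy)."""
    doses = np.ravel(voxel_doses)
    return np.array([(doses >= d).mean() for d in dose_levels])

# Hypothetical dose grid falling off radially around the focus, mimicking
# the steep dose gradient outside the target that radiosurgery relies on
x = np.linspace(-20, 20, 41)                      # mm
xx, yy, zz = np.meshgrid(x, x, x)
r = np.sqrt(xx**2 + yy**2 + zz**2)
dose = 20.0 * np.exp(-(r / 8.0) ** 2)             # Gy
target = dose[r <= 8.0]                           # voxels within 8 mm
print(cumulative_dvh(target, dose_levels=[5, 10, 15, 18]))
```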
Pituitary surgery has also been progressively affected in a major way by technical advances. Thus, nearly all pituitary lesions can be approached transsphenoidally. Where there is significant suprasellar extension, modified techniques can be used to produce an extended approach to accommodate this. Neuronavigation as well as endoscopic techniques are also helpful in this area.

The most recent technical development is the installation of dedicated interventional MRI scanners within an otherwise normally equipped operating theatre suite. Non-magnetic surgical instruments have been devised to allow simple neurosurgical procedures, such as lesion biopsy or cyst drainage, under virtually direct MR imaging. If more sophisticated surgery requiring conventional steel neurosurgical instruments is to be done, this can be carried out beyond the five Gauss line, which is typically about 3 m from the scanner. Arrangements have been devised to transfer the anaesthetised patient while maintaining sterile conditions at the wound site. This enables repeated examinations (three to four, for example) to be carried out per-operatively during complex neurosurgery, for instance to confirm total lesion removal. Only a few installations have been made worldwide, including the one at The National Hospital for Neurology & Neurosurgery.
72.4 Neurovascular

Vascular neurosurgery deals with vascular abnormalities of the brain and spine, typically following a subarachnoid haemorrhage, which is often the first sign that such an abnormality is present. As CT and MRI become even more widely available and utilised, the trend is for more and more incidental vascular abnormalities to be discovered. Trials are therefore in progress to define the optimal management of vascular abnormalities such as cerebral aneurysms when discovered in an otherwise well patient.

The treatment of cerebral and spinal vascular abnormalities has been revolutionised by endovascular techniques, such as occlusion of cerebral aneurysms by delivering platinum coils into the lumen of the aneurysm via an endovascular microcatheter (Fig. 72.4). Once the ISAT trial had shown a significant improvement in recovery following endovascular treatment of cerebral aneurysms compared to traditional open surgery, the trend in the UK has been dramatically towards endovascular techniques and away from craniotomy. In parallel with this trend, the technology of delivering endovascular coils continues to improve and is increasingly combined with modified techniques such as balloon “modelling” of the coils or stenting, allowing successful treatment of an ever larger range of cerebral aneurysms. Improvements in microcatheter technology and improving angiographic imaging are also driving the development of better endovascular techniques for treating cerebral and spinal arteriovenous malformations. There are also continued developments of bioactive coatings for endovascular components, for example to encourage endothelialisation.

Apart from the technical developments, there are efforts using a multidisciplinary approach to utilise fluid dynamic mathematical modelling to predict flow and shear stress for a given vascular anatomy. This may eventually allow better prediction of the optimal intervention for vascular abnormalities, particularly arteriovenous malformations.

Fig. 72.4 Endovascular aneurysm treatment. (a) Cerebral angiogram prior to endovascular treatment. (b) Partial occlusion. (c) Complete occlusion of the aneurysm
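As a flavour of what such modelling computes, the simplest analytic case, wall shear stress under steady Poiseuille flow in a straight cylindrical vessel, is sketched below. The formula tau = 4*mu*Q/(pi*r^3) is standard for that idealised case; the flow rate, radius and viscosity are illustrative values, and real computational models of aneurysm anatomy go far beyond this.

```python
import math

def poiseuille_wall_shear_stress(flow_ml_min, radius_mm,
                                 viscosity_pa_s=0.0035):
    """Wall shear stress tau = 4*mu*Q / (pi * r^3) in Pa for steady laminar
    flow in a straight cylindrical vessel. A blood viscosity of 3.5 mPa.s
    is a typical assumed value."""
    q = flow_ml_min * 1e-6 / 60.0   # ml/min -> m^3/s
    r = radius_mm * 1e-3            # mm -> m
    return 4.0 * viscosity_pa_s * q / (math.pi * r ** 3)

# Hypothetical cerebral-artery-like values
print(round(poiseuille_wall_shear_stress(flow_ml_min=120, radius_mm=1.5), 2))
# -> about 2.64 (Pa), in the range usually quoted for arterial walls
```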
72.5 Functional Neurosurgery

Many of the earlier operations of functional neurosurgery have become obsolescent. Thus, cervical cordotomy was a very valuable technique for controlling severe pain due to malignant disease before more modern techniques of pain control were developed. There does remain a place for long-established lesional pain procedures, like trigeminal thermocoagulation for trigeminal neuralgia, as well as dorsal root entry zone coagulation for pain due to avulsion of the brachial plexus. There is a hope that the latter procedure will become redundant when neural repair techniques, based largely on stem cell technology, become effective. Spinal cord stimulation is a long-standing and relatively effective method in carefully selected cases, which have been screened by a multidisciplinary pain clinic assessment prior to implantation. There are also more central brain targets for DBS, including the sensory thalamus and the periventricular grey matter.

As with lesional pain procedures, so with functional neurosurgery for movement disorders: lesional procedures such as thalamotomy or pallidotomy have largely been phased out. DBS has been successfully applied not only to improve the tremor of Parkinson’s disease but also to lessen drug treatment-related dyskinesia as well as improving paucity of movement. The most common target currently is the subthalamic nucleus, which is also effective for dystonia (Fig. 72.5).

Fig. 72.5 Deep brain stimulation (DBS). (a) Axial T2-weighted planning MRI scan showing the subthalamic nucleus target (red). (b) Coronal scan confirming the target site. (c) Track of the DBS electrode

To apply the DBS method requires a multidisciplinary team including not only neurosurgeons but also neurologists, neurophysiologists, neuropsychologists and the whole rehabilitation team. Close follow-up is necessary to keep the stimulation of optimum benefit. It is also necessary to surgically replace the batteries on a relatively urgent basis when they fail. It is therefore an expensive treatment with long-term costs. However, there has been acceptance that, in patients with appropriate indications, DBS represents the standard of care. New deep brain targets are being identified through human and primate studies for conditions ranging from other, more rare movement disorders to obsessive compulsive states and affective psychosis.

The quality of MRI has steadily improved and includes spectroscopy as well as quantitative neuroanatomical observations. These techniques have taken forward the selection process for patients who have intractable drug resistant epilepsy and in whom surgical treatment may be possible. Thus, for one of the common causes of epilepsy, mesial temporal sclerosis,
the new MRI techniques have made it possible, generally, to find out whether there is a unilateral lesion underlying the epilepsy with preserved contralateral structures. This allows case selection for a more favourable outcome, both in terms of epilepsy control and in avoidance of the complication of severe memory or language problems, which can occur if the remaining temporal lobe is not functional.
72.6 Spine

One of the fastest-growing subspecialties within neurosurgery is that of the spinal surgeon. Spinal tumours have been discussed above. Increasingly complex spinal surgery involving instrumentation is being used as part of surgery for spinal tumours. An even larger
patient population is affected by degenerative spinal disease, and the same techniques are applicable. The recent theme in spinal surgery has been an increasing use of metallic implants (Fig. 72.6), generally to fix spinal components relative to each other, reducing movement at an affected joint or disc. It remains controversial as to whether the clinical outcomes are better with instrumental fixation of spinal segments affected by degenerative disease, or with utilisation of artificial discs, which in theory restore movement at the affected level. Fixation of a degenerative segment of the spine reduces the ongoing process at that level and so is thought to reduce the chance of further osteophyte formation and possible recurrent neuronal impingement. However, fixation of one level of the spine puts an added mechanical stress on adjacent levels. It remains highly controversial as to whether this leads to adjacent level degeneration or whether the underlying disease process would affect other discs anyway. Artificial discs allow continued movement at the operated level (at least in the short term) and so may reduce the stresses on adjacent levels. Further clinical research will be necessary to define the relative merits of these different approaches to treatment.

Fig. 72.6 Spinal instrumentation. Lateral thoracolumbar radiograph showing multi-level fusion with instrumentation

Another theme in spinal research is restoration of function after spinal cord injury. Understanding the cellular environment and the effect that this has on axonal elongation, apoptosis and scar formation is aimed at eventually increasing our ability to intervene to improve recovery. There are also very promising developments in the use of stem cells and other transplanted cells, such as olfactory cells, to encourage reconnection of axons following injury. It is likely that some of the earliest targets for such therapy will be at the interface between the peripheral and central nervous system, improving functional recovery after reimplantation of nerve roots. Such reimplantation is already undertaken, for example following brachial plexus avulsion injuries. If cell transplantation techniques demonstrate some benefit in that context, then further research and development is likely to continue to apply such techniques to the spinal cord itself.

72.7 The Future
The only certain prediction is that there will be further technical developments in neurosurgery. Robotics is likely to enter the field increasingly and is already being employed to replace conventional techniques in stereotactic neurosurgery and radiosurgery [8]. The place of interventional MRI remains to be defined. Thus, it may be that it is shown to be useful only for a limited number of indications and that the existing proven methods of neuronavigation, which are less expensive, remain in routine use. In functional neurosurgery, the indications for DBS are expanding and the necessary translational research is being undertaken to bring new target sites into clinical use, including restorative techniques for use in brain damage. Another promising approach to restorative neurosurgery is neurotransplantation and the use of stem cell transplantation. It is likely that a basic understanding of brain tumour stem cell biology will lead to improved management of some of the highly malignant brain tumours, which currently cannot be treated effectively [9]. Another theme is likely to be the development of endoscopy of the CSF compartments within and around the brain and spine, allowing increasing use of minimally invasive techniques. Improved endoscopes are being developed to allow a stereoscopic image, and micromanipulators will enhance the ability of the neurosurgeon to perform increasingly complex interventions via an endoscope.

In the future, neurosurgical research themes are likely to be driven by demography. The increasing age of the population in most developed countries means that we are likely to encounter a greater number of conditions such as normal pressure hydrocephalus, which typically affects the elderly, producing cognitive decline, decreasing mobility and problems with continence. Since it is, in many cases, a reversible condition (with CSF shunt insertion), it will be important to identify those patients who would benefit from such
surgical intervention. Research therefore continues into appropriate MRI sequences and CSF biomarkers to improve the diagnosis of the condition and to predict outcome. There is also some evidence to suggest an overlap between patients with Alzheimer's disease and those with hydrocephalus. Trials are continuing to explore this relationship, and further basic research is needed to enhance our understanding of the pathophysiology.
References

1. Rhoton AL (2003) Operative techniques and instrumentation for neurosurgery. Neurosurgery 53:907–934
2. Bernays RL, Imhof HG, Yonekawa Y (2003) Intraoperative imaging in neurosurgery: MRI, CT and ultrasound. Springer, New York
3. Braun V, Dempf S, Tomczak R et al (2001) Multimodal cranial neuronavigation: direct integration of functional magnetic resonance imaging and positron emission tomography data. Neurosurgery 48:1178–1181
4. Paleologos TS, Wadley JP, Kitchen ND et al (2000) Clinical utility and cost-effectiveness of interactive image-guided craniotomy: clinical comparison between conventional and image-guided meningioma surgery. Neurosurgery 47:40–47
5. Wadley J, Dorward N, Kitchen N et al (1999) Pre-operative planning and intra-operative guidance in modern neurosurgery: a review of 300 cases. Ann R Coll Surg Engl 81:217–225
6. Thomas DGT (1993) Stereotactic and image directed surgery of brain tumours. Churchill Livingstone, London
7. Thomas DGT, Graham DI (1995) Malignant brain tumours. Springer, London
8. Benabid AL, Hoffman D, Munari C et al (1995) Surgical robotics in minimally invasive techniques. In: Cohen AR, Haines SJ (eds) Neurosurgery. Williams and Wilkins, Baltimore, pp 85–97
9. Ma YH, Mentlein R, Knerlich F et al (2008) Expression of stem cell markers in human astrocytomas of different WHO grades. J Neurooncol 86:31–45
Molecular Techniques in Surgical Research
73
Athanassios Kotsinas, Michalis Liontos, Ioannis S. Pateras, and Vassilis G. Gorgoulis
Contents

73.1 General Introduction ... 951

73.2 Single Molecule Analysis ... 952
73.2.1 Introduction ... 952
73.2.2 In Situ Techniques ... 952
73.2.3 Techniques for Molecular Analysis in Solution ... 956

73.3 Whole Genome Analysis Techniques ... 966
73.3.1 Introduction ... 966
73.3.2 DNA-Arrays ... 966
73.3.3 Comparative Genomic Hybridization Analysis ... 968

73.4 Cell Cultures and Functional Assays ... 969
73.4.1 Introduction ... 969
73.4.2 Isolation of Cells and Establishment of Cell Lines ... 969
73.4.3 Plasmid Transfections and Viral Infections ... 970
73.4.4 RNA Interference ... 971
73.4.5 Drug Treatments ... 971

73.5 Animal Models ... 972
73.5.1 Introduction ... 972
73.5.2 Brief Description ... 972
73.5.3 Methods of Gene Delivery ... 972

References ... 973
V. G. Gorgoulis () Department of Histology–Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece e-mail: [email protected]
Abstract The rapid technological improvements in molecular medicine and biology during the last decades have enriched the armoury of research laboratories and offered new diagnostic tools to the clinician. New insights into the pathogenesis of diseases (especially cancer), prognostic factors and new therapeutic approaches are the result of the implementation of these techniques. However, their growing complexity demands a good knowledge of their use and of the information they can provide to the clinician. The following chapter analyzes the techniques available for research and diagnostic use, with emphasis mainly on their rationale. Methodologies and available protocols are also provided, as well as their clinical applications.
73.1 General Introduction

Behind the phenotypic characteristics often exhibited by pathologic conditions lie various molecular aberrations. These can be caused by exogenous factors, such as toxic materials, irradiation or micro-organisms, or can be inherited. Since these aberrations are the primary cause of the development of various diseases, their detection is of great importance. As living cells follow the central dogma of biology, from DNA to RNA to proteins, in order to build up their internal structures and maintain the functional homeostasis needed to survive, aberrations can occur either at the gene level (DNA) and/or in its expression products [(m)RNA or protein]. Therefore, an arsenal of techniques is required to dissect potential aberrations at each of these levels. These methods often follow the central dogma with regard to the question they are designed to answer. For instance, some are employed to detect DNA mutations (e.g. nucleic acid sequencing), while others examine
RNA splicing aberrations [Northern blot analysis and real-time PCR (RT-PCR) analysis], while protein expression analysis requires distinct methods [immunohistochemistry (IHC) and Western blot analysis]. Broadly, these techniques can also be divided into those that analyze the target molecule with regard to its localization in the cell or tissue (collectively known as in situ techniques) and methods that require disintegration of the cell/tissue in order to isolate it (also referred to as analysis of molecules in solution). Unfortunately, no single technique can provide answers at all these molecular levels concomitantly. For example, IHC can provide useful information about the pathologic sub-cellular translocation of a protein, but does not provide any clue about its molecular weight or accurate quantitative expression aberrations, which are usually addressed by Western blot analysis. Therefore, more than one technique may frequently be required to gain a larger and more accurate view of a malfunction taking place in the pathologic cells or tissue. At the research level, an analysis comprising multiple combinations of techniques provides accurate results through multiple validations, but in the case of routine clinical analysis, this approach is time consuming and cost inefficient. In conclusion, knowledge of the available techniques, and especially of the answers they can provide, is crucial not only for accurate detection of molecular aberrations, aiming at diagnosis and prognosis, but also to aid in the design and/or monitoring of the appropriate therapeutic strategy.
73.2 Single Molecule Analysis
73.2.1 Introduction

From a historical point of view, the first techniques developed dealt with the analysis of only one gene, or one of its expression products (RNA or protein), per sample or patient under investigation. Progress in this field later allowed the simultaneous analysis of the same type of molecule in multiple samples or patients, but without deviating from this rule. According to the specific principles employed in the analysis, these methods can also be grouped under the following classification.

73.2.2 In Situ Techniques

73.2.2.1 Brief Description

In situ techniques refer to a group of methods aiming at the detection of a gene or its products in the context of a tissue, cell or cellular organelle (e.g. chromosomes). A basic requirement is the intact preservation of the (internal) structure of these substrates. This is usually achieved by fixation in specific organic solvents, such as formaldehyde or alcohol solutions. If the substrate is stored frozen (below −20°C) or freshly obtained and must be used immediately, the above treatment suffices and the required detection method is applied. Alternatively, if the substrate must be archived for future use, then additional steps, especially in the case of tissues, must be employed. These steps include dehydration in increasing concentrations of ethanol, clearing in xylene and paraffin embedding. To be applied, all the in situ techniques require immobilization of the substrate(s) on microscopy slides. For tissues, this is achieved by obtaining thin sections (typically 5 μm thick), while cells or organelles are spread on the surface of the slides. Variants of these techniques have also been developed for electron microscopy. In this case, specific fixatives (e.g. glutaraldehyde), embedding materials (e.g. methacrylate resins), ultra-thin sectioning and specific detection methods (such as secondary antibodies conjugated with gold particles) must be applied. With in situ techniques, the following questions can be addressed:

1. At the DNA level:
(a) Integrity of a gene (e.g. presence of a deletion)
(b) (Sub-)chromosomal localization (e.g. presence of a translocation)
(c) Copy number of a gene (e.g. amplification)

2. At the RNA or protein level:
(a) Expression
(b) Localization in the context of subcellular or tissue "architecture" (e.g. in the nucleus, extracellular matrix, nucleolus, mitochondria, etc.)
73.2.2.2 In Situ Detection of Nucleic Acids

Fluorescent In Situ Hybridization (FISH)

FISH is based on the detection of a segment of the genome using appropriate nucleotide probes (Fig. 73.1). Probes are labelled with a fluorescent molecule, allowing their detection by fluorescence microscopy. Both cells in culture and tissue specimens can be analyzed with FISH. It is important to note that while the majority of the cells analyzed in a tissue sample will be in interphase – unless the tissue has a high proliferation index – cultured cells can be specifically treated to increase the number of cells in metaphase, in order to achieve maximum condensation of the DNA into chromosomes. Mitotic inhibitors like colchicine or nocodazole are commonly used in cell cultures for this purpose. FISH is performed on microscopy slides. If a tissue sample is to be examined, the specimen is dewaxed and hydrated for further processing, while single cells are first treated in a hypotonic solution and then fixed with a methanol–acetic acid mixture. Following this, the sample is chemically and physically treated in order to achieve optimal hybridization conditions. A fluorescent-conjugated probe is then applied to the slide and allowed to bind to the target sequence, and the hybridization result is visualized under the fluorescence microscope. This constitutes the direct FISH technique. In indirect FISH, the probe is not directly
labelled with a fluorescent molecule, but linked to a hapten. A fluorochrome-labelled secondary antibody against the hapten is then used for visualization of the hybridization product. This variant allows signal enhancement, since it includes an amplification step, but may produce a high background signal. FISH is widely used in breast cancer for the detection of Her2/neu amplification, which is a strong prognostic factor and determines therapy in patients with HER2-positive operable cancer [10, 13, 16].
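In practice, scoring such an assay comes down to counting fluorescent signals per nucleus. The short Python sketch below is our own illustration rather than part of any published protocol; the nucleus counts, the helper name her2_fish_score and the 2.0 ratio cut-off are assumptions reflecting common scoring practice.

# Illustrative sketch: scoring a HER2 FISH assay from per-nucleus signal
# counts. The counts and the 2.0 ratio cut-off are assumptions reflecting
# common scoring practice, not values taken from this chapter.

def her2_fish_score(her2_counts, cep17_counts, cutoff=2.0):
    """Return the mean HER2/CEP17 ratio and an 'amplified' call."""
    if not her2_counts or len(her2_counts) != len(cep17_counts):
        raise ValueError("paired, non-empty signal counts are required")
    mean_her2 = sum(her2_counts) / len(her2_counts)
    mean_cep17 = sum(cep17_counts) / len(cep17_counts)
    ratio = mean_her2 / mean_cep17
    return ratio, ratio >= cutoff

# Hypothetical counts from five nuclei (in practice many more are scored):
her2 = [8, 10, 7, 9, 12]    # HER2 gene signals per nucleus
cep17 = [2, 2, 3, 2, 2]     # chromosome 17 centromere signals per nucleus
ratio, amplified = her2_fish_score(her2, cep17)
print(f"HER2/CEP17 ratio = {ratio:.2f}, amplified = {amplified}")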
Chromogenic In Situ Hybridization (CISH)

Based on the same principle as FISH, CISH differs in signal detection (Fig. 73.1). CISH is an indirect method: probes are labelled not with fluorescent molecules but with a hapten. Peroxidase-linked secondary antibodies bind the hapten, and the signal is developed with the addition of a chromogenic peroxidase substrate, such as 3,3′-diaminobenzidine tetrahydrochloride (DAB). A common bright-field microscope is used for detection of the signal. Apart from the peroxidase–DAB technique, signal development may be achieved with a recently presented peroxidase–silver technique known as SISH (silver-enhanced in situ hybridization) [5]. Both techniques are as reliable as FISH in determining Her2/neu amplification and provide lower-cost alternatives [9, 11].
Fig. 73.1 Fluorescent in situ hybridization (FISH). Tissue sections (5 μm), cells or metaphase chromosomal spreads are laid onto a microscopy glass slide. Fluorescent (or, in the case of CISH, chromogenic) labelled probe(s) are hybridized for detection of specific gene(s) or chromosome(s)
In Situ Detection of Apoptotic Cells (TUNEL Assay)

In situ detection of apoptotic cells, also known as the Terminal deoxynucleotidyl transferase dUTP Nick End Labelling (TUNEL) assay, was originally described by Gavrieli and co-workers in 1992 [6]. The method relies on the labelling of the fragmented DNA that results from the apoptotic process in cells committed to programmed cell death. In this way, apoptotic cells can be revealed in situ (Fig. 73.2). Labelling is performed with the aid of the terminal deoxynucleotidyl transferase (TdT) enzyme and a labelled deoxynucleotide triphosphate (dNTP). The originally employed nucleotide was dUTP, although other dNTPs have also been used successfully. The label can be a fluorescent molecule or a hapten, giving a direct or an indirect labelling method of choice, respectively. In the labelling reaction, TdT adds the labelled nucleotide, in a template-independent fashion, at the 3′-OH ends of the fragmented genomic DNA. An important application of the TUNEL assay is in conjunction with determination of the proliferative index of the cells, assessed by IHC (see the next section for the method) of the Ki-67 or PCNA factors. The ratio of the two indexes (proliferation vs. apoptosis) provides an estimate of the growth of normal or abnormal cells. Given that archival tissue is very common in pathology laboratories, evaluation of this ratio by in situ techniques is very useful in biopsies obtained from various pathologic conditions, such as malignant tissues.
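As a minimal numerical illustration of this ratio (entirely hypothetical counts; the function name is ours), the following Python fragment computes the apoptotic and proliferative indices from cell counts and reports their ratio.

# Illustrative sketch, not from the chapter: apoptotic and proliferative
# indices from hypothetical cell counts, and the growth ratio of the text.

def index_percent(positive_cells, total_cells):
    """Percentage of positive cells among all counted cells."""
    return 100.0 * positive_cells / total_cells

total = 1000              # cells counted in the section
tunel_positive = 35       # TUNEL-positive (apoptotic) cells
ki67_positive = 180       # Ki-67-positive (proliferating) cells

apoptotic_index = index_percent(tunel_positive, total)
proliferation_index = index_percent(ki67_positive, total)

print(f"Apoptotic index: {apoptotic_index:.1f}%")
print(f"Proliferation index: {proliferation_index:.1f}%")
print(f"Proliferation/apoptosis ratio: {proliferation_index / apoptotic_index:.1f}")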
Fig. 73.2 In situ detection of apoptotic cell death by TUNEL assay. (a) Principles of the technique. The method relies on the labelling of free (3′-OH) ends of the fragmented chromosomal DNA. (b) Representative result showing (arrows) in situ apoptotic cells in a colorectal carcinoma
73.2.2.3 In Situ Detection of Protein Expression
Immunohistochemistry

IHC is one of the most widely used techniques, since it allows the detection of proteins on paraffin-embedded sections, mainly from surgically removed specimens. The technique is based on the localization of antigens in tissue sections using labelled antibodies through antigen–antibody interactions (Fig. 73.3). Detection is visualized either with enzyme-conjugated antibodies (mainly horseradish peroxidase or alkaline phosphatase) or with radioisotope-conjugated antibodies followed by autoradiography.
More comprehensively, paraffin sections mounted on poly-L-lysine-coated slides are dewaxed on a pre-heated plate and then rehydrated through sequential passage in solutions of decreasing ethanol concentration.
Fig. 73.3 Immunohistochemical analysis. Principles of the technique: (a) The direct method of immunohistochemical staining uses one labelled antibody, which binds directly to the target antigen being detected. (b) The indirect method uses one antibody against the antigen being probed for and a second, labelled antibody against the first. Representative immunohistochemical results: (c) Detection of E2F-1 expression with indirect streptavidin–biotin staining in a colorectal carcinoma. (d) Immunofluorescent analysis of claspin expression; note the predominant nuclear localization of claspin (green staining), which coincides with DAPI staining (blue) of the cell nuclei, in the U2OS human osteosarcoma cell line
Hereupon, sections are incubated with 3% hydrogen peroxide to quench endogenous peroxidase activity. The antigen retrieval procedure that follows is usually conducted by heating, either in a steamer or in a microwave oven. The sections are then incubated with the corresponding primary antibody, either at 4°C overnight or at room temperature for the designated amount of time. In classical protocols, the following steps include sequential incubation with the secondary antibody at an appropriate dilution and, finally, with a streptavidin–peroxidase complex. Currently though, these two steps have been replaced by a simplified procedure, involving incubation with secondary antibodies "loaded" with amino-acid polymer "arms" that are linked to a streptavidin–enzyme conjugate. 3,3′-Diaminobenzidine tetrahydrochloride (DAB) or alkaline phosphatase can be used for colour development, with haematoxylin as counterstain [8]. IHC is of primary importance for neoplasia diagnosis and staging, and sometimes also contributes to prognosis and treatment of the patient. IHC, for example, is used as a routine examination for detection of steroid hormone receptors and, along with FISH analysis, for Her2/neu amplification in breast carcinomas [14, 18]. The importance of the method in guiding the clinician's decisions regarding a patient's treatment reflects the need for standardization and quality control using internal (intralaboratory) and external (interlaboratory) controls [19]. Factors such as fixation time, antigen retrieval techniques and staining methods may
differ among laboratories, leading to interlaboratory variability [7].

Immunofluorescence (IF)

Immunofluorescence is a technique similar to IHC that uses a fluorescent dye-conjugated antibody for visualization, instead of the enzyme- or radioisotope-linked antibodies commonly used in IHC (Fig. 73.3). Immunofluorescence is characterized by greater sensitivity than IHC and allows the detection of smaller amounts of protein. It also better highlights the subcellular localization of the examined protein (Fig. 73.3d). Immunofluorescence is currently the gold-standard technique for the detection of antinuclear antibodies (ANA) and Ro/SS-A autoantibodies and of the deposition of immunoglobulins and complement in autoimmune disorders.
73.2.2.4 Tissue Microarrays (TMAs)

TMA analysis is an in situ technique that allows concomitant analysis of multiple separate tissues that have been embedded in an array format in a single paraffin block (Fig. 73.4). Specifically, cores from up to 1,000 different tissues can be obtained from individual classical paraffin-embedded tissue blocks and subsequently re-embedded in a larger paraffin block. Cores are obtained by "drilling" with a hollow needle to remove tissue cylinders as small as 0.6 mm in diameter from regions of interest in paraffin-embedded tissues, such as clinical biopsies or tumour samples. These tissue cores are then inserted into a recipient paraffin block in a precisely spaced array pattern. Subsequently, sections are cut with a microtome and laid on microscopy slides for further histological analysis. Sections can be analyzed by any of the previously described in situ techniques, but with the advantage of cost and time efficiency, as only one slide is required to examine a molecule in multiple tissues.

Fig. 73.4 Principles of TMA construction. Cores (0.6–2.0 mm in diameter) are obtained from FFPE tumour tissue donor blocks and embedded in an array format into a new recipient block. Sections (5 μm) from the recipient block are cut and laid onto a microscopy glass slide to be processed by immunohistochemistry

73.2.3 Techniques for Molecular Analysis in Solution

73.2.3.1 Introduction

These techniques comprise a group of methods that analyze a gene or its expression products after they have been extracted intact from the surrounding tissue or cell context. They therefore require an initial homogenization in hypotonic buffers to disrupt the internal "architecture" of the tissue or cells. Subsequently, the molecule(s) of interest are specifically isolated by various methods, or combinations of methods, such as organic solvent elution, chromatography or centrifugation. In this way, an enriched and homogeneous pool of the desired type of molecule(s) is obtained. Finally, the concentration and integrity of the obtained pool are determined by the appropriate methodology [UV spectrophotometry or fluorometry for nucleic acid concentration, and specific chemistry (e.g. the Bradford reaction) coupled to spectrophotometry in the visible range for protein determination]. Questions that cannot be answered by the in situ techniques, but can be addressed only with the following techniques, are:
1. Presence of qualitative aberrations (e.g. point mutations, aberrant splicing, epigenetic modifications, etc.)
2. Presence of quantitative aberrations (e.g. absolute or relative: fold of amplification, mRNA or protein expression levels)
Fig. 73.5 Schematic representation of Southern or Northern blot analysis. Fragmented DNA (by restriction enzyme digestion) or denatured RNA is size-separated by electrophoresis (1) and transferred by capillary blotting (2) onto a solid support (nitrocellulose) (3). Following hybridization with a gene-specific probe (4), results are usually visualized by autoradiography (5)
73.2.3.2 Southern Blot

Brief Description

Southern blot is a molecular technique used to detect the presence of a given DNA sequence in a DNA sample. It was invented by Edwin Southern, a British biologist at the University of Edinburgh, in 1975, and the method was later named the Southern blot in his honour [17]. In principle, the DNA sample is digested with a restriction enzyme and the resulting fragments are separated by electrophoresis on an agarose gel. Subsequently, the DNA is denatured and transferred to a nitrocellulose membrane, where it is incubated with a labelled DNA or RNA probe complementary to the specific sequence under investigation. Finally, detection is performed according to the type of label on the probe. Southern blot is a very specific and sensitive technique, capable of detecting even a single DNA sequence in any given DNA sample.

Experimental Procedure of Southern Blot

The basic steps in performing this method are as follows (Fig. 73.5) [15]:

1. DNA extraction
The DNA sample is extracted from biological material. Usually, fresh or frozen tissue or cells are employed, but archival material that has been fixed in certain solvents can also be used.

2. Digestion of the DNA sample with a restriction enzyme
The DNA sample is digested with a restriction endonuclease at the appropriate temperature and in the
corresponding enzyme buffer solution for several hours. The restriction enzyme recognizes a specific nucleotide sequence and cuts the strand there, producing smaller fragments of DNA.

3. Electrophoresis of the digested DNA sample
The digested DNA is electrophoresed in an agarose gel, which separates the DNA fragments by size; the smaller fragments migrate faster than the larger ones.

4. Denaturation of the digested DNA and transfer (blotting) onto a nitrocellulose membrane
The DNA fragments are transferred to a chemically inert solid support, such as a nitrocellulose membrane, a process also called blotting. A major problem is that double-stranded DNA cannot be retained by nitrocellulose. Therefore, the agarose gel is first placed in an alkaline solution (usually containing sodium hydroxide) to denature the fragmented DNA. Secondly, a sheet of nitrocellulose paper is placed on top of the agarose gel. Great care is needed to make the nitrocellulose sheet adhere firmly to the agarose gel; this is achieved by exerting pressure on the gel with a stack of paper towels placed on top of the sheet and the gel. Firm contact and the proper transfer buffer ensure that the capillary transfer of denatured DNA from the gel to the nitrocellulose sheet is successful. Alternatively, electroblotting can be used, in which transfer of the DNA fragments onto the nitrocellulose is driven by an applied voltage difference. The time of transfer depends on the size of the DNA fragments, the percentage of agarose in the gel and the size of the gel; larger fragments of DNA are transferred more slowly than smaller ones. An overnight transfer is often sufficient.

5. Probe preparation
Probes can be obtained from several sources and can be DNA or RNA. Currently, polymerase chain reaction (PCR) products are the most common probes. Alternative sources are DNA inserts cloned into various vectors, such as plasmids, or chemically synthesized oligonucleotides. Probes can be labelled by various techniques, such as nick translation or random hexanucleotide labelling assays, or can be obtained directly labelled during their synthesis, in the case of PCR products, oligonucleotides or RNA probes prepared by in vitro transcription.
6. Hybridization of the radioactive probe to the target DNA sequence
The nitrocellulose sheet is placed in a sealed polyethene bag containing the appropriate probe diluted in hybridization solution. Incubation time ranges from 1 to 48 h. After incubation, the sheet is washed several times in solutions of decreasing salt concentration. This procedure increases the hybridization stringency in order to remove unbound probe. To ensure that the probe binds specifically to the sequence of interest, non-specific DNA (such as that from salmon testes) and/or a detergent (such as sodium dodecyl sulfate, SDS) is used to reduce non-specific binding.

7. Signal detection
For radioactively labelled probes, the nitrocellulose membrane is dried and placed in a film-developing cassette, with an X-ray film placed over the nitrocellulose sheet. The exposure time depends on the amount of the DNA sequence and the activity of the radioactive probe (it may take several days). After exposure, the film is developed and fixed; the DNA of interest appears as a dark band on the film. Alternatively, a fluorescently labelled probe can be detected on a specialized scanner. Southern blot can detect as little as 0.1 pg of the target DNA.
Applications of Southern Blot

Southern blot has several applications in clinical medicine and biology. It has been used in the diagnosis of particular inherited diseases through the detection of the culprit mutations. If a mutation exists at a restriction site (a nucleotide sequence recognized by a specific restriction enzyme), it is detected through the production of a restriction fragment length polymorphism (RFLP, a pattern of DNA fragments arising after treatment with a specific restriction enzyme) specific for that mutation. Another application of Southern blot is prenatal screening for certain genetic defects. Southern blot is also applied in genetic fingerprinting, which is used in forensic medicine and in evolutionary biology; genetic fingerprinting exploits the fact that each human has a unique pattern of highly variable repeating sequences. We should keep in mind that Southern blot is a time-consuming experiment, requiring
about 7–10 days. If radioactive material is used, specific safety regulations must be applied.
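The RFLP principle described above lends itself to a small simulation. The Python sketch below is purely illustrative: the sequences are invented, and EcoRI (recognition site GAATTC, cutting after the G) serves only as a familiar example enzyme. It shows how a point mutation that destroys a restriction site changes the band pattern from two fragments to one.

# Illustrative RFLP sketch: a point mutation destroying a restriction
# site changes the fragment sizes seen on a Southern blot. Sequences
# are hypothetical; EcoRI (GAATTC) is used as the example enzyme.

def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of site."""
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)   # EcoRI cuts between G and AATTC
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

wild_type = "ATCG" * 10 + "GAATTC" + "TTGA" * 15   # allele carrying the site
mutant    = "ATCG" * 10 + "GAGTTC" + "TTGA" * 15   # A>G change destroys it

print("wild-type fragments:", digest(wild_type))   # two bands
print("mutant fragments:   ", digest(mutant))      # a single band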
73.2.3.3 Northern Blot

Brief Description

Northern blot is used to study RNA expression. It was named the Northern blot because of its procedural similarity to the Southern blot, the key difference being that the Southern blot examines DNA whereas the Northern blot examines RNA. The technique was introduced in 1977 by James Alwine, David Kemp and George Stark at Stanford University [2].
Experimental Procedure of Northern Blot

The basic steps in performing this method are as follows (Fig. 73.5) [15]:

1. RNA extraction
The first step in Northern blotting is to obtain high-quality RNA from the biological material under investigation. RNA isolation, handling and storage are crucial to obtaining intact RNA.
2. RNA gel electrophoresis
RNA is electrophoresed in a denaturing agarose gel, in which the transcripts are separated by size. RNA tends to fold upon itself and form stable secondary structures, which do not migrate by size in a non-denaturing gel. Formaldehyde is commonly used as the denaturing agent.

3. Transfer (blotting) to a nylon membrane and immobilization
Once the RNA has been separated in the denaturing gel, it is transferred to a nylon membrane and then immobilized. RNA is negatively charged, so the membrane should be positively charged. Various membranes, carrying different chemical residues and composed of alternative materials, are available. Optimal transfer is mediated by passive alkaline elution. Pressure should be applied evenly to the membrane so that proper capillary migration of the RNA is achieved. For RNA immobilization, two procedures are usually followed. In the first, the membrane is exposed to ultraviolet (UV) radiation at short wavelength; exposure to UV results in the formation of covalent bonds between the nitrogenous bases of the RNA and the amine groups of the membrane surface. In the second, the membrane is baked at 80°C. Baking the membrane evaporates the water in which the RNA is solubilized, which strengthens the interaction between the hydrophobic nucleotide bases of the RNA and the hydrophobic aromatic groups of the membrane. Immobilization is required to retain the RNA on the membrane.

4. Blocking and hybridization
After immobilization of the RNA on the membrane, a necessary step before hybridization is blocking (prehybridization), which reduces non-specific signal. A radioactive or non-isotopic probe complementary to the target RNA is used in the hybridization process; the probe can be RNA or cDNA (complementary DNA). Unhybridized or non-specifically hybridized probe is removed by several washes in solutions of decreasing salt concentration, in a similar way to Southern blot analysis. These steps minimize background signal.

5. Detection
The nature of the probe determines the method of detection, and the procedure(s) are similar to those followed in the Southern blot technique.
Applications of Northern Blot

The introduction of RT-PCR into clinical medicine and biology has reduced the use of the Northern blot. However, Northern blot remains the gold-standard method for the study of gene expression in molecular biology research. Moreover, Northern blot is used to determine the size of mRNA transcripts, to study the half-life of RNA and to investigate RNA degradation. Northern blot is slightly less sensitive than RT-PCR. It is also a time-consuming technique, and great care is needed when using radioactive probes.
73.2.3.4 Western Blot

Brief Description

Western blot is the molecular technique used to detect the expression of a specific protein in a given
sample. This method gives information about the size of the target protein and its level of expression in the studied biological material. Western blot was developed in the laboratory of George Stark at Stanford University; it was named the Western blot by Neal Burnette, by analogy with the Southern blot [3].

Experimental Procedure of Western Blot

The basic steps in performing this method are as follows (Fig. 73.6) [15]:

1. Protein extraction
The first step of Western blot analysis is the isolation of proteins from the biological sample (usually fresh or frozen tissue or cells). Proper handling and careful preservation are crucial to keeping the proteins intact. The ordinary protocol for protein extraction isolates proteins from all compartments of the cell. For research purposes, investigators may need to examine the nuclear or cytoplasmic presence of a protein; this is achieved by a technique called cellular fractionation, whereby cytoplasmic and nuclear extracts are obtained separately.
2. SDS-PAGE electrophoresis
Proteins are separated in a polyacrylamide gel (PAGE) containing SDS. SDS is a strong anionic detergent that, together with a reducing agent that breaks the disulfide bonds (–S–S– to –SH and –SH), keeps the proteins in a denatured state and therefore capable of moving through the pores of the polyacrylamide gel. Moreover, the sampled proteins become coated with SDS, which is negatively charged; the higher the molecular weight of the protein, the larger the amount of SDS that wraps it. Altogether, the use of SDS makes it feasible to separate the proteins by their molecular weight, rather than by their intrinsic electric charge or a combination of the two. The concentration of acrylamide determines the resolution: the higher the concentration, the better the resolution. The electrophoresis described above is one-dimensional (1D) electrophoresis. For optimal resolution, proteins can be separated by two-dimensional (2D) electrophoresis: in the first dimension, the proteins are separated by their own charge (isoelectric point: the pH at which they possess a neutral net charge), and in the second dimension, by their molecular weight.
Fig. 73.6 Western blot analysis. (a) Schematic representation of the method. Electrophoretic separation of proteins on an SDS polyacrylamide gel is followed by blotting of the proteins onto a solid support to produce an exact replica of the size-discriminated (kDa) proteins. The antigen of interest is detected with a specific antibody, employing either a direct or an indirect method. (b) Representative result of a Western blot analysis showing cdc6 protein overexpression in various human tumour cell lines
3. Transfer
After electrophoresis, proteins are transferred to a membrane, usually nitrocellulose or PVDF. The membrane is settled carefully on top of the gel, and a stack of papers is placed on top of that. Pressure is exerted evenly to maximize the contact between gel and membrane. The whole stack is placed in a proper buffer solution (transfer buffer), and transfer (blotting) is commonly performed under a stable electric field. The proteins, owing to the negative charge of the bound SDS, move towards the positive electrode and bind to the attached membrane. Membranes thus retain the same pattern of protein separation achieved in the gel during electrophoresis. The efficiency of protein transfer may be checked by staining the membrane with Coomassie Blue or Ponceau S dyes.
4. Blocking
Blocking is of great importance to prevent and minimize non-specific signal. The membrane has a great capacity to absorb proteins. Before the specific antibody against the target protein is applied, non-specific binding sites are blocked by incubating the membrane with a protein solution, usually bovine serum albumin (BSA), gelatin or non-fat dry milk.

5. Incubation with primary and secondary antibodies (two-step probing)
The membrane is incubated with the primary antibody (primary probe) recognizing the target protein. The time and the temperature of incubation are inversely related (the longer the time, the lower the temperature). Usually, overnight incubation at 4°C leads to specific binding and a low background signal; however, the conditions for each antibody–target protein pair are unique. After incubation with the first antibody, the membrane is washed several times to remove its unbound fraction. Then the second antibody (secondary probe), specific for the primary, is applied. The secondary antibody is usually conjugated with biotin or a reporter enzyme (alkaline phosphatase or horseradish peroxidase). The membrane is usually incubated with the secondary antibody for 30 min at 37°C; excessive incubation increases the non-specific signal. Another washing step follows to remove unbound antibody. The process described above refers to two-step probing, which is flexible and sensitive. There is, however, a one-step alternative, in which the membrane is incubated with a single antibody that is both specific for the target protein and conjugated with the detectable molecule. One-step probing has the advantage of being time-sparing.

6. Detection
The choice of detection method depends on the nature of the conjugate on the probe. The most popular method is chemiluminescence: the membrane is incubated with a colourless substrate, which is converted to a light (photon)-emitting product by the reporter enzyme attached to the secondary antibody. The signal is detected by autoradiography, and the target protein appears as a dark band. Alternatively, the colourless substrate is converted to an insoluble coloured product that is deposited on, and stains, the membrane at the position corresponding to the detected antigen.
Applications of Western Blot

Western blot has several applications in clinical medicine and in experimental biology. It is used in the diagnosis of HIV infection, Lyme disease and spongiform encephalopathy. In medical and biological research, Western blot is used to determine the molecular weight of a new protein. It is also applied to detect and quantify known proteins, for example in cancer biology.
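Because the technique reports protein size, the molecular weight of an unknown band is commonly estimated from a standard curve of the marker proteins, exploiting the roughly linear relation between log10(molecular weight) and relative migration (Rf) in SDS-PAGE. The Python sketch below is a minimal illustration; all marker values and distances are hypothetical.

# Illustrative sketch: estimating a band's molecular weight from a
# log-linear standard curve of marker proteins. Values are hypothetical.
import math

markers = [(0.15, 120.0), (0.35, 70.0), (0.55, 40.0), (0.75, 22.0)]  # (Rf, kDa)

xs = [rf for rf, _ in markers]
ys = [math.log10(kda) for _, kda in markers]
n = len(markers)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
# ordinary least-squares fit of log10(MW) = a * Rf + b
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

unknown_rf = 0.45   # relative migration of the band of interest
print(f"Estimated size: {10 ** (a * unknown_rf + b):.0f} kDa")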
73.2.3.5 Polymerase Chain Reaction Analysis

Brief Description

PCR was invented in 1983 by Kary Mullis. It is a technique that can produce multiple copies of (amplify) a specific short DNA segment from extremely low quantities of starting genetic material. In principle, it relies on the semi-conservative nature of DNA replication, but is performed in a test tube. It is a fast, sensitive and specific method that can produce one billion copies starting from just one copy of the DNA segment of interest, even in the presence of a high excess of non-related genetic material [15].
Experimental Procedure of PCR

The technique relies on numerous repetitions of a three-step reaction known as a cycle (Fig. 73.7).
Fig. 73.7 Polymerase chain reaction (PCR). (a) Schematic representation of the principles of PCR analysis. Multiple repetitions of a reaction called a cycle, consisting of (1) denaturation of the DNA strands, (2) annealing of short oligonucleotides (primers) adjacent to the region of interest, and (3) elongation of the initiated strands by Taq polymerase, result in an exponential accumulation (i.e. amplification) of the desired fragment of DNA from very small quantities of initial genetic substrate; after 30 cycles the final product is already in excess (≥1 × 10^9 copies). In turn, this can be manipulated with many other molecular techniques (e.g. sequencing). (b) Representative result from a PCR analysis, after electrophoresis in an agarose gel, showing amplification of a fragment (219 bp) from the HPV43 genome present in a laryngeal carcinoma (Lane 1: 50 bp ladder; Lane 2: positive control for the presence of HPV43; Lane 3: laryngeal carcinoma positive for HPV43; Lane 4: laryngeal carcinoma negative for HPV43; Lane 5: negative control free of HPV infection)
In the first step of each cycle, the DNA sample is heat-denatured at high temperature (usually 93–96°C). Subsequently, the reaction is adjusted to a lower temperature in order to allow two short, single-stranded oligonucleotides (usually in the range of 18–30 nucleotides) to anneal at the corresponding sense and
antisense sequences flanking the desired segment for amplification. Each oligonucleotide, also known as a primer, anneals (hybridizes) to one of the complementary strands, thus initiating a primer extension reaction. Finally, the reaction is adjusted to the temperature required by a thermostable DNA polymerase to elongate the newly initiated DNA strands. The free nucleotides necessary to elongate (polymerize) the newly initiated strands, the polymerase co-factor(s) (usually divalent Mg2+ cations) and the ionic strength (usually monovalent Na+ or NH4+ cations) are supplied in the reaction buffer, which also sets the pH of the reaction. As each cycle is repeated in an automated fashion on a microprocessor-controlled thermal cycler, the product(s) of each cycle become the substrate for the next one. This process allows an exponential accumulation of the desired product until saturation, owing to depletion of reaction components. Therefore, in just 30 cycles, with a starting number of one copy of the desired segment, the final product is already in excess of one billion copies, as previously mentioned (Fig. 73.7a). In order to achieve specificity as well as sensitivity, several rules must be followed in designing the two oligonucleotides (one annealing to the sense strand and the other to the antisense strand):

1. They must not present self-complementarity
2. They must not form dimers
3. They must have a balanced nucleotide composition (40–60% GC content)
4. They must present stable and similar thermodynamic behaviour, in order to achieve a common or very similar annealing temperature
5. They must not contain palindromic sequences
6. They must not present non-specific annealing sites within the same gene or others, as non-specific products may otherwise be created

Design of primers is achieved with the aid of computer-assisted software (e.g. Oligo), in which all these parameters are taken into consideration. In order to select the oligonucleotide sequence of the primers, the nucleotide sequences flanking the region to be amplified must be known in advance. Automation of this technique relies on the use of thermostable DNA polymerases, cloned from specific micro-organisms, which can sustain their function for prolonged periods at high temperatures. Currently, many such enzymes are available for PCR amplification reactions, with Taq polymerase being the most widely used.
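Two of these rules are simple to check computationally. The Python sketch below is our own illustration, not a substitute for dedicated design software such as Oligo: it evaluates GC content against the 40–60% rule and estimates the melting temperature with the Wallace rule of thumb (Tm ≈ 2 × (A+T) + 4 × (G+C), applicable only to short oligonucleotides). The primer sequences are hypothetical.

# Illustrative primer checks: GC content (rule 3) and comparable melting
# temperatures (rule 4), the latter via the Wallace rule of thumb.
# The primer sequences below are hypothetical.

def gc_content(primer):
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)

def wallace_tm(primer):
    return 2 * sum(b in "AT" for b in primer) + 4 * sum(b in "GC" for b in primer)

forward = "AGCTGGTCAAGGCTAGCTAG"
reverse = "TCCGATCCTAGCGGATCTGA"

for name, p in [("forward", forward), ("reverse", reverse)]:
    print(f"{name}: GC = {gc_content(p):.0f}%, Tm ~ {wallace_tm(p)} deg C")
# Primers with GC content outside 40-60%, or with clearly different
# estimated Tm values, would be redesigned.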
The initial substrate for a PCR can be genomic DNA (prokaryotic or eukaryotic), plasmids or any kind of genetic vector, RNA (provided that it is first reverse-transcribed into complementary DNA), mitochondrial DNA or viral DNA. Sources from which this material can be obtained include solid tissues (fixed or fresh), cells (fixed or fresh), body fluids containing cells (blood, saliva, etc.) or excretions (urine, faeces) from various organisms (animals, plants or micro-organisms) [12].
Applications of PCR

In its initial design, the PCR represents a qualitative method that can denote the presence or absence of the desired sequence. Owing to its simplicity, automated nature and great plasticity, the method has acquired a wide range of variations that in turn cover many applications. Some of the most prominent are:

1. Amplification of DNA segments to levels suitable for various molecular analyses or manipulations, such as point mutation detection in genes, nucleotide sequence analysis, cloning into various vectors (plasmids, viruses), mutagenesis, etc.
2. Reverse-transcription PCR for analysis of RNA expression. RNA is not a suitable substrate for thermostable polymerases; nevertheless, if a reverse transcription reaction is performed first, creating cDNA, then analysis of RNA expression is feasible.
3. Multiplex PCR, the co-amplification of two or more independent segments of DNA in one (tube) reaction. It is often used in the analysis of polymorphisms (e.g. microsatellites) in order to detect allelic imbalance occurring in the human genome in neoplastic conditions.
4. Cycle-sequencing. This technique combines PCR amplification of a DNA segment with determination of the nucleotide sequence of that segment (see next section).
5. Quantitative (RT-) PCR. There is a vast literature on the methodologies employed to obtain quantitative data through a PCR reaction. The most popular ones involve the co-amplification of a reference gene along with the target one. Currently, the most accurate and widely employed version of quantitative PCR relies on the use of so-called real-time (RT-) PCR, during which the accumulation
of the products is continuously monitored during the course of the reaction, thus leading to accurate estimates of the initial number of copies of the target gene of interest. Quantitative (RT-) PCR is of great importance in determining viral load (HIV, HPV, etc.), gene amplifications (c-erbB2/Her2/neu, etc.) or deletions (p53, pRb, etc.), over- or under-expression of gene transcripts and other quantitative genetic abnormalities [4]. All these variations have proven invaluable tools in dissecting the many genetic aberrations that in turn are related to a wide range of pathological conditions.
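One widely used scheme for such relative quantification is the comparative Ct (2^–ΔΔCt) method, in which the target gene is normalized to a reference (housekeeping) gene in both the test and the control sample. The sketch below is a minimal illustration under the assumption of roughly 100% amplification efficiency; all Ct values are hypothetical.

# Minimal sketch of the comparative Ct (2^-ddCt) method for relative
# quantification. Assumes ~100% amplification efficiency; Ct values
# are hypothetical.

def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Target expression in the test sample relative to the control,
    each normalized to a reference (housekeeping) gene."""
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_test - delta_ctrl)

# Tumour vs. normal tissue, target gene normalized to a housekeeping gene:
print(fold_change(ct_target_test=22.1, ct_ref_test=18.0,
                  ct_target_ctrl=25.3, ct_ref_ctrl=18.2))   # ~8-fold up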
73.2.3.6 Nucleic Acid Sequence Analysis (Sequencing)

Brief Description

Nucleotide sequence analysis of DNA refers to the biochemical determination of the order of the nucleotides adenine, thymine, cytosine and guanine along a DNA segment. Given that DNA sequences carry the genetically heritable information characteristic of each species, deciphering this information is of paramount importance in the study of normal cell function as well as of pathologic states. Although numerous chemistries have been developed for determining DNA sequence, the most popular and widespread method is that developed by Frederick Sanger and co-workers, also known as the chain-termination method (Fig. 73.8) [15].
Experimental Procedure of Sequencing

The key feature of the chain-termination method is the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators. These modified deoxynucleotides lack the 3′-OH group required to establish a phosphodiester bond between two consecutive nucleotides, thus terminating elongation of the DNA strand undergoing synthesis. Specifically, the Sanger method requires:

1. A single-stranded DNA segment, the sequence of which is to be determined
2. A short, single-stranded oligonucleotide, complementary to the DNA strand, to function as a primer in the polymerization reaction
3. All four deoxynucleotide triphosphates (dNTPs) for elongation of the primer extension reaction
4. Dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators
5. A DNA polymerase to elongate the initiated primer extension reaction

The substrate (DNA) is divided into four parallel reactions, in which the primer is annealed to the 3′-end of the DNA segment. All four reactions contain equimolar concentrations of all four deoxynucleotide triphosphates (dATP, dCTP, dGTP, dTTP). Each tube is supplemented with only one of the four dideoxynucleotide triphosphates (ddATP, ddCTP, ddGTP, ddTTP), at a lower concentration to allow sufficient elongation to proceed. Thus, primer extension in each reaction produces nascent strands that differ in size owing to termination of elongation at the corresponding ddNTP. Either the primer or one of the dNTPs is radioactively or fluorescently labelled. After chain termination, the products are denatured and separated by electrophoresis in high-resolution gels, usually urea-denaturing polyacrylamide slab gels that allow separation by size at the level of a one-nucleotide difference. All four reactions are run in four consecutive lanes. With radioactive labelling, the results are visualized by autoradiography, while fluorescent labelling is employed in automated procedures, where visualization is performed by laser excitation and measurement of the emitted signal from the fluorochrome. Fragments are read in order of increasing size, from the shortest (terminated closest to the primer), running at the bottom of the gel, to the longest, extending towards the wells. "Reading" the gel requires concomitant evaluation of the relative mobility of all the nested fragments across all four lanes, based on the one-nucleotide difference in size between consecutive fragments.
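The logic of this reading step can be made concrete with a small simulation (ours, using a hypothetical strand and idealized, complete ladders): each ddNTP reaction produces fragments ending at every position of that base in the strand being synthesized, and ordering all fragments by length recovers the sequence.

# Illustrative simulation of reading a Sanger gel. Each ddNTP reaction
# yields fragment lengths equal to the positions of that base in the
# synthesized strand; sorting all fragments by length recovers the
# sequence. The strand is hypothetical and the ladders idealized.

strand = "ATGCTTCGGCAAGACT"   # strand being synthesized (hypothetical)

# fragment lengths produced in each of the four ddNTP reactions
ladders = {base: [i + 1 for i, b in enumerate(strand) if b == base]
           for base in "ACGT"}

# read the gel from the shortest fragment to the longest, noting the lane
read = "".join(base
               for length in range(1, len(strand) + 1)
               for base, lengths in ladders.items()
               if length in lengths)

assert read == strand
print(read)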
Experimental Procedure of Automated Sequencing

Recent progress in technology has allowed the full automation of this procedure, driven by large-scale international efforts, such as the Human Genome Project, to decode the genetic information of the genomes of various species. Currently, the latest version of the Sanger chain-termination method, developed specifically for high
Fig. 73.8 Determination of nucleic acid sequence. (a) DNA fragments can be labelled using a radioactive or fluorescent tag on the primer, in the new DNA strand with a labelled dNTP, or with a labelled ddNTP. (b) Sequence ladder obtained by radioactive sequencing compared with ddNTP fluorescent labelling. (c) Representative result from an automated sequence analysis of exon 5 of the mouse p53 oncosuppressor gene
throughput and sensitivity in automated procedures, is the dye-terminator method coupled with cycle-sequencing. In principle, the reaction is identical to that previously described, except for the following improvements:

1. The term dye-terminator refers to the use of labelled ddNTPs. Specifically, each ddNTP is labelled with a different fluorochrome. In this way, there is no need to split the reaction into four separate ones, and only one electrophoresis lane is required for separation of the products. Thus, the differentially ddNTP-labelled nested fragments are concomitantly size-separated and discriminated with an optical system that recognizes the terminator nucleotide according to its attached fluorochrome.
2. To increase the sensitivity of the reaction, especially in cases where the amount of the substrate genetic material is low, the primer extension sequencing reaction is continuously repeated in the form of cycles as per the PCR reaction. This combination of the PCR principle of amplification with the sequencing reaction is known as cycle-sequencing and apart from its increased sensitivity, it is also more economical.
Applications of DNA Sequencing

Apart from deciphering the genomes of various organisms, DNA sequencing has proven valuable in determining the organization of genes and in detecting the aberrations responsible for inherited genomic disorders or neoplastic development. These aberrations include nucleotide point mutations, substitutions and deletions, which in turn affect transcription and/or translation of the genetic code in the cell. Another important field in which DNA sequencing has been a vital tool is the determination of genetic polymorphisms (naturally occurring differences present in the genome). Apart from revealing the presence of these polymorphisms, their use in the determination of the genomic instability involved in various pathologic conditions (mainly neoplasms) has also been facilitated by automated sequencing technology [1].

73.3 Whole Genome Analysis Techniques

73.3.1 Introduction

The techniques described up to now rely conceptually on the analysis of one gene, or its expression product, per investigated sample or patient. What about investigating multiple different genes, or their corresponding (m)RNAs, from just one sample or patient? In other words, is it possible to obtain a specific molecular profile (DNA or RNA) of a patient in one round of analysis? With the previously described techniques, multiple rounds of a specific method (e.g. Northern blot or RT-PCR) would be required to obtain, for example, the mRNA expression status of multiple genes from a single patient. Recent advances in technology have produced methods that allow the simultaneous determination of the genomic profile of one patient in a single analysis. According to their principles, these techniques are classified as follows.

73.3.2 DNA-Arrays

73.3.2.1 Brief Description

DNA-arrays (also known as DNA-microarrays or DNA-chips) are collections of DNA fragments, cDNAs or oligonucleotides, which are spotted onto a glass or other solid surface by covalent attachment. Arrays can be classified as macroarrays or microarrays according to the size of the spot. The arrays can be purchased commercially or produced in-house. The affixed DNA is called the probe. Importantly, each probe is homologous to, or specific for, only one gene. The probes are hybridized to labelled DNA or RNA (termed the target) extracted from the test biological material (Fig. 73.9). The most frequently employed labelling consists of fluorochrome dyes, which emit different colours. Detection is based on a laser scanner, which reads the colour of each spot, and sophisticated software, which analyzes the obtained data. DNA-arrays are a powerful technique in molecular biology for the analysis of the expression or the presence of thousands of genes simultaneously.

73.3.2.2 Experimental Procedure of DNA-Arrays

The basic steps in performing this method are as follows (Fig. 73.9):

1. Nucleic acid extraction
Genomic DNA or RNA is isolated from the test and the reference material. In the case of RNA, a reverse transcription reaction is employed in order to transcribe it into cDNA.

2. (c)DNA labelling
The test and the reference (c)DNAs are differentially labelled with fluorescent dyes (emitting at different wavelengths, e.g. green and red fluorescent signals, respectively).
Fig. 73.9 cDNA microarray hybridization. RNA is extracted from a tissue with a pathologic condition and from a normal reference tissue. After reverse transcription, the corresponding cDNAs are differentially fluorescently labelled. Hybridization is performed with oligonucleotides corresponding to specific genes, spotted in an array format. The hybridized arrays are analyzed with the aid of computers and specific bioinformatics software
3. Hybridization to the array
Labelled test and control DNA are simultaneously hybridized to the spotted probes on the array.

4. Detection of hybridized DNAs
DNA gains or losses appear as green or red fluorescent signals, respectively. Aberrations of DNA in the test genomic material are determined through the differences found in the green/red fluorescence ratio between the test DNA and the reference DNA. Epifluorescence microscopy is used to detect the green and red signals, and the quantification of the fluorescent signal is analyzed by specific software.

73.3.2.3 Variations of DNA-Arrays

There are several possible combinations in the design of an array system. They are based on the collection of the DNA probes, the biological material being tested, the chip fabrication, and the detection and analysis system. In terms of the probe used, two variants of DNA-arrays are spotted arrays and oligonucleotide arrays. Spotted arrays consist of oligonucleotides, PCR products or cDNA probes, which are spotted on a solid glass surface; this type of array can be produced in-house. In oligonucleotide arrays, the probes are short sequences, which hybridize with part of the target sequence. Although oligonucleotides are also used in spotted arrays, the term oligonucleotide array refers to a particular method of construction of this type of array. Sequences may be short (25-mer) or long (60-mer); longer sequences are more sequence-specific. Another categorization of DNA-arrays is based on the detection system. The one-colour system utilizes one fluorescent dye. It is used for the estimation of the
968
A. Kotsinas et al.
absolute levels of a gene expression; therefore, the results are compared with a reference probe. The twocolour array system employs cDNA pools prepared from two different samples. The target material from each sample is labelled with different fluorescent dye (i.e. emitting at different wavelength). The two differentially labelled cDNAs are mixed and hybridized to a single array. The intensity of each colour is analyzed and the researcher can obtain data for the relative expression of one or more genes between the two samples.
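To make the quantification step concrete, the following Python sketch shows how per-spot intensities from a two-colour experiment might be converted into log2 test/reference ratios and simple fold-change calls. The gene names, intensity values and the two-fold threshold are illustrative assumptions, not data or software described in this chapter.

```python
# A minimal sketch of two-colour array quantification: per-spot log2 ratios
# of test to reference intensity, with a simple two-fold change call.
# All names and numbers below are illustrative placeholders.
import math

# Hypothetical background-corrected intensities per spot: (test, reference)
spots = {
    "GENE_A": (5200.0, 1300.0),
    "GENE_B": (800.0, 790.0),
    "GENE_C": (150.0, 1200.0),
}

for gene, (test, reference) in spots.items():
    log_ratio = math.log2(test / reference)
    if log_ratio >= 1.0:      # at least two-fold higher in the test sample
        call = "up-regulated"
    elif log_ratio <= -1.0:   # at least two-fold lower in the test sample
        call = "down-regulated"
    else:
        call = "unchanged"
    print(f"{gene}: log2(test/reference) = {log_ratio:+.2f} -> {call}")
```

Real microarray software adds normalization and replicate statistics on top of this basic ratio calculation, but the log2 ratio remains the core quantity reported per spot.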
73.3.2.4 Applications of DNA-Arrays

DNA-array analysis is used for obtaining expression results for multiple genes at once. Many studies have used array technology to profile the expression status of several genes in cancer [22]. This has a great impact on oncology, since it can be used for tumour classification and for evaluating a patient's response to therapy and prognosis. DNA-arrays can also be applied for the detection of a gene in a sample, as well as for the genome-wide identification of single nucleotide polymorphisms (SNPs).
73.3.3 Comparative Genomic Hybridization Analysis

73.3.3.1 Brief Description

Comparative genomic hybridization (CGH) is a molecular technique for the detection of DNA copy number alterations between a given subject DNA and a normal reference DNA (Fig. 73.10). The observed DNA aberrations are classified as gains or losses. The test and control DNAs are differentially labelled with fluorescent dyes and subsequently hybridized simultaneously to normal human metaphase chromosomes. Final detection is performed using epifluorescence microscopy. Fluorescence is the property of a molecule to emit light when it is excited by incident light; the fluorescent dyes used for labelling possess this property.
73.3.3.2 Experimental Procedure of CGH Analysis

Genomic DNA is isolated from the test and the reference material. Usually, the test DNA is tumour genomic DNA, and the reference DNA is normal genomic DNA.
Fig. 73.10 Comparative genomic hybridization. DNA is extracted from a tissue biopsy with a pathologic condition and mixed with DNA from a normal reference tissue. Hybridization is performed in the presence of Cot1 DNA onto normal metaphase chromosomal spreads. Chromosomal gains or losses are detected due to the differential fluorescent labelling of the extracted DNAs
1. DNA labelling by nick translation
The test and the normal DNA are labelled with different fluorescent dyes (e.g. producing green and red fluorescent signals, respectively) by nick translation. Nick translation is a molecular biology technique in which some of the nucleotides in the examined DNA sequence are replaced with labelled analogues. In the standard protocol, tumour DNA is labelled with biotin and normal DNA with digoxigenin.
2. Hybridization to normal human metaphase spreads
Labelled test and control DNA are simultaneously hybridized to normal human metaphase spreads. Unlabelled human Cot-1 DNA is added to both labelled genomic materials in order to suppress the hybridization of repetitive DNA sequences. Cot-1 DNA contains highly repetitive DNA sequences in excess, which hybridize with the complementary strands of the labelled DNAs, thereby selectively blocking their hybridization to the metaphase spreads. This procedure decreases hybridization background.
3. Detection of hybridized DNAs
DNA gains or losses appear as green or red fluorescent signals, respectively. Aberrations of DNA in the test genomic material are determined from the differences in the green/red fluorescence ratio between the test DNA and the reference DNA; the green/red ratio is measured along the longitudinal axis of each chromosome. Epifluorescence microscopy is used to detect the green and red signals, and the fluorescent signal is quantified by specific software.
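As an illustration of how such a ratio profile is turned into gain/loss calls, the short Python sketch below classifies positions along one chromosome using ratio thresholds often quoted for conventional CGH. The profile values and the exact thresholds are assumptions for demonstration only; dedicated CGH software performs more elaborate normalization and statistics.

```python
# A toy sketch of calling gains and losses from a test/reference fluorescence
# ratio profile measured along a chromosome's longitudinal axis.
# Ratios and thresholds are illustrative assumptions.

GAIN_THRESHOLD = 1.25  # ratios above this are scored as DNA gain
LOSS_THRESHOLD = 0.75  # ratios below this are scored as DNA loss

def call_aberration(ratio):
    """Classify a single position of the green/red ratio profile."""
    if ratio > GAIN_THRESHOLD:
        return "gain"
    if ratio < LOSS_THRESHOLD:
        return "loss"
    return "balanced"

# Hypothetical ratios sampled at successive positions along one chromosome
profile = [1.02, 0.98, 1.40, 1.55, 1.01, 0.60, 0.55, 0.97]
for position, ratio in enumerate(profile):
    print(f"position {position}: ratio {ratio:.2f} -> {call_aberration(ratio)}")
```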
73.3.3.3 Applications of CGH Analysis

CGH analysis is a technique for the detection of unbalanced chromosomal aberrations [21]. Changes that do not alter DNA copy number, such as balanced reciprocal inversions or translocations, cannot be detected. CGH provides a resolution of about 5–10 Mb for DNA losses and about 2 Mb for DNA gains. To circumvent these limitations, array-based CGH has been introduced, which detects DNA alterations at a higher resolution than conventional CGH. In the array-CGH version of the technique, the labelled DNAs are hybridized to oligonucleotides, genomic fragments or cDNAs cloned into various vectors, thus increasing the analysis resolution. CGH analysis is used mainly in cancer research. There are databases for CGH analysis in tumours (CGH Database of NCBI). The use of CGH analysis in cancer research has revealed new chromosomal aberrations that could not previously be detected by other methods. Chromosomal regions with gains or losses are candidates for the location of genes involved in carcinogenesis. Array-CGH analysis is also used for the study of epigenetic changes in cancer (methylation and acetylation/deacetylation of histones). CGH analysis is also a valuable tool in clinical medicine. The profile of a patient's DNA alterations can provide evidence regarding prognosis and response to treatment, and certain DNA aberrations are valuable diagnostic clues for particular diseases. Thus, conventional and array-CGH are powerful tools for both research studies and clinical practice [20].
73.4 Cell Cultures and Functional Assays

73.4.1 Introduction
The analysis of tissue specimens with in situ or molecular techniques offers valuable information regarding molecules that may be deregulated during disease progression. It is also indicative of possible correlations among molecules that could act in the same or in alternative pathways. Functional assays are, however, needed to fully explore the specific "gain" or "loss" of function(s) of the deregulated molecules identified by the previously described analyses. The establishment of the first human cell line, in 1951, opened a new era in this direction. The following section describes the establishment of cell lines and the main functional assays that are usually performed.
73.4.2 Isolation of Cells and Establishment of Cell Lines

Cells maintained in culture are derived either from blood or from tissue specimens. Since white cells are the only blood cells with propagation capacity, the isolation procedure includes a two-step density gradient centrifugation of blood using Ficoll [24], allowing the harvesting of peripheral blood mononuclear cells (PBMCs) with great purity. Cells circulating in the bloodstream, like PBMCs, do not require adhesion
to surfaces to survive in culture, therefore forming suspension cultures. In contrast, the isolation of cells from tissue specimens is more demanding. Usually, enzymatic digestion with trypsin or collagenase is performed to facilitate the degradation of the cellular junctions with the extracellular matrix. A major problem in handling tissue specimens as a source of cells is the presence of fibroblasts. These cells are usually abundant in the surrounding stroma, proliferate fast in culture and hinder the growth of other cell types. The use of serum-free media is a common technique implemented to prevent fibroblast growth. All cells derived from explants are referred to as adherent cells, because they need to attach to a surface to grow. Cells isolated from explants from living organisms are considered primary cell cultures. With the exception of cancer cells, primary cells have a limited lifespan and acquire a senescent phenotype due to telomere attrition and cultivation stress. Experimental research has shown that primary human cells, when continuously propagated, enter senescence due to activation of the p53 and pRb pathways [30]. Cells that spontaneously inactivate these pathways may continue to proliferate, but will soon enter a crisis stage and die due to telomere attrition. Only the very few cells that reactivate telomerase can escape crisis and become immortalized. Cancer cells, on the other hand, can proliferate indefinitely and give rise to established cell lines. Primary cells and cell lines are usually maintained in common or specialized media and are incubated at 37°C in a 5% CO2 atmosphere.
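Because primary cells senesce after a finite number of divisions, their proliferative history in culture is commonly tracked as cumulative population doublings. The short Python sketch below illustrates this bookkeeping; the per-passage cell counts are invented for illustration.

```python
# A small sketch of cumulative population doubling (PD) bookkeeping for a
# serially passaged primary culture: PD per passage = log2(harvested/seeded).
# The cell counts are illustrative, not experimental data.
import math

passages = [  # (cells seeded, cells harvested) at each passage
    (1_000_000, 8_000_000),
    (1_000_000, 6_500_000),
    (1_000_000, 2_300_000),  # growth slows as the culture approaches senescence
]

cumulative_pd = 0.0
for number, (seeded, harvested) in enumerate(passages, start=1):
    pd = math.log2(harvested / seeded)
    cumulative_pd += pd
    print(f"passage {number}: {pd:.2f} PD (cumulative {cumulative_pd:.2f})")
```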
73.4.3 Plasmid Transfections and Viral Infections

A common tool for assessing a gene's function is its (over-)expression in a cell line. This is mainly achieved by introducing its coding DNA into eukaryotic cells using either plasmid (transfection) or viral vectors (infection-transduction). This type of manipulation allows for the identification of the subcellular localization of the gene's protein product, its protein interactions, alterations in the molecular pathway(s) in which it participates, and its effect on cellular proliferation or cell death. Gene introduction into cell lines was initially accomplished with the aid of plasmid vectors. Plasmids
are circular, double-stranded, self-replicating DNA molecules. Genetic engineering techniques, employing restriction endonucleases and DNA ligase, allow the "cut and paste" of specific DNA sequences into bacterial plasmids, thus creating vectors that facilitate the insertion of genes into eukaryotic cells [15]. The core elements of a plasmid vector are:
1. An origin of replication, responsible for self-multiplication in bacteria, allowing the production of a high number of copies of the plasmid for further genetic manipulations.
2. A short multicloning site (the polylinker) containing several restriction enzyme sites, into which the insert sequence of choice can be introduced after "cutting" with the appropriate endonuclease(s) and ligation.
3. A promoter sequence, located adjacent to the polylinker, responsible for driving the expression of the cloned gene in the eukaryotic cell environment after transfection.
4. Genes that confer resistance to specific antibiotics once the plasmid has been introduced into prokaryotic or eukaryotic cells, respectively. These genes allow the selection only of those cells, prokaryotic or eukaryotic, that have received the plasmid. The latter is related to the existence of endonucleases in the recipient cells that degrade exogenous DNA; without antibiotic selection, the transfected plasmids can therefore survive and express their insert in eukaryotic cells only transiently (transient transfection). When an antibiotic is added to the medium, however, only the cells that have received the plasmid can resist and survive. Permanent selection of these transfected eukaryotic cells thus creates homogeneous clones that stably express the desired gene (stable transfection).
An alternative approach to studying a gene of interest stems from the need to investigate the regulation of its expression. In such cases, sequences corresponding to putative regulatory elements of the gene, such as promoters and/or enhancers, are cloned into the polylinker region of a plasmid [15]. Such plasmid vectors have their polylinker attached to the coding sequence of a reporter gene, which usually encodes a protein unrelated to those of the recipient cells [15]. The expression of this reporter can easily be discriminated from the host's proteins and monitored quantitatively (e.g. light-emitting proteins derived from firefly species). Given
that the expression of the reporter gene is under the control of the cloned regulatory element, any change in the expression level of the reporter reflects binding interactions between regulatory factors and the investigated element within a specific cellular context. Since only small, discrete DNA molecules can be inserted into plasmid vectors, insertion of a complete gene with its introns and exons is prohibitive. Therefore, only the transcribed region of a gene (its cDNA) is inserted into plasmids, while vectors that can be "loaded" with larger DNA segments, such as cosmids, yeast artificial chromosomes (YACs) and bacterial artificial chromosomes (BACs), are used when whole genomic sequences need to be inserted. Plasmid constructs are propagated in bacteria, mainly E. coli strains, while their uptake by eukaryotic cells is facilitated through various techniques such as electroporation, cationic lipid transfection and calcium phosphate transfection [15]. The plasmid transfection method is usually limited by the small yield of transfected cells. This was improved by the introduction of viral vectors, in which the viral sequences are appropriately modified so as to contain the desired gene as well as the sequences needed for viral genome replication and expression. Commonly used viruses are adeno-, retro- and lentiviruses [26, 27, 31]. The latter, since they enable the integration of their genome into the chromosomes of eukaryotic cells, are widely used not only for transient but also for stable transductions, with greater efficacy [28].
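The "cut and paste" logic described above can be illustrated in a few lines of Python: locating a restriction enzyme's recognition site in a vector sequence and splicing an insert into it. The sequences, the tiny enzyme table and the simplified single-point join are illustrative assumptions; real cloning must also handle sticky ends, orientation and reading frame.

```python
# A simplified sketch of restriction cloning: find a unique recognition site
# in a vector and splice an insert in at the cut position.
# Sequences and cut offsets are illustrative, not a real vector map.

RECOGNITION_SITES = {
    "EcoRI": "GAATTC",   # cuts G^AATTC
    "BamHI": "GGATCC",   # cuts G^GATCC
}

def find_sites(sequence, enzyme):
    """Return 0-based start positions of the enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    positions, start = [], sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def clone_insert(vector, insert, enzyme):
    """Simulate cutting the vector at a unique site and ligating the insert."""
    sites = find_sites(vector, enzyme)
    if len(sites) != 1:
        raise ValueError(f"{enzyme} must cut exactly once; found {len(sites)} sites")
    cut = sites[0] + 1  # cut after the first base of the site (e.g. G^AATTC)
    return vector[:cut] + insert + vector[cut:]

# Hypothetical polylinker carrying single EcoRI and BamHI sites
polylinker = "ACGTGAATTCACGTGGATCCACGT"
recombinant = clone_insert(polylinker, "ATGGCCAAAGGC", "EcoRI")
print(len(polylinker), len(recombinant))
```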
73.4.4 RNA Interference

A simple idea that developed into a powerful tool in genomic research was to introduce into cells RNA sequences complementary to an mRNA, in order to inhibit its translation. Indeed, it was found that small double-stranded RNA sequences, when inserted into cells, are processed by a ribonuclease called Dicer [25] into double-stranded molecules with overhanging 3′ ends, known as short interfering RNAs (siRNAs). siRNAs are further processed by a multiprotein complex called RISC (RNA-induced silencing complex), producing single-stranded RNA oligonucleotides that hybridize with complementary mRNA sequences and target them for degradation, thus inhibiting their translation.
Apart from siRNAs, there are also endogenous untranslated small RNA molecules, called microRNAs (miRNAs), that are likewise processed by Dicer and RISC and that regulate gene expression [29]. The introduction of siRNA oligonucleotides into cells confers transient gene silencing. More recently, special expression vectors containing gene-specific inserts have been developed, allowing permanent suppression of gene expression. The inserts are composed of two complementary oligonucleotide sequences separated by a spacer [23]. The transcript produced folds to form a double-stranded structure with a hairpin at one end, which is processed by Dicer and RISC to create siRNAs, as illustrated in the sketch below.
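The Python sketch below assembles such a sense-spacer-antisense insert. The 19-nt target and the loop sequence are placeholders for illustration, not validated designs.

```python
# A toy sketch of designing a short-hairpin (shRNA) insert: sense sequence,
# spacer (loop) and antisense (reverse complement), so that the transcript
# folds back into a hairpin that Dicer can process. Sequences are placeholders.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def shrna_insert(target, loop="TTCAAGAGA"):
    """Assemble sense-loop-antisense; the transcript forms the hairpin."""
    return target + loop + reverse_complement(target)

target_19mer = "GCTGACCATCGAGAACGTT"  # hypothetical target within an mRNA
print(shrna_insert(target_19mer))
```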
73.4.5 Drug Treatments

Cell cultures constitute an ideal setting for testing chemical compounds, for several reasons. First, already established treatments can be tested to gain insight into the molecular mechanisms underlying their therapeutic actions and side-effects, as well as the mechanisms adopted by cells to develop resistance to the drug. This enables the design of new therapeutic protocols and new drug combinations that maximize therapeutic efficacy while decreasing adverse effects and drug resistance. Furthermore, newly discovered substances can be tested for efficacy, so that the most potent and least noxious among them can be tested further at the clinical level. To support these applications, which are widely employed in cancer research, several assays have been developed to measure drug cytotoxicity. The most common are presented below.
73.4.5.1 MTT Assay

The MTT assay is a colourimetric assay in which yellow MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) is reduced in the mitochondria of living cells to purple formazan. After the application of a specific treatment, MTT is added to the culture medium. The insoluble formazan produced is solubilized with specific reagents (e.g. dimethyl sulfoxide, DMSO), and the absorbance of the coloured solution, measured with a spectrophotometer, is proportional to the percentage of living cells.
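The arithmetic behind the readout is straightforward, as in the Python sketch below; the blank handling and absorbance values are illustrative assumptions, since real plate designs and normalization schemes vary between laboratories.

```python
# A minimal sketch of converting MTT absorbance readings (e.g. at 570 nm)
# into viability relative to untreated controls. All values are illustrative.

def percent_viability(a_treated, a_control, a_blank=0.0):
    """Viability of treated cells relative to untreated controls, in %."""
    return 100.0 * (a_treated - a_blank) / (a_control - a_blank)

blank, control = 0.05, 1.20  # medium-only blank and untreated control wells
for dose_um, a570 in [(1, 1.10), (10, 0.80), (100, 0.30)]:
    print(f"{dose_um:>3} uM: {percent_viability(a570, control, blank):5.1f}% viable")
```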
73.4.5.2 Clonogenic Assay

In this method, the treatment is applied and the cells are allowed to grow. The colonies produced by the surviving cells are fixed, stained and counted, producing a cell survival curve in which the number of colonies is plotted against the drug concentration.
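The underlying survival-fraction arithmetic can be sketched as follows in Python: plating efficiency is estimated from untreated plates and then used to correct the colony counts at each dose. All counts are invented for illustration.

```python
# A small sketch of clonogenic assay arithmetic: plating efficiency (PE) from
# untreated plates, then the surviving fraction (SF) at each concentration.
# The colony counts are illustrative placeholders.

def plating_efficiency(colonies, cells_seeded):
    """Fraction of untreated cells that form colonies."""
    return colonies / cells_seeded

def surviving_fraction(colonies, cells_seeded, pe):
    """Colonies formed after treatment, corrected for plating efficiency."""
    return colonies / (cells_seeded * pe)

pe = plating_efficiency(colonies=180, cells_seeded=200)  # PE = 0.90
for dose_um, colonies in [(1, 150), (10, 90), (100, 20)]:
    sf = surviving_fraction(colonies, cells_seeded=200, pe=pe)
    print(f"{dose_um:>3} uM: SF = {sf:.2f}")
```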
73.4.5.3 Annexin V

Annexins are a group of proteins that bind membrane phospholipids in a calcium-dependent way. In living cells, they are located on the inner surface of the cellular membrane. Early during apoptosis, however, phosphatidylserine is translocated to the outer surface. Therefore, after treatment, cultured cells are incubated with fluorophore-conjugated annexin V, which binds the externalized phosphatidylserine, and are subsequently analyzed by flow cytometry (FACS) to determine the percentage of apoptotic cells.
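The flow-cytometry readout reduces to counting gated events, as in the Python sketch below; the fluorescence intensities and the gating threshold are invented illustrative values, and real analyses typically combine annexin V with a vital dye to distinguish early apoptotic from necrotic cells.

```python
# A toy sketch of the FACS readout: the percentage of events whose annexin V
# fluorescence exceeds a gating threshold. Values are illustrative only.

def percent_positive(intensities, threshold):
    """Percentage of events gated as annexin V-positive (apoptotic)."""
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)

events = [120, 85, 940, 60, 1500, 200, 1100, 95, 70, 880]  # arbitrary units
print(f"{percent_positive(events, threshold=500):.0f}% apoptotic")
```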
73.5 Animal Models

73.5.1 Introduction

The complex interactive processes of living mammals are not reproducible in vitro. Functional assays performed in cell cultures represent a first step in simulating in vivo conditions and are easily adopted in research laboratories. Yet, cell lines are homogeneous cell populations that function ex vivo; they therefore remain a simplified approach at the cellular, but not the organism, level. For this reason, animal models have been adopted.

73.5.2 Brief Description

Initially, animals were selectively bred in order to obtain individuals carrying a specific genetic background, such as a mutation in a gene, suitable for investigating the function of that gene. With the advent of genetic engineering methods, manipulation of any gene within any organism's genome has become feasible. Thus, insertion, deletion or reactivation of a gene from the same species, or introduction of a foreign gene from a different species, can be performed in order to study the effects of this alteration in the context of an organism. Currently, the most popular animal models are transgenic ones. A transgenic animal is one into which foreign DNA has been introduced using recombinant DNA technology, resulting in a deliberate modification of the genome, in contrast to spontaneous mutation. The foreign gene must be transmitted through the germ line, so that every cell of the animal, including its germ cells, contains the same modified genetic material. Deletion or inactivation of a gene results in a knockout animal, while removal of a DNA sequence that blocks gene transcription, or replacement of an endogenous gene with a new gene, results in a knock-in animal. Although this technology has been applied to a wide range of mammals, mice are the predominant species employed.

73.5.3 Methods of Gene Delivery

73.5.3.1 Microinjection
Historically, Brinster et al. [32] first described a method for delivering a genetic construct into the pronucleus of a fertilized ovum (Fig. 73.11). The delivered gene may come either from another individual of the same species carrying a specific alteration or from a different species. Subsequently, the fertilized ovum is transferred to a recipient for pregnancy. The inserted construct may lead to the over- or under-expression of certain genes, or to the expression of genes entirely new to the animal species. As insertion of DNA is a random process, there is a high probability that the construct will not deliver the gene into a site on the host DNA that permits its expression.
73.5.3.2 Embryonic Stem Cell Gene Delivery
This method requires (Fig. 73.11):
1. Harvesting of embryonic stem (ES) cells from the inner mass of a mouse blastocyst
2. Growth of the ES cells in culture
3. Insertion of the genetic construct into the genome of the stem cells while these are grown under cell culture conditions
4. Re-incorporation of these cells into a blastocyst
Fig. 73.11 Delivery of specific genes into animals. Method 1. Embryonic stem (ES) cells are obtained from a blastocyst and grown under culture conditions. The gene of interest is delivered into the ES cells either by plasmid transfection or virus infection. Subsequently, ES cells are reintroduced into a blastocyst, which is implanted in the uterus. Method 2. The desired gene, cloned into an appropriate vector, is introduced into the pronucleus of a fertilized oocyte, which is subsequently implanted in the uterus
The obtained animal is chimeric, i.e. not all of its cells carry the delivered gene. In this case, after several rounds of inbreeding (from 10 to 20), homozygous transgenic animals are obtained in which the transgene is present in every cell [33, 34].
73.5.3.3 Retrovirus-Mediated Gene Transfer

Retroviruses are commonly used vectors for transferring genetic material into cells [35] (Fig. 73.11). Their advantage is their ability to infect host cells, which offers an increased probability of expression of the transferred gene. Offspring derived from this method are also chimeric, and transmission of the transgene is possible only if the retrovirus integrates into some of the germ cells. Homozygous transgenic animals are obtained as described previously.

References
1. Alphey L (1997) DNA sequencing: from experimental methods to bioinformatics. BIOS Scientific, New York
2. Alwine JC, Kemp DJ, Stark GR (1977) Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci USA 74:5350–5354
3. Burnette WN (1981) Western blotting: electrophoretic transfer of proteins from sodium dodecyl sulfate–polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 112:195–203
4. Dennis Lo YM (2006) Clinical applications of PCR (Methods in Molecular Biology). Humana, Totowa
5. Dietel M, Ellis IO, Hoefler H et al (2007) Comparison of automated silver enhanced in situ hybridization (SISH) and fluorescent ISH (FISH) for the validation of HER2 gene status in breast carcinoma according to guidelines of the American Society of Clinical Oncology and the College of American Pathologists. Virchows Arch 451:19–25
6. Gavrieli Y, Sherman Y, Ben-Sasson SA (1992) Identification of programmed cell death in situ via specific labeling of nuclear DNA fragmentation. J Cell Biol 119:493–501
7. Goldstein NS, Ferkowicz M, Odish E et al (2003) Minimum formalin fixation time for consistent estrogen receptor immunohistochemical staining of invasive breast carcinoma. Am J Clin Pathol 120:86–92
8. Gorgoulis VG, Zacharatos P, Mariatos G et al (2002) Transcription factor E2F-1 acts as a growth-promoting factor and is associated with adverse prognosis in non-small cell lung carcinomas. J Pathol 198:142–156
9. Gupta D, Middleton LP, Whitaker MJ et al (2003) Comparison of fluorescent and chromogenic in situ hybridization for detection of HER-2/neu oncogene in breast cancer. Am J Clin Pathol 119:381–387
10. Kreipe HH, von Wasielewski R (2007) Beyond typing and grading: target analysis in individualized therapy as a new challenge for tumour pathology. Recent Results Cancer Res 176:3–6
11. Madrid MA, Lo RW (2004) Chromogenic in situ hybridization (CISH): a novel alternative in screening archival breast cancer tissue samples for HER-2/neu status. Breast Cancer Res 6:R593–R600
12. McPherson MJ, Moller SG (2006) PCR (The Basics). Taylor & Francis, New York
13. Piccart-Gebhart MJ, Procter M, Leyland-Jones B et al, Herceptin Adjuvant (HERA) Trial Study Team (2005) Trastuzumab after adjuvant chemotherapy in HER2-positive breast cancer. N Engl J Med 353:1659–1672
14. Rhodes A, Jasani B, Barnes DM et al (2000) Reliability of immunohistochemical demonstration of estrogen receptors in routine practice: interlaboratory variance in the sensitivity of detection and evaluation of scoring systems. J Clin Pathol 53:125–130
15. Sambrook J, Russell D (2001) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, New York
16. Slamon DJ, Clark GM, Wong SG et al (1987) Human breast cancer: correlation of relapse and survival with amplification of the HER2/neu oncogene. Science 235:177–182
17. Southern EM (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503–517
18. Umemura S, Kurosumi M, Moriya T et al (2006) Immunohistochemical evaluation for hormone receptors in breast cancer: a practically useful evaluation system and handling protocol. Breast Cancer 13:232–235
19. Vincent-Salomon A, MacGrogan G, Couturier J et al (2003) Calibration of immunohistochemistry for assessment of HER2 in breast cancer: results of the French multicentre GEFPICS study. Histopathology 42:337–347
20. Bejjani BA, Shaffer LG (2006) Applications of array-based comparative genomic hybridization to clinical diagnostics. J Mol Diagn 8:528–533
21. Kallioniemi A, Kallioniemi OP, Sudar D et al (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258:818–821
22. Schena M, Shalon D, Davis RW et al (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470
23. Brummelkamp TR, Bernards R, Agami R (2002) A system for stable expression of short interfering RNAs in mammalian cells. Science 296:550–553
24. De Rock E, Taylor N (1977) An easy method of layering blood over Ficoll-Paque gradients. J Immunol Methods 17:373–374
25. Ketting RF, Fischer SE, Bernstein E et al (2001) Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15:2654–2659
26. Kinsella TM, Nolan GP (1996) Episomal vectors rapidly and stably produce high-titer recombinant retrovirus. Hum Gene Ther 7:1405–1413
27. Miller AD (1990) Retrovirus packaging cells. Hum Gene Ther 1:5–14
28. Miller AD (1992) Human gene therapy comes of age. Nature 357:455–460
29. Tang G (2005) siRNA and miRNA: an insight into RISCs. Trends Biochem Sci 30:106–114
30. Vaziri H, Benchimol S (1999) Alternative pathways for the extension of cellular life span: inactivation of p53/pRb and expression of telomerase. Oncogene 18:7676–7680
31. Vigna E, Naldini L (2000) Lentiviral vectors: excellent tools for experimental gene transfer and promising candidates for gene therapy. J Gene Med 2:308–316
32. Brinster RL (1974) The effect of cells transferred into the mouse blastocyst on subsequent development. J Exp Med 140:1049–1056
33. Gordon JW, Ruddle FH (1981) Integration and stable germ line transmission of genes injected into mouse pronuclei. Science 214:1244–1246
34. Gossler A, Doetschman T, Korn R et al (1986) Transgenesis by means of blastocyst-derived embryonic stem cell lines. Proc Natl Acad Sci USA 83:9065–9069
35. Jaenisch R (1976) Germ line integration and Mendelian transmission of the exogenous Moloney leukemia virus. Proc Natl Acad Sci USA 73:1260–1264
74 Molecular Carcinogenesis
Michael Zachariadis, Konstantinos Evangelou, Nikolaos G. Kastrinakis, Panagiota Papanagnou, and Vassilis G. Gorgoulis
Contents

74.1 Introduction ............................................................ 975
74.2 Molecular Basis of Cancer ..................................... 976
74.2.1 Cellular Independence of External Growth Signals: Role of Oncogenes ............ 976
74.2.2 Insensitivity to Growth-Inhibitory Signals: Role of Tumor Suppressor Genes ............ 985
74.2.3 Evasion of the Anti-tumor Barriers (Apoptosis and Senescence) ............ 987
74.2.4 Unlimited Replicative Potential ............ 990
74.2.5 DNA Damage Response, Repair, and Mitotic Surveillance in Cancer ............ 991
74.2.6 Sustained Angiogenesis ............ 993
74.2.7 Invasion and Metastasis ............ 994
74.3 Multistep Carcinogenesis ...................................... 995
74.4 Carcinogens ............................................................ 995
74.4.1 Chemicals ............ 995
74.4.2 Radiation ............ 996
74.4.3 Viral Agents ............ 997
74.4.4 Genetic Predisposition ............ 999
References .................................................................... 1000
Abstract Over the last 20 years, the systematic application of molecular methodologies has revealed a series of cellular events that contribute to cancer development. Although it is clear that each tumor type is unique in terms of pathogenesis and progression, carcinogenesis seems to follow some characteristic fundamental principles, including nonlethal genetic damage, clonal expansion of a single transformed cell, as well as alterations in proto-oncogenes and tumor suppressor genes, all presented in a progressive, multistep fashion.
74.1 Introduction
To date, cancer research at the cellular and molecular level has established the great diversity exhibited between different tumor types in terms of pathogenesis and progression. However, it has become evident that carcinogenesis, although innately complex, seems to follow certain common “rules” or fundamental principles.
V. G. Gorgoulis () Department of Histology & Embryology, Molecular Carcinogenesis Group, Medical School, University of Athens, 75 Mikras Asias Street, Goudi, Athens 11527, Greece e-mail: [email protected]
• The ground upon which carcinogenesis develops is nonlethal genetic damage. Genetic lesions occur under the influence of environmental factors, such as radiation, chemicals, and/or viral infections; they may be either inherited or, more often, random and spontaneous. The accumulation of specific genetic lesions in a cell may eventually lead to the selection of potentially malignant clones bearing a growth advantage over their normal counterparts, through a process that greatly resembles Darwin's theory of evolution.
• Each tumor is the result of the clonal expansion of a single transformed cell.
• The genetic damage that leads to malignant transformation involves four great classes of normal regulatory genes: the growth-promoting proto-oncogenes (known
as oncogenes in their altered form); the growth-inhibitory tumor suppressor genes; the genes regulating programmed cell death (apoptosis); and the genes involved in DNA repair. Alterations in these genes enable potential tumor cells to override barriers of tumorigenesis, such as apoptosis and senescence, when cellular integrity is in danger.
• Solid tumors constitute paradigms of the multistep model of carcinogenesis [1], according to which the transition of potentially neoplastic lesions to full-blown cancer results from the gradual acquisition of aberrations in various classes of genes and gene products (Fig. 74.1). This progression is also evident at the phenotypic level, where characteristics such as excessive growth, invasiveness, and metastasis are acquired in a stepwise fashion, possibly reflecting the underlying accumulation of various genetic lesions. Hematological malignancies deviate from this concept and comprise a different disease category altogether, since they typically evolve from a critical genetic error in a cell of haemopoietic origin.

Fig. 74.1 Schematic representation of multistep carcinogenesis integrated with the hallmarks of cancer. The activation of an oncogene by a DNA damaging agent is one of the initiating events of the whole process. Inactivation of tumor suppressors complements the neoplastic profile. The physiological barriers of apoptosis and senescence inhibit cancer progression. However, due to the selective pressure imposed by the cellular response to DNA damage, some cells acquire properties that allow them to override these barriers. Thus these cells attain immortality but, at the same time, abolish genomic integrity. Accumulation of additional genetic lesions progressively leads to in situ cancer development and, eventually, to invasion and metastasis. The driving force of carcinogenesis, along this process, is provided by genomic instability and epigenetic alterations (see text for details)

74.2 Molecular Basis of Cancer

Hanahan and Weinberg [2] suggest that malignant transformation proceeds through six acquired essential alterations in cell physiology: self-sufficiency in growth signaling, insensitivity to growth-inhibitory signals, evasion of programmed cell death (apoptosis), limitless replicative potential, sustained angiogenesis, and, lastly, tissue invasion and metastasis. These are driven by a seventh factor, genomic instability, resulting from the failure of various genome "caretaker" mechanisms. These traits are analyzed further in the following paragraphs.
74.2.1 Cellular Independence of External Growth Signals: Role of Oncogenes

Normal cell division is generally controlled by the microenvironment surrounding each cell. Extracellular mitogenic signals are transported through transmembrane receptors into the cytoplasm, stimulating a cascade of reactions that ultimately leads to cell division. By contrast, tumor cells are not subject to such control, owing to their ability to produce their own growth signals, an ability acquired through genetic lesions in a class of genes termed proto-oncogenes. These genes encode proteins that regulate cell growth and differentiation, and can be classified according to their function as growth factors (GFs), growth factor receptors (GFRs), signal transducers (STs), nuclear transcription factors (NTFs), and cell cycle regulators (CCRs). Any given cell in quiescence requires an extracellular signal, usually in the form of a GF, in order to enter an active proliferative state. GFs bind to specific GFRs located at the plasma membrane of the target cell. GFRs are transmembrane proteins with an extracellular domain (ECD), which receives signals from the cellular microenvironment, and an internal domain, usually with tyrosine kinase activity, that transmits the received signals toward the nucleus. This conveyance is achieved by a class of proteins known as STs. These proteins constitute a
complicated network inside the cell, the end members of which target another class of proteins, the NTFs. NTFs promote transcription of specific genes which, in turn, orchestrate entry into and progression through the cell cycle. Aberrations at any point in this cascade of events may contribute to cellular transformation. Genetic alterations in proto-oncogenes convert them to oncogenes, which act as dominant alleles, since they induce transformation despite the presence of a normal allele. Their protein products, oncoproteins, serve the same functions as their normal counterparts (e.g., they promote DNA replication, cellular proliferation, angiogenesis, and cell migration). Nevertheless, they are constitutively expressed, being independent of external stimuli, and confer self-sufficiency in growth signaling, thereby increasing the malignant potential of a given cell.
74.2.1.1 Growth Factors

Normally, in order for cells to enter an active proliferative state, there is a need for external stimulation by GFs, usually produced by different cell types (heterotypic signaling). Many cancer cells acquire the ability to overproduce GFs to which they are themselves responsive. When they carry the respective membrane GFRs, a positive feedback signaling loop is created, which confers a growth advantage (autocrine stimulation). In several cases, however, a paracrine pathway is at play. In this latter circuit, GFs produced by transformed cells prompt adjacent stromal cells to divide and, in turn, stromal cells produce GFs that promote the proliferation of cancer cells; an interplay that is vital for tumor growth, expansion, and invasion [3, 4]. In a variety of cancer types (e.g., astrocytoma, osteosarcoma, prostatic cancer, melanoma), overexpression of the SIS proto-oncogene, which encodes the platelet-derived growth factor (PDGF) b-chain (PDGF-B), coincides with high levels of the PDGF receptor (PDGFR), indicating autocrine stimulation [5]. In the case of dermatofibrosarcoma protuberans, PDGF-B overexpression is due to a chromosomal translocation that brings the encoding gene under the control of the widely expressed collagen, type 1, alpha 1 chain (COL1A1) promoter [6]. Fibroblast growth factor (FGF)8 constitutes a powerful autocrine GF in breast and prostate cancer, with its presence in the latter being correlated with poor prognosis [7]. Another member of the FGF family,
FGF-2 (basic FGF [bFGF]), exerts significant autocrine activities in leukemic stem/progenitor cells, and is overexpressed in approximately half of the patients suffering from B-cell chronic lymphocytic leukemia and chronic myeloid leukemia [8]. Hepatocyte growth factor (HGF) and its receptor (MET) are often co-expressed in many human cancers, including invasive breast carcinomas, and their presence has been linked with malignant progression and a poor outcome [9]. Another important autocrine circuit, present in many human carcinomas, is formed between epidermal growth factor (EGF)-like peptides (e.g., transforming growth factor-alpha [TGF-a], amphiregulin) and the ERBB receptor family [10]. EGF-like peptides may bind and activate different members of the ERBB family of receptors. Often, during malignancy, more than one EGF-like GF is expressed, and this network is believed to sustain the autonomous proliferating capacity of cancer cells. Moreover, this class of GFs seems to cooperate with other proto-oncogenes, like RAS and MYC, contributing to transformation [10]. Even though the autocrine stimulation of cancer cells seems important in carcinogenesis, the overexpression of GFs per se is clearly not sufficient to cause malignant transformation. Nevertheless, the ability to proliferate at will provides the ground for the selection of potent clones bearing genetic lesions that may eventually lead to transformation.
74.2.1.2 Growth Factor Receptors

GFRs represent the connective links between cells and their microenvironment, being able to sense signals from the extracellular space and transmit them toward the cytoplasm, triggering distinct reactions depending on the original stimulus. GFRs are transmembrane proteins composed of a ligand-binding ECD, a transmembrane segment and an intracellular domain with tyrosine kinase activity. Binding of the GF ligand to the ECD of the receptor activates the latter through dimerization or heterodimerization. This interaction triggers the activation of the associated tyrosine kinase. Autophosphorylation of multiple tyrosine residues located in the intracellular domain of the receptor follows, creating a docking site for various adaptor molecules, as well as for functional proteins. Hence, multiple parallel intracellular signaling pathways are activated, exerting diverse biological responses [11].
The activation of oncogenic GFRs may occur through a plethora of mechanisms. In several cases, they participate in autocrine circuits, as previously described. Overexpression of GFRs has also been observed, which renders the cells susceptible to paracrine GF stimulation, even in the presence of low GF concentrations [2]. Moreover, overexpression of GFRs is believed to cause crowding at the plasma membrane, leading to spontaneous dimerization and activation of receptors, making the presence of an activating GF unnecessary. Overexpression of c-MET has been observed in many different tumor cells. Characteristically, c-MET is overexpressed in almost all cases of sporadic papillary thyroid carcinoma (PTC) [12]. High levels of MET may be due to gene amplification, stimulation by oncogenes like RAS and RET, or hypoxia-activated transcription [4, 13]. Another abundantly expressed receptor in human carcinomas is the ERBB-1 receptor, also known as the EGF receptor (EGFR). ERBB-1 is overexpressed in up to 80% of lung, colon and breast carcinomas (reviewed in [14, 15]) and up to 100% of head and neck neoplasms. High levels of ERBB-1 are usually the result of gene amplification, although they have been observed in its absence [15, 16]. Nevertheless, elevated expression of EGFR by itself is not enough to induce transformation in vivo; activation of additional oncogenes or inactivation of tumor suppressor genes is required in order for transformation to occur [17]. Another member of the ERBB family of receptors, ERBB-2 (or HER2/Neu), is overexpressed in about 30% of human breast carcinomas, and its presence in node-positive patients is associated with poor prognosis (reviewed in [18, 19]). High levels of ERBB-2 are attributed to gene amplification in most cases of breast carcinoma, while in other types (e.g., pancreatic, colon, lung), gene deregulation is suggested as the underlying mechanism [19]. Furthermore, overexpression of PDGFR has been observed in a small subset of glial tumors [20, 21], as well as in metastatic medulloblastoma [22]. In several hematologic malignancies where PDGFR is implicated, an alternative mode of GFR activation is observed, which involves chromosomal translocations. The end product is usually a chimeric fusion protein, which retains the cytoplasmic tyrosine kinase domain of the receptor, while the extracellular portion consists of the N-terminal domain of some partner protein. These abnormal translocation events result in ligand-independent dimerization and constitutive activation of the chimeric proteins. Such fusion products involving PDGFR have been observed in chronic
myeloproliferative disorders, myelodysplastic/myeloproliferative syndromes, as well as in acute myeloid leukemia (AML) (reviewed in [5]). Chromosomal rearrangements (inversions and translocations) have also been observed in PTC and involve the RET proto-oncogene at a frequency of up to 70%. In the case of PTC, the rearrangements that result in the production of the so-called RET/PTC chimeric proteins have significant pathobiological consequences. In normal epithelial follicular thyroid cells, the RET proto-oncogene is silent, whereas the chromosomal rearrangements observed in PTC bring it under the influence of the promoter of its fusion partner. Thus, the chimeric product is readily expressed, exhibiting ligand-independent dimerization and constitutive tyrosine kinase activity (for reviews, see [23–25]). The RET proto-oncogene is also implicated in multiple endocrine neoplasia (MEN) 2. MEN 2 is an inherited (familial) autosomal dominant cancer syndrome caused by germline point mutations in RET. MEN 2 is classified into three clinical entities: MEN 2A, MEN 2B and familial medullary thyroid carcinoma (FMTC). MEN 2A is the most frequent of these syndromes (over 75%), and its symptomatology includes medullary thyroid carcinoma (MTC), pheochromocytoma in 50% of cases, and parathyroid hyperplasia or adenoma in up to 30% of cases. MEN 2B is the most aggressive variant and is characterized by earlier onset of MTC, associated with pheochromocytoma in nearly 50% of cases and, only rarely, with other clinical manifestations (i.e., mucosal neuroma, ganglioneuromatosis of the intestine, ocular and skeletal abnormalities [Marfanoid habitus]). FMTC is defined by the presence of MTC alone in at least four family members and is considered the least aggressive syndrome type. The point mutations affecting RET fall into two main groups: those affecting the extracellular cysteine-rich domain and those at the cytoplasmic tyrosine kinase domain. The first group of mutations has been detected only in MEN 2A and FMTC. The latter syndrome has also been associated with mutations at the kinase domain, yet these are most closely related to MEN 2B. All of the above-mentioned mutations confer a gain-of-function effect. Alterations at the cysteine-rich domain cause constitutive dimerization of the receptor. Mutations at the kinase domain produce more diverse effects, including aberrant substrate specificity and ATP-binding capacity, as well as constitutive activation of a monomeric receptor form (reviewed in [23–25]). Point mutations in GFRs other than RET, which alter their function, have been observed in many different
cancer types. Alterations in the kinase domain of the MET proto-oncogene have been observed in thyroid carcinoma, ovarian cancer and juvenile hepatocellular carcinoma (HCC). These typically increase the tyrosine kinase activity of the receptor. MET is also mutated in other domains, as observed in hereditary papillary renal carcinomas, gastric and lung cancers, leading to constitutive dimerization and hyperactivation [4, 13]. Mutations (deletions or tandem duplications) of the EGFR proto-oncogene are linked to glioblastoma multiforme (GBM), and are mainly responsible for ligand-independent activity of the receptor [17]. Notably, gene amplification, small in-frame deletions and point mutations occurring within the kinase domain of EGFR have all been observed in a certain subset of non-small cell lung cancer (NSCLC) patients with defined clinical and pathological features (female, non-smoker adenocarcinoma patients). In this selected group of individuals, activation of the EGFR pathway seems to be the dominant event leading to transformation [17]. Similar ERBB-2 gene lesions have been observed in a subgroup of NSCLC patients with analogous clinicopathological features [26, 27]. Interestingly, in these patients, the aforementioned alterations are mutually exclusive. Aberrations in the ERBB-2 gene have also been found in breast carcinomas, where constitutively active dimers of the receptor are formed due to alternative splicing. Moreover, in breast cancer, ERBB-2 seems to undergo post-translational modification, particularly proteolytic cleavage of its ECD. The truncated receptor shows increased kinase activity and enhanced transforming potential (reviewed in [17]). In gastrointestinal stromal tumors (GISTs), gain-of-function mutations of the proto-oncogene KIT seem to drive the malignant transformation of KIT-dependent cell types. These alterations (point mutations, in-frame deletions, internal tandem duplications) result in ligand-independent phosphorylation and constitutive activation of KIT. Approximately half of the KIT-negative GIST cases carry activating mutations in the PDGFRA proto-oncogene. These are also gain-of-function mutations, inducing ligand-independent activation. Both mutated genes, KIT and PDGFRA, activate common signaling pathways, yet they do not coexist in transformed cells [5, 28–30].
74.2.1.3 Signal Transducers

STs constitute a highly heterogeneous group of proteins, which participate in discrete molecular pathways
within the cell, coupling signals perceived at the cell surface by receptors to transcription factors (TFs) that regulate gene expression. These signaling cascades are not independent of each other, but often interconnect, creating complex molecular networks. STs are implicated in diverse cellular functions, including cell survival, cell cycle progression, differentiation, senescence, apoptosis, migration and angiogenesis. The deregulation of these molecular signaling pathways may lead to malignant transformation. The RAS family of small guanine triphosphate (GTP)-binding proteins (G proteins) is the best characterized oncogene family in this group. It includes the upstream activators of several signaling cascades, including the RAF/MEK/ERK, the PI3K/AKT and the RalGEF/RAL ones (reviewed in [31, 32]). It is believed that about 20% of all human tumors contain an activating mutation in one of the RAS genes. The RAS protein superfamily contains more than 150 members, but special attention has been drawn to the three pivotal RAS proteins, H-, N- and K-RAS. These three isoforms show tissue-specific expression, and their oncogenic forms have been preferentially detected in specific cancer types, e.g., K-RAS mutations are implicated in colon, pancreatic and lung neoplasms, H-RAS mutations in bladder carcinomas, and N-RAS mutations in tumors of hemopoietic origin. RAS proteins alternate between an "ON" GTP-bound state and an "OFF" GDP-bound state. The exchange of GDP for GTP is catalyzed by a family of regulatory proteins called guanine nucleotide exchange factors (GEFs), which include son of sevenless (SOS), RAS-guanine nucleotide releasing factors (RAS-GRFs) and RAS-guanine nucleotide releasing proteins (RAS-GRPs). Among these, SOS is associated with GFR-mediated RAS activation. Upon activation of a tyrosine kinase receptor by cytokines, GFs or mitogens, the coupling protein complex SHC/GRB2/SOS is recruited to the plasma membrane. This reaction brings SOS into the vicinity of RAS, thus facilitating the formation of active GTP-RAS. The intrinsic GTPase activity of RAS is too low to bear any physiological significance. Consequently, in order for RAS to return to its "OFF" state, the activity of additional adaptor proteins is required. These proteins, called GTPase activating proteins (GAPs), increase the GTPase activity of RAS 100,000-fold, thereby rapidly restoring the inactive GDP-RAS form and preventing uncontrolled RAS activity. The most frequent point mutations in RAS
observed during tumorigenesis involve codons 12, 13, 59, and 61. Mutated RAS can still bind GAPs, but its GTPase activity fails to be augmented, thus remaining "locked" in an active GTP-bound state. The active RAS form exerts its downstream actions via RAS effector proteins, triggering several distinct molecular pathways. Of particular importance are the RAF/MEK/ERK and the PI3K/AKT signaling cascades (Fig. 74.2).

Fig. 74.2 Diagram depicting the main molecular events involved in the RAS/MEK/ERK and PI3K/AKT signaling pathways, as well as their end effects on cell proliferation and apoptosis (the green lines denote activation; the red lines denote inhibition; dotted lines indicate indirect effect; see text and corresponding literature for details)

The RAF/MEK/ERK pathway. Upon activation of RAS, RAF is recruited to the plasma membrane and is activated. RAF is a serine/threonine (S/T) kinase and its activity is regulated through phosphorylations. RAF may be found in three isoforms, A-RAF, B-RAF and RAF-1, among which B-RAF is commonly mutated in certain types of cancer, mainly in melanoma, PTC, colorectal cancer and ovarian cancer. The main genetic lesion, observed in more than 90% of cases, is a change at codon 600, which replaces the normally encoded valine with glutamic acid (V600E). This mutation hyperactivates B-RAF, enhancing
its downstream signaling action and, thus, inducing proliferation, survival and, potentially, malignant transformation. Active RAF proteins phosphorylate and activate the mitogen-activated protein kinase/ERK kinase (MEK). MEK is a protein kinase of dual specificity, whose primary downstream target is the extracellular signal-regulated kinase (ERK), also known as MAPK. ERK is an S/T protein kinase targeting many TFs, such as avian erythroblastosis virus E26 oncogene homolog (ETS)-1, c-JUN, c-MYC and ribosomal protein S6 kinase, 90 kD, polypeptide 1 (p90RSK1), implicated in cell cycle progression [32, 33]. Furthermore, the RAF/MEK/ERK signaling cascade exhibits an active role in the regulation of apoptosis. Activation of this molecular pathway inhibits apoptosis through a complex mechanism involving activation or suppression of various key molecules. Notably, RAF-1 demonstrates anti-apoptotic activity that is independent of MEK and ERK [32]. Moreover, the RAS/RAF/MAPK pathway may participate in the
establishment of autocrine loops, as discussed previously. Many GF genes contain, within their promoter regions, cis-elements that specifically bind TFs activated by the RAS cascade. Consequently, aberrant activation of RAS or RAF may lead to overexpression of GFs, formation of autocrine loops and constitutive stimulation of cell growth. The PI3K/AKT pathway. Another important effector protein of RAS is phosphatidylinositol-3 kinase (PI3K) (reviewed in [34, 35]). In its active form, the protein is composed of a regulatory subunit, p85, and a catalytic subunit, p110. It is likely that there are two types of PI3Ks: one that can be directly activated by receptor tyrosine kinases and another that is triggered by receptors coupled with G proteins, such as RAS. Activation of PI3K via either mode results in the conversion of phosphatidylinositol 4,5-bisphosphate (PIP2) to phosphatidylinositol 3,4,5-trisphosphate (PIP3). This reaction promotes localization of phosphoinositide-dependent kinase-1 (PDK1) to the plasma membrane. Likewise, interaction of AKT/PKB
with PIP3 facilitates its translocation to the membrane, where it is phosphorylated and activated by PDK1. AKT is then distributed to various subcellular locations, where, through binding to a variety of substrates, it controls pathways involved in cell survival, cell cycle progression and proliferation. For example, AKT phosphorylates and inactivates many pro-apoptotic proteins, including BCL2-associated agonist of cell death (BAD), procaspase-9, prostate apoptosis response protein 4 (Par-4) and forkhead box O3 (FOXO3A). Moreover, it phosphorylates and activates cAMP responsive element binding protein (CREB) and inhibitor of nuclear factor (NF)-κB (IκB) kinase, which are both regulators of the expression of anti-apoptotic genes. Hence, AKT holds a prominent position in survival signaling. Additionally, by controlling the subcellular localization of proteins like cyclin-dependent kinase inhibitor 1A (p21WAF1/CIP1), cyclin-dependent kinase inhibitor 1B (p27KIP1), and mouse double minute 2 (MDM2) (Fig. 74.3), AKT modulates their activity, thereby controlling cell cycle progression.
Fig. 74.3 A simplified view of the p53- and pRb-associated networks of proteins, including cyclins, cyclin-dependent kinases and cyclin-dependent kinase inhibitors, and their influence upon cell cycle progression and apoptosis (the green lines denote activation; the red lines denote inhibition; dotted lines indicate indirect effect; see text for details)
The implication of the PI3K/AKT pathway in human carcinogenesis lies mostly in gene amplification, and less so in mutations of its components. For example, PI3K is amplified in ovarian and cervical carcinomas, while AKT is amplified in ovarian, pancreatic, breast and stomach malignancies. Aberrant expression of AKT is believed to partake in epithelial to mesenchymal transition (EMT), a process associated with tumor invasiveness and metastasis. Moreover, a positive relation has been established between ERBB-2 receptor overexpression and activation of AKT in the development of breast carcinoma (reviewed in [36] and [134]). In AML, the PI3K/AKT pathway is often hyperactive, mostly due to activating mutations in FLT3 and c-KIT receptors, as well as RAS [37]. Another mode of constitutive activation of AKT is through deregulation of a tumor suppressor protein, namely the phosphatase and tensin homolog (PTEN), which inhibits AKT activation. Aberrations in PTEN are often observed in breast and prostate cancer, primary acute leukemia and non-Hodgkin's lymphoma, resulting in AKT upregulation. The JAK/STAT pathway. Another important signaling cascade, coupled to cytokine receptor activation, is the JAK/STAT pathway (reviewed in [38, 39, 137]). Upon activation of a receptor by its cognate cytokine, a Janus kinase (JAK) is activated by tyrosine phosphorylation. In turn, the active JAK phosphorylates various substrates, including tyrosine residues within the cytoplasmic domain of the receptor. This reaction recruits the signal transducers and activators of transcription (STATs) to the receptor site, thus allowing JAK to phosphorylate and activate STATs. The activated STAT dissociates from the receptor and forms dimers, which translocate to the nucleus and regulate gene transcription. JAK can also indirectly activate PI3K and RAS, thereby activating their downstream targets. Negative regulation of the JAK/STAT pathway is mainly accomplished by three classes of proteins: the protein tyrosine phosphatases, including protein tyrosine phosphatase non-receptor type 6 (SHP-1), protein tyrosine phosphatase non-receptor type 11 (SHP-2) and CD45; the protein inhibitors of activated STATs (PIAS); and the suppressors of cytokine signaling (SOCS). Aberrant JAK signaling is primarily observed in hematological malignancies, where downregulation of JAK/STAT inhibitors, activating mutations, amplification or chromosomal translocation at the JAK2 locus are common events (reviewed in [40, 41]).
74.2.1.4 Nuclear Transcription Factors

All signaling pathways eventually lead to the activation of TFs. These proteins contain motifs that allow them to attach to specific DNA sequences, characteristic for each TF family. These DNA sequences lie within the promoter, enhancer or silencer of the target gene, and binding of a TF enhances or represses transcription. Target genes participate in diverse cellular processes, including regulation of entry into and progression of the cell cycle, differentiation, hematopoiesis, apoptosis, metastasis and angiogenesis; hence, their deregulation may be involved at some stage of carcinogenesis. MYC. The c-MYC protooncogene (reviewed in [42–45]) is expressed in proliferating tissues and is tightly controlled by extracellular signaling, i.e., mitogens induce its expression while growth-inhibitory signals suppress the production of the encoded protein. MYC lies at the crossroads between cell proliferation and apoptosis. It exerts its function by forming active heterodimers with MYC associated factor X (MAX), which can bind to specific DNA sequences with a core CA(C/T)GTG motif and induce transcription. MAX may also form heterodimers with MAD/MXI family members. The latter bind the same DNA motifs as the MYC/MAX complexes, yet their role is inhibitory. Thus, regulation of proliferation, apoptosis and differentiation may be achieved through the opposing activities of MYC/MAX and MAX/MAD on common target genes. These include, among others, cyclin D2, cyclin-dependent kinase (CDK)4, cullin (CUL)1, CDC28 protein kinase regulatory subunit 2 (CKS2), E2F2 and E2F3, inhibitor of DNA binding 2 (ID-2), cyclin-dependent kinase inhibitor 2B (p15INK4B) and p21WAF1/CIP1, all of which are key molecules controlling life and death decisions by the cell. Thus, aberrant MYC activity may have a profound impact upon malignant transformation. In fact, deregulation of c-MYC is observed in many cancers, as a result of gene amplification, translocation and/or c-MYC overexpression. In Burkitt's lymphoma (BL), the c-MYC locus is translocated to segments of chromosomes 14, 2 or 22, bringing the gene under the influence of immunoglobulin (Ig) enhancer sequences and resulting in aberrant expression of MYC. These abnormal rearrangements occur early in B-cell ontogeny and bear an etiologic relation to the development of BL. c-MYC amplification has been observed in a number of carcinomas, including primary breast tumors, where it correlates with amplification of ERBB-2. N-MYC, a homolog
of c-MYC, is amplified in up to 30% of patients with neuroblastoma, and its presence is associated with a clinically aggressive variant of the disease [136]. MYB. The MYB family of TFs includes A-MYB, B-MYB and c-MYB. While A-MYB and c-MYB show tissue-specific expression, B-MYB is abundantly expressed in hyperproliferating tissues, representing the family member particularly associated with tumorigenesis [46]. Several transcription co-factors bind to B-MYB, including CREB binding protein (CBP)/p300, cyclin D1, poly (ADP-ribose) polymerase (PARP)1, p107, and cyclin-dependent kinase inhibitor 1C (p57KIP2), which either enhance or suppress its activity. Its DNA binding site contains the consensus sequence C/TAACNG. The attachment of B-MYB to proteins like cyclin D1 and p107 shows that its activity is tightly related to cell cycle control and progression. Moreover, B-MYB is implicated in the regulation of apoptosis by controlling the expression of survival (anti-apoptotic) genes like BCL2 and APOJ/clusterin. Intriguingly, under certain conditions, MYB may even promote cell death. Deregulation of B-MYB has been detected in many cancers. More specifically, gene amplification has been observed in cases of breast, liver and ovarian carcinoma, as well as cutaneous T-cell lymphoma, while overexpression of the protein has been demonstrated in testicular and prostatic malignancies, as well as in advanced stage neuroblastomas. AP-1. Activator protein (AP)-1 is a heterodimeric TF, composed of a FOS family member (c-FOS, FOSB, FRA-1, FRA-2, FOSB2 or delta-FOSB2) (reviewed in [47]) in complex with a JUN family member (c-JUN, JUNB or JUND) (reviewed in [48]). AP-1 proteins bind the promoter or enhancer regions of target genes via TPA (12-O-tetradecanoylphorbol-13-acetate)-responsive elements (TREs: TGAC/GTCA). The various heterodimers formed between FOS and JUN members exhibit activating or repressing activities with different biological effects. This is partly achieved due to the fact that FOS proteins have unique attributes and demonstrate tissue-specific activities. The target genes of AP-1 are implicated in a wide spectrum of cellular processes, including invasion and metastasis, proliferation, differentiation, survival and angiogenesis. Indeed, deregulation of the FOS units of the aforementioned complexes is implicated in the pathogenesis of various tumors, including osteosarcomas and chondroid neoplasias, breast, lung, esophageal, colorectal and thyroid carcinomas. Notably, malignant transformation and progression is accompanied
by a shift in the expression of FOS isoforms in specific cell types (reviewed in [47]). SP1. Specificity protein 1 (SP1) (reviewed in [49]) belongs to the growing family of transcription proteins known as SP/Krüppel-like factors (KLFs) and interacts with the DNA helix via GC/GT boxes. SP/KLF proteins seem to play critical roles in normal development, while their deregulation has been implicated in tumor growth and metastasis. SP1 overexpression in gastric and pancreatic tumors is closely related with high levels of vascular endothelial-derived growth factor (VEGF), while its overexpression in colon tumors is associated with upregulation of Ku70 and Ku80. VEGF represents a direct marker of angiogenesis, while Ku70 and Ku80 are involved in telomere maintenance and prevention of apoptosis. These observations point towards a link between SP1 deregulation and carcinogenesis. ETS. The ETS family of TFs represents one of the largest protein families regulating transcription. Their interaction with DNA is mediated by a winged helix-turn-helix domain, which recognizes the core sequence GGAA/T. The target genes of the ETS family number more than 400 and are involved in diverse cellular processes, including proliferation, differentiation, hematopoiesis, apoptosis, metastasis and angiogenesis (reviewed in [50]). The activity of ETS proteins is modulated via their interaction with co-factors or through post-translational modification. These events may regulate transcriptional activation, as well as target gene and tissue specificity. Aberrations of different ETS family members have been observed in various neoplasias. Amplification and overexpression have been reported in hematologic malignancies (leukemia, lymphoma, AML and acute nonlymphoblastic leukemia), as well as invasive and metastatic solid tumors (breast, lung, colon, pancreatic, thyroid, esophageal and cervical cancer). Point mutations have been identified in AML, which result in the attenuated ability of PU.1 (a member of the ETS family) to control the expression of its target genes. Finally, distinct translocations involving ETS genes have been identified in various hematologic malignancies, Ewing's sarcomas and primitive neuroectodermal tumors.
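The consensus motifs quoted in this section — the MYC/MAX E-box CA(C/T)GTG, the MYB site C/TAACNG, the AP-1 TRE TGAC/GTCA and the ETS core GGAA/T — are compact enough to be written as regular expressions. The short Python sketch below illustrates how such degenerate consensus notation translates into a sequence scan; the example promoter fragment is invented, and real binding-site prediction would use position weight matrices rather than exact patterns.

import re

# Degenerate consensus sites written as regular expressions.
# "CA(C/T)GTG" means C-A-(C or T)-G-T-G; N stands for any base.
MOTIFS = {
    "MYC/MAX E-box": r"CA[CT]GTG",
    "MYB site":      r"[CT]AACNG".replace("N", "[ACGT]"),
    "AP-1 TRE":      r"TGA[CG]TCA",
    "ETS core":      r"GGA[AT]",
}

def scan(sequence: str) -> None:
    """Report every match of each consensus motif in `sequence`."""
    sequence = sequence.upper()
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, sequence):
            print(f"{name:14s} {m.group()} at position {m.start()}")

# Hypothetical promoter fragment, for illustration only.
scan("ttgcCACGTGaaTGACTCAggtTAACCGcaGGAAtc")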
74.2.1.5 Cell Cycle Regulators

The preceding paragraphs suggest that aberrations in proto-oncogenes may eventually lead, among other effects,
to enhanced cell proliferation. Hence, it is valid to assume that key components of the pathways governing cell cycle progression, and particularly those in control of the G1/S, S/G2 and G2/M phase transitions (e.g., cyclins and CDKs), represent common targets for deregulation during carcinogenesis. Indeed, CCRs are deranged in many types of cancerous lesions and, even though these aberrations may sometimes reflect indirect consequences of "hits" in their upstream molecular pathways, it has been well established that CCRs themselves, and mainly cyclins and CDKs, exert oncogenic activities [51]. For example, a mutant CDK4 acts as a dominant oncogene in a hereditary form of melanoma, while a low-molecular-weight isoform of cyclin E is overexpressed specifically in a subset of breast cancers, and its presence is associated with a poor outcome [52, 53]. Furthermore, cyclin A is frequently overexpressed in many types of cancer and this event signifies poor prognosis (reviewed in [54]). Cyclin D1. D-type cyclins are the major effectors of G1 entry and progression, and are among the first genes expressed following GF stimulation. They bind to and activate certain G1 CDKs, namely CDK4 and CDK6. These complexes are further phosphorylated by CDK-activating kinase (CAK) in order to be fully activated. The active complex is then able to phosphorylate members of the retinoblastoma protein family (pRbs), thus alleviating their inhibitory effect upon the E2F TFs, which in turn promote transcription of several genes needed for G1/S transition and cell cycle progression. There are three D-type cyclins, among which cyclin D1 is most often associated with cancer. There is ample evidence that cyclin D1, besides being a potent enhancer of proliferation, exerts additional oncogenic functions related to angiogenesis and resistance to apoptosis [55]. Overexpression of cyclin D1 has been reported in several cancer types, including breast, colon, lung, bladder and liver carcinomas. Nevertheless, in certain experimental models, it seems that overexpression per se is not enough to induce transformation. In these cases, cooperation with other oncogenes, like RAS and c-MYC, promotes malignancy. Some studies suggest that cyclin D1 overexpression enhances the transcription of FGF receptor (FGFR)1, thereby establishing an autocrine growth-stimulating loop [55]. Interestingly, there is accumulating evidence that the oncogenic activity of cyclin D1 is tightly related to its location in the nucleus during G1/S transition. Normally, at this point, cyclin D1
is exported from the nucleus upon phosphorylation by GSK-3β and is proteolytically degraded in the cytoplasm. A mutant cyclin D1, refractory to nuclear export, is able to promote cell transformation independently of other oncogenes. Moreover, the cyclin D1b isoform, a product of alternative splicing, lacks the domain responsible for nuclear export and is constitutively located in the nucleus. This isoform has been detected only in primary esophageal carcinoma tissue but not in normal mucosa, suggesting that cyclin D1b is a cancer-specific marker. In addition, it has been found that in endometrial cancer, cyclin D1 harbors mutations that probably perturb nuclear export [56]. E2F. An important effector of cell cycle regulation is the E2F family of TFs mentioned previously. There are six established members of the family (E2F1 to 6), which form active heterodimeric complexes with members of the DP family of TFs. E2Fs behave either as oncogenes or as tumor suppressors depending on the cellular context (reviewed in [57, 58]). Moreover, both their overexpression and their suppression have been implicated in tumorigenesis and tumor inhibition, depending on the entity under study. In NSCLC, E2F1 is overexpressed in the cancerous tissue, an increase that positively correlates with the tumor growth index. Moreover, while apoptosis does not seem to be influenced, E2F1 expression relates to p53-MDM2 deregulation. An analogous mode of expression has been observed in cases of breast, thyroid and pancreatic tumors. Other members of the E2F family, namely E2F4 and E2F5, are also associated with breast carcinogenesis and a poor prognosis, interestingly, in association with aberrations in other oncogenes, such as c-MYC and c-MOS. E2F3 is also overexpressed or amplified in human tumors, including bladder carcinomas, prostate cancer and NSCLC, and its presence correlates with tumor progression, invasiveness and poor survival rates. In contrast to the above findings, inactivation of E2F1 in animal models accelerates tumor development in squamous epithelial tissues when combined with MYC overexpression. In line with the latter observation, E2F1 overexpression in colon cancer exhibits anti-tumor properties through induction of apoptosis. An analogous action is evident in large B-cell lymphomas. In fact, in rodent models, the absence of E2F1 predisposes to the development of tumors in a tissue-specific manner. The same attribute also applies to E2F2 and E2F3, but not to E2F4, E2F5 and E2F6. It seems that E2F may act at multiple levels in inhibiting transformation. In one particular
model of RAS-driven skin carcinogenesis, overexpression of E2F1 promotes transcription of the tumor suppressor cyclin-dependent kinase inhibitor 2A, alternative reading frame (ARF or p14ARF) and, consequently, enhances p53 activity. Nevertheless, it is becoming more and more evident that E2F1 suppressive activity proceeds through a nonapoptotic mechanism. It is possible that E2F may promote premature senescence, in cooperation with other tumor suppressors, including p14ARF, p53, p21WAF1/CIP1 and cyclin-dependent kinase inhibitor 2A (p16INK4A). It should also be noted that E2F1 seems to participate in the cellular response to DNA damage due to ionizing radiation (IR) and other genotoxic agents, but not ultraviolet (UV) radiation. In this DNA damage response (DDR) pathway, E2F1 is targeted to the promoter of the p73 gene through acetylation. Expression of p73 is subsequently enhanced, contributing to the induction of apoptosis, thus preventing the accumulation of potentially transforming lesions. CDC6/CDT1. Besides cyclins and CDKs, other CCRs are also implicated in malignant transformation, exhibiting oncogenic characteristics. Cell division cycle 6 homologue (CDC6) and chromatin licensing and DNA replication factor 1 (CDT1) constitute a complex that is vital for both DNA replication and orderly progression through the S phase. Together, they represent the loading factors of minichromosome maintenance (MCM) proteins on DNA, in order for replication to begin. They also regulate replication by surveying cell cycle checkpoints, thereby ensuring that the genome is replicated only once per cycle (a minimal sketch of this once-per-cycle logic is given at the end of this section). It is not surprising that these proteins are often deregulated in malignancy, as observed in prostate cancer, cervical carcinoma, brain tumors and NSCLC, since their aberrant function eventually leads to genomic instability. Furthermore, immunoexpression of CDC6/CDT1 in cervical carcinomas may serve as a more useful prognostic marker than proliferating cell nuclear antigen (PCNA) and Ki-67 (reviewed in [59]). Importantly, deregulation of CDC6/CDT1 is a relatively early event in carcinogenesis, which, against a background of p53 aberrations, confers an enhanced transforming potential [60, 61]. The above reference to various oncogenes implicated in tumorigenesis is rather simplistic, and one should bear in mind that oncogenes often exert diverse and opposing activities. Furthermore, additional alterations in different molecular pathways are necessary in order for transformation to occur. Clearly, the distinct networks in control of proliferation, differentiation, senescence and apoptosis
are interconnected, with aberrations in one signaling route possibly bearing additional adverse effects on other pathways or being overcome by compensatory activities of parallel molecular networks.
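As promised above, the once-per-cycle licensing logic enforced by CDC6/CDT1 can be made concrete with a toy state machine. The class below is an invented illustration (real licensing involves many additional factors and CDK-dependent controls): an origin can be licensed only in G1, fires at most once in S phase, and cannot fire again until the next cycle — precisely the property whose loss produces re-replication and genomic instability.

class ReplicationOrigin:
    """Toy model of once-per-cycle origin licensing by CDC6/CDT1."""

    def __init__(self):
        self.licensed = False   # MCM helicase loaded by CDC6/CDT1
        self.fired = False      # origin already used this cycle

    def license(self, phase: str) -> None:
        # CDC6/CDT1 may load MCM onto DNA only in G1.
        if phase == "G1" and not self.fired:
            self.licensed = True

    def fire(self, phase: str) -> bool:
        # An origin fires in S phase, once, and loses its license.
        if phase == "S" and self.licensed and not self.fired:
            self.licensed, self.fired = False, True
            return True
        return False            # re-firing is refused: no re-replication

    def next_cycle(self) -> None:
        # Passage through mitosis resets the origin for the next G1.
        self.fired = False

origin = ReplicationOrigin()
origin.license("G1")
print(origin.fire("S"))   # True  - replicated once
print(origin.fire("S"))   # False - a second firing is blocked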
74.2.2 Insensitivity to Growth-Inhibitory Signals: Role of Tumor Suppressor Genes

In parallel to growth signals, normal cells are also subject to growth-inhibitory stimuli from their surrounding microenvironment, which are likewise perceived through membrane receptors. These signals play a nodal role in the physiological and orderly development of tissues and organs, since they control proliferation and differentiation. Exit from the proliferating state is achieved through two distinct pathways: (i) entry into quiescence (G0) and (ii) initiation of a morphogenetic/differentiation cascade. Cells in a quiescent state can resume the active proliferating state if properly stimulated. On the other hand, cells initiating a differentiation program usually lose their pluripotency and are seldom able to reenter the cell cycle. The growth-inhibiting signals are processed and integrated by a large group of proteins, known as tumor suppressor proteins, which are also activated whenever genomic integrity is compromised. Tumor suppressors prevent cells from acquiring malignant characteristics. Vital tumor suppressors include cell-cycle checkpoint regulators, entrusted with the responsibility of arresting the cell cycle under conditions that threaten genomic integrity. Others are critical components of the DNA repair machinery, apoptotic pathways, cell-to-cell adhesion complexes and immune surveillance networks. Collectively, they safeguard cellular homeostasis and represent common targets of carcinogenic processes. p53. The TP53 tumor suppressor gene, located on chromosome 17p13, encodes a multifunctional 53-kD TF (p53) involved in DDR, cell cycle control at checkpoints, apoptosis and differentiation decisions. Due to its multiple oncosuppressive activities, p53 has often been referred to as the "guardian" of genomic integrity [62]. Deregulation of p53 and its associated network of proteins is a nearly ubiquitous phenomenon in epithelial carcinogenesis, with TP53 mutations representing the most frequent genetic lesions. Sequence analysis has
identified several mutational "hotspots," depending on the tumor type and population under study. Allelic loss represents an alternative mechanism of gene inactivation, while elevated p53 immunoexpression, attributable to the increased half-life of mutant gene products, is a common neoplastic feature as well. Research groups have repeatedly reported p53 aberrations in diverse premalignant lesions, suggesting that they represent early carcinogenic events. It has long been speculated that TP53 mutations may emerge as a direct consequence of genotoxic exposure. Recently, a more convincing explanation was offered. Activation of the cellular DDR – a pathway that engages p53 as one of its essential components – occurs very early in the tumorigenic process as a result of replication stress. In order for malignant transformation to take effect, potentially neoplastic clones need to overcome the "anti-cancer barrier" posed by the DDR (i.e., arrest, senescence and apoptosis). This requirement increases the selective pressure toward inactivating TP53 mutations, justifying the high incidence of these alterations in human cancer [63, 64]. An additional mechanism employed by tumor cells to counteract p53-mediated surveillance is the elimination of CDK inhibitors (CDKIs), such as p14ARF and p21WAF1/CIP1 [65]. The former blocks MDM2, a protein that induces p53 degradation via ubiquitination, while the latter is an effector of p53 growth-suppressive activities. Loss of p14ARF and/or p21WAF1/CIP1 expression is not uncommon among neoplasms carrying wild-type TP53 and has been associated with an adverse prognosis (Fig. 74.3). pRb. The retinoblastoma susceptibility gene (RB1) at 13q14.2 was the first human tumor suppressor identified. It encodes a pivotal CCR which, along with its molecular network, represents a common and early target in many tumor types [66]. When in a hypophosphorylated state, pRb binds to and blocks the E2F TFs, which, as discussed previously, normally promote G1/S progression. Mono- and bi-allelic deletions in RB1 lead to pRb underexpression, while post-translational inactivation of the protein via phosphorylation yields a characteristic aberrant immunoexpression pattern. Either type of pRb inactivation is present in 60–70% of epithelial tumors, while an even higher proportion of neoplasms demonstrate some alteration in at least one component of the so-called "pRb protein network." This includes the CDKIs that are entrusted with restraining pRb phosphorylation and, particularly, p16INK4A. The p16INK4A gene locus maps to 9p21 and encodes a protein that blocks the formation of the pRb-phosphorylating complex between cyclin D1
and CDKs 4 and 6 (Fig. 74.3). In the absence of p16INK4A, pRb is phosphorylated and G1/S checkpoint control is abrogated [65]. Deregulated expression of p16INK4A, as a result of point mutations, homozygous deletions or hypermethylation of CpG islands within the gene's promoter, has been reported in a variety of respiratory tract, digestive tract and reproductive system tumors. INK4/ARF. The INK4/ARF locus on chromosome 9p21 encodes the tumor suppressors p15INK4B, p16INK4A, and p14ARF (reviewed in [67–69]). These represent examples of the so-called "gatekeeper" genes, referring to genes that control cell growth [70]. Generally, these proteins act as inhibitors of proliferation via two distinct pathways that are activated as a response to aberrant growth and replication stress. Primarily, p15INK4B and p16INK4A are potent inhibitors of CDK4 and CDK6, preventing their interaction with their "partner" proteins, i.e., the D-type cyclins. This renders CDK4/6 inactive and unable to phosphorylate pRb, thereby preserving the suppressive action of pRb on E2F. Consequently, E2F remains inactive and the cell cycle is arrested. On the other hand, p14ARF constitutes a key regulator of p53 activity. p14ARF binds to its major target, MDM2, rendering the latter unable to exert its blocking action upon p53 and, as a result, p53 is stabilized. In addition, p14ARF inhibits the activity of ARF-BP1/Mule (p14ARF-binding protein 1/MCL-1 ubiquitin ligase E3), which disrupts p53 via ubiquitination in an MDM2-independent manner. Surprisingly, this protein also ubiquitinates and promotes degradation of the anti-apoptotic protein myeloid cell leukemia sequence (MCL)-1, a member of the BCL-2 family. It is evident that ARF-BP1 exerts two opposing activities, since it both inhibits and induces apoptosis. The intracellular localization of p14ARF seems to resolve this apparent discrepancy. p14ARF is a predominantly nuclear protein, and is thus able to inhibit ARF-BP1 in that cellular compartment, rescuing p53 and promoting apoptosis. In the cytoplasm, where p14ARF is far less abundant, ARF-BP1 succeeds in enforcing the degradation of MCL-1, further sustaining apoptosis [67]. The INK4/ARF locus plays a central role in inhibiting proliferation under conditions of genomic imbalance and is often targeted in the course of malignancy. Indeed, this locus is among the most frequently deleted loci in human cancers, particularly in glioblastomas, melanomas, pancreatic adenocarcinomas, NSCLCs and bladder carcinomas. Inactivation of p16INK4A through deletion, point mutations or promoter methylation is observed in about
30% of neoplasms. p14ARF is less frequently mutated, and its inactivation is mostly due to homozygous deletion of the entire INK4/ARF locus. Hence, its role as a "true" tumor suppressor has been questioned, as it is often lost alongside p16INK4A. However, p14ARF exerts its inhibitory action on MDM2 through a small peptide of 25 amino acids; hence, inactivating point mutations affecting this region should be extremely rare. Nevertheless, a few mutations affecting p14ARF have been detected in certain familial cases of melanoma and astrocytoma. Moreover, since the oncogenic TFs Twist and AML-ETO have been shown to specifically repress p14ARF, the tumor suppressor character of p14ARF has been reinstated [69]. p27. p27KIP1 is an essential regulator of cell cycle progression, which inhibits the activity of cyclin/CDK2 complexes during G0 and G1, thereby halting the transition toward the S phase and, subsequently, G2 and mitosis [71–74]. A critical event for the progression of the cell cycle is p27KIP1 degradation by ubiquitin-dependent proteolysis, which is accomplished via two distinct signaling cascades: the first is cyclin E/CDK2 phosphorylation-dependent, implicating the SCFSkp2 pathway (reviewed in [75]), while the second involves mitogen-dependent proteolysis through activation of the RAS/MAPK or the PI3K/AKT axis (Fig. 74.3). However, p27KIP1 does not act solely as a cell cycle suppressor, since it also promotes G1 progression by facilitating the formation of D-type cyclin complexes with CDK4/6. The involvement of p27KIP1 in cancer presents a novel feature: whereas most tumor suppressor genes are subject to various genetic lesions in human malignancies, p27KIP1 is deregulated via accelerated proteolysis, sequestration or cytoplasmic translocation. Deletions or inactivating point mutations are rarely observed at the p27KIP1 locus. Nevertheless, loss of p27KIP1 expression is a common feature of human carcinomas, detected in up to 60% of neoplasms of the colon, breast, prostate, lung and ovaries, as well as in brain tumors and lymphomas, representing, in many cases, an independent prognostic marker that correlates with tumor grade and a poor clinical outcome. The accelerated proteolysis of p27KIP1 observed in many tumors is the direct effect of the SCFSkp2 upregulation that characterizes these cases. Yet, the latter is not a widespread phenomenon in human malignancies. In all other cases, exhibiting normal SCFSkp2 levels, p27KIP1 proteolysis is probably mediated by oncogenic activation of tyrosine kinase receptors and/or the RAS and PI3K pathways. Physiological inactivation
of p27KIP1 may also be achieved via sequestration, for example, through binding to cyclin D/CDK4/6 complexes, as observed in a subgroup of lymphomas. Cyclin D1 overexpression is a common feature among malignant tumors, typically induced by oncogenic activation of c-MYC. Moreover, in up to 40% of breast carcinomas, p27KIP1 is mislocalized to the cytoplasm, a feature associated with tumor dedifferentiation and poor prognosis. Cytoplasmic translocation of p27KIP1 seems to be dependent on the PI3K/PKB pathway, which is also often deregulated in human cancers. PTEN. PTEN is a potent phosphoinositide 3-phosphatase, converting PIP3 to PIP2 and thereby antagonizing the activity of oncogenic PI3K. Hence, it exerts anti-proliferative and pro-apoptotic activities, which have nominated PTEN as a tumor suppressor protein (reviewed in [76–79]). Numerous mutations and/or deletions in the PTEN gene have been identified in many tumors, including GBM, prostatic and endometrial carcinoma. Furthermore, germline mutations in the PTEN locus are associated with multiple hamartoma syndromes, such as Cowden syndrome, in which patients are predisposed to developing carcinoma of the breast, thyroid and endometrium. The role of PTEN inactivation in malignant transformation seems to be context-dependent, i.e., the synergistic interaction between PTEN loss and other oncogenic processes may influence the outcome.
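Because PTEN simply reverses the reaction catalyzed by PI3K, the net PIP3 signal reaching AKT can be sketched with a one-line steady-state argument: if PIP3 is produced at a PI3K-dependent rate and removed at a rate proportional to PTEN activity, its steady-state level is the ratio of the two. The rate constants below are arbitrary illustrative numbers, not measured values, but the sketch shows why either PI3K amplification or PTEN loss produces the same AKT upregulation described above.

def pip3_steady_state(k_pi3k: float, k_pten: float) -> float:
    """Steady state of d[PIP3]/dt = k_pi3k - k_pten * [PIP3]."""
    return k_pi3k / k_pten

# Arbitrary units: halving PTEN activity doubles steady-state PIP3,
# mimicking the AKT upregulation seen when PTEN is lost or mutated.
print(pip3_steady_state(k_pi3k=1.0, k_pten=1.0))  # 1.0 (baseline)
print(pip3_steady_state(k_pi3k=1.0, k_pten=0.5))  # 2.0 (reduced PTEN)
print(pip3_steady_state(k_pi3k=2.0, k_pten=1.0))  # 2.0 (PI3K amplification)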
74.2.3 Evasion of the Anti-tumor Barriers (Apoptosis and Senescence)

Virtually all cells possess an intrinsic self-destructive mechanism, called apoptosis, which is activated in response to extracellular stimuli, such as severe hypoxia, as well as to a variety of aberrations in cell physiology, including irreversible DNA damage, oncogenic pressure, insufficiency of survival factors and others (reviewed in [80–82]). Once initiated, the cellular components are degraded on a tight schedule, spanning 30–120 minutes. This mechanism is vital during development and morphogenesis, and is also required for tissue remodeling and repair upon external insults, such as injury and infection. Moreover, apoptosis represents an effective barrier to carcinogenesis and its deregulation is a prerequisite for malignant transformation. The end effectors of the apoptotic machinery comprise a class of cysteine proteases, which are proteolytically
activated and fall into two categories: the "initiator caspases" (i.e., caspases-8, -9 and -10) and the "executioner caspases" (caspases-3, -6 and -7), which selectively degrade cytoplasmic organelles and structures, as well as nuclear structures and chromatin (Figs 74.2 and 74.4). Two key apoptotic pathways have been established. The first is activated in response to extracellular stimulation, mediated by receptor/ligand complexes conveying either survival signals (e.g., insulin-like growth factor (IGF)1/2 and IGF-1 receptor [IGF-1R]; interleukin [IL]-3 and IL-3 receptor [IL-3R]) or death signals (e.g., FAS ligand [FASL]/FAS; tumor necrosis
factor (TNF)-α/TNF-receptor 1 [TNF-R1]). The former complexes are integral for continued cell prosperity, with their absence leading unambiguously to cell death. The second pathway is triggered upon various forms of cellular stress and is realized largely through the BCL-2 protein family at the mitochondria. FASL/FAS. FAS belongs to the "death receptor protein family," a subset of the TNF receptor superfamily, and was identified by its propensity to induce rapid death of tumor cells upon stimulation [81, 83]. FAS is activated through binding of its ligand (FASL), whose expression is restricted to specific tissues, i.e., those of
Fig. 74.4 Schematic view of the cellular DNA damage response (DDR) to exogenous and endogenous stimuli. Double- and single-strand breaks (produced by the exposure to radiation, chemicals or even randomly) activate the ATM/Chk2 and ATR/ Chk1 axes, respectively. These pathways may act independently or, sometimes, cooperatively in order to impose decisions that
affect progression through the cell cycle and apoptosis. In both routes, p53 plays a prominent regulatory role (the green lines denote activation; the red lines denote inhibition; dotted lines indicate indirect effect; the red Xs denote cell cycle arrest; see text and corresponding literature for details)
the lungs, testes and eyes. FAS signaling is mediated by proteins possessing distinct modular motifs, namely death domains (DD), death effector domains (DED) and caspase recruitment domains (CARD). Upon activation, FAS assembles an intracellular signaling complex called the death-induced signaling complex (DISC). Via the adaptor FAS-associated protein with death domain (FADD), procaspase-8 or, alternatively, procaspase-10 is recruited to the complex. Dimerization and autoproteolytic activation of caspase-8 or -10 is followed by activation of caspases-3 and -7, which finally execute the apoptotic program. The conditions imposed by the microenvironment of transformed cells (e.g., absence of the appropriate substrates and/or limited amounts of survival factors) are such that they markedly enhance FAS signaling and ensure apoptosis. Therefore, during malignancy, the rescued cells are expected to be resistant to FAS-mediated apoptosis. During early tumor development, cells expressing FAS are susceptible to death induced by FASL-expressing lymphocytes. On the other hand, tumor cells expressing both FAS and FASL are able to establish an autocrine loop leading to massive apoptotic destruction. This process may eventually promote the selection of clones that have developed resistance to FAS-mediated cell death. Indeed, loss of FAS expression or function has been documented in a variety of cancers, including melanoma, colon, esophageal and breast cancer, and HCC. Moreover, constitutive expression of FASL is also observed in human malignancies, such as melanoma, astrocytoma, lung, hepatocellular, ovarian and esophageal carcinoma. The mechanisms involved in the evasion of FAS-mediated apoptosis are quite diverse. Constitutive expression of FASL by tumor cells seems to confer an immune privilege and induce peripheral tolerance, promoting apoptosis of FAS-positive lymphocytes. Additionally, the oncoprotein RAS downregulates FAS expression through the PI3K/AKT pathway, thus promoting resistance to FAS-mediated death. BCL-2. The members of the BCL-2 family of proteins, divided into three discrete groups, produce an effective adaptation that controls stress-induced apoptosis [80, 82, 84, 85]. The first subfamily, consisting of the so-called "pro-survival proteins," includes B-cell CLL/lymphoma 2 (BCL-2), BCL2-like 1 (BCL-XL), BCL2-like 2 (BCL-w), A1, and MCL-1, which exert anti-apoptotic activities. The other two subfamilies are
considered pro-apoptotic and are both required for apoptotic cell death. The BCL-2 homology domain (BH)3-only subfamily includes BCL2-interacting killer (BIK), BAD, BH3 interacting domain death agonist (BID), BCL-2 interacting mediator of cell death (BIM), BCL2 modifying factor (BMF), harakiri, BCL2 interacting protein (HRK), NOXA, and BCL2 binding component 3 (PUMA). The main activity of the BH3-only subfamily lies in restraining the inhibition that the pro-survival BCL-2 subfamily exerts upon the third subfamily, the BAX-like proteins. The latter subfamily includes the BCL2-associated X protein (BAX), BCL2-antagonist/killer (BAK) and BCL2-related ovarian killer (BOK) proteins, which act downstream of the other two groups. BAX-like proteins form oligomers on organelle membranes, mainly on mitochondria, enhancing membrane permeability and the release of a multitude of cytotoxins, including cytochrome c. The latter induces activation of apoptotic peptidase activating factor (APAF)-1, which is essential for apoptosome formation and recruitment of procaspase-9. This event is followed by activation of caspase-9 and subsequent stimulation of the executioner caspases. It should be noted that p53-mediated apoptosis, in response to DNA damage, employs this particular pathway. NOXA and PUMA are under the transcriptional control of p53 and are both upregulated upon DNA damage, leading ultimately to enhanced apoptosis (see Figs 74.2 and 74.4). Aberrations at any point of the aforementioned pathway are likely to confer an anti-apoptotic advantage to transformed cells. The hallmark of B-cell lymphomas of the follicular type is the t(14;18) chromosomal translocation that involves the BCL-2 locus. This abnormal rearrangement leads to overexpression of BCL-2, preservation of survival and propagation of B-cells. The oncogenic role of BCL-2 may also be complemented by other oncogenes, like c-MYC, as shown in experimental animal models. On the other hand, since pro-survival BCL-2 exhibits oncogenic characteristics, its antagonists are anticipated to exhibit tumor-suppressor traits. Indeed, BIM acts as a potent tumor suppressor in experimental models. Consistent with this role, in a substantial proportion of mantle cell lymphomas, both alleles of the BIM gene are deleted, while in the majority of BLs, characterized by a MYC translocation, the BIM locus is silenced through methylation of its promoter. Moreover, mutations of NOXA have been identified in a small number of lymphomas, while some human colorectal carcinomas and
hematopoietic tumors exhibit mutations in BAX and BAK. Aside from apoptosis, potentially malignant and transformed cells face yet another major intrinsic anti-tumor cellular barrier, namely senescence [86–89]. Two main forms of senescence have been described to date: replicative senescence (further analyzed below) and oncogene-induced cellular senescence (OIS). As initially observed in vitro and later confirmed in vivo, the expression of several different oncogenes, including RAS, RAC1, RAF, MOS, MYC, E2F and BRAF, induces cellular senescence. Moreover, downregulation of the tumor suppressor PTEN also results in senescence. Interestingly, human nevi are composed of senescent melanocytes carrying BRAF-activating mutations, an observation that established cellular senescence as a physiological homeostatic tumor-suppressive mechanism. In fact, the stimulation of an oncogene often leads to downstream activation of a tumor-suppressor protein. For example, H-RAS signaling leads to premature senescence. H-RAS exerts its activities through the RAF/MEK/ERK pathway, leading to the activation of p38 MAPK and, consequently, to p53 and p16INK4A upregulation. Overexpression of E2F also leads to premature senescence, mediated by E2F-induced transcription of p14ARF and subsequent upregulation of p53. Overexpression of ERBB-2 induces upregulation of p21WAF1/CIP1 through a p53-independent mechanism that also leads to premature senescence. Oncogene activation also triggers DDR signaling through diverse pathways, all converging on p53 and leading to senescence. Besides the p53 pathway, senescence relies upon the pRb molecular network, which is yet to be fully elucidated. pRb remains hypophosphorylated in senescent cells, suggesting that CDK4/6 is actively inhibited by p16INK4A, which is undeniably upregulated during senescence. Both pRb and p16INK4A participate in the remodeling of chromatin into senescence-associated heterochromatin foci (SAHF). SAHF are compact chromatin structures associated with modified histones, such as methylated H3K9, as well as other proteins, such as heterochromatin protein 1 (HP1). These structures are believed to repress the transcription of genes involved in cell cycle progression, many of which are targeted by E2F. Inactivation of the pRb pathway prevents SAHF formation and senescence establishment. One may hypothesize that senescence itself enforces a selective pressure on newly transformed cells, which, eventually, allows some phenotypes that carry aberrant apoptotic
and senescence features to be spared, thereby creating a window toward the development of full-blown malignancy.
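The three-subfamily BCL-2 logic described earlier in this section behaves like a rheostat, and a minimal arithmetic sketch can make the balance explicit. The function and abundance values below are invented for illustration (real commitment to apoptosis involves far richer regulation): BH3-only proteins neutralize the pro-survival members, and any BAX/BAK activity left unrestrained tips the mitochondria toward permeabilization.

def mitochondria_permeabilized(bax_bak: float,
                               pro_survival: float,
                               bh3_only: float,
                               threshold: float = 1.0) -> bool:
    """Toy 'rheostat' model of the intrinsic apoptotic switch.

    BH3-only proteins sequester pro-survival BCL-2 family members;
    whatever guarding capacity remains restrains BAX/BAK. If free
    BAX/BAK activity exceeds the threshold, the mitochondrial outer
    membrane is permeabilized, releasing cytochrome c and engaging
    the APAF-1/caspase-9 apoptosome.
    """
    free_guards = max(0.0, pro_survival - bh3_only)
    free_bax_bak = max(0.0, bax_bak - free_guards)
    return free_bax_bak >= threshold

# Healthy cell: guards comfortably restrain BAX/BAK.
print(mitochondria_permeabilized(bax_bak=2.0, pro_survival=3.0, bh3_only=1.0))  # False
# DNA damage: p53 induces NOXA/PUMA (BH3-only), freeing BAX/BAK.
print(mitochondria_permeabilized(bax_bak=2.0, pro_survival=3.0, bh3_only=3.0))  # True
# t(14;18)-like state: BCL-2 overexpression absorbs the BH3 signal.
print(mitochondria_permeabilized(bax_bak=2.0, pro_survival=8.0, bh3_only=3.0))  # False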
74.2.4 Unlimited Replicative Potential

Normal cells in culture can perform only a finite number of replication cycles before entering a state of irreversible growth arrest, termed the "first mortality barrier" or "mortality stage 1" (M1; replicative senescence). This process is dependent upon chromosome telomeres (reviewed in [90–92]). Telomeres are repetitive DNA sequences complexed with specific proteins, which are located at the termini of chromosomes and are essential for their stability. In humans, telomeres consist of tandem 5′-TTAGGG-3′ repeats with a single-stranded G-rich 3′ overhang, coupled with three proteins: telomeric repeat binding factors (TRF)1 and TRF2, and protection of telomeres (POT)1. In normal, cycling cells, these telomeric sequences progressively shorten with each division cycle, mainly due to the intrinsic inability of DNA polymerase to duplicate linear DNA toward its 3′ end. When telomeres reach a critical length, DNA damage signals, similar to DNA double-strand break-induced signals, are produced, which command the p53 and pRb pathways to block cell cycle progression. Indeed, senescent cells exhibit increased levels of DDR components, including ataxia telangiectasia mutated (ATM), phosphorylated histone H2AX (γ-H2AX), p53 binding protein (53BP)1, mediator of DNA-damage checkpoint (MDC)1, Nijmegen breakage syndrome protein (NBS1), and phosphorylated CHK1 and CHK2, which are preferentially located at sites of short telomeres. Inhibition of the p53 and pRb pathways alleviates cell cycle arrest and cells continue to proliferate, demonstrating consequent progressive shortening in telomere length. When a minimal length is reached, chromosome ends are no longer protected and cells enter the "second mortality barrier" or "mortality stage 2" (M2), termed "crisis." Crisis is characterized by massive cell death and extensive chromosomal instability. Nevertheless, a very small number of cells survives crisis due to a mutation or some epigenetic event that rescues telomeres. These cells are considered immortal and are characterized by a plethora of mutations and chromosomal abnormalities, conferring insensitivity to both replicative senescence and OIS. Telomere length
rescue is achieved either through the activation of the enzyme telomerase or through an alternative telomere lengthening (ALT) mechanism. Telomerase is an enzyme that extends telomeres and maintains their overhang. It contains an RNA component (hTR) that serves as a template for the addition of the characteristic 5′-TTAGGG-3′ motif, while its catalytic subunit is a reverse transcriptase (human telomerase reverse transcriptase or hTERT). While hTR expression is widespread, hTERT is only active in embryonic stem cells and germ cells. Hence, telomerase is inert in normal somatic cells, rendering them unable to overcome the barrier of replicative senescence. In contrast, expression of hTERT correlates with telomerase activity in immortalized and transformed cells. Indeed, up to 90% of transformed cells exhibit reactivation of telomerase, while the remaining 10% employ an ALT mechanism that proceeds via the exchange of telomere ends between chromosomes, indicating that immortalization and maintenance of replicative potential are essential traits of malignant transformation. It should be noted, however, that the reactivation of telomerase or ALT requires a preceding deregulation of both the p53 and pRb pathways, and greatly depends upon the establishment of genomic instability.
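The M1 and M2 barriers just described can be illustrated with a short simulation. All the numbers below (initial telomere length, loss per division and the two thresholds) are arbitrary illustrative values, and real attrition is stochastic, but the sketch reproduces the sequence in the text: attrition to M1 senescence, bypass of the p53/pRb checkpoints allowing further shortening to M2 crisis, and immortalization when telomerase maintains telomere length.

def fate(telomere_bp: int,
         loss_per_division: int = 100,
         m1: int = 5_000,
         m2: int = 2_000,
         checkpoints_intact: bool = True,
         telomerase_active: bool = False,
         max_divisions: int = 500) -> str:
    """Toy model of the M1 (senescence) and M2 (crisis) barriers."""
    for division in range(1, max_divisions + 1):
        # End-replication problem: telomeres shorten each cycle,
        # unless telomerase (or ALT) restores the lost repeats.
        if not telomerase_active:
            telomere_bp -= loss_per_division
        if telomere_bp <= m2:
            # Unprotected chromosome ends: massive death, instability.
            return f"crisis (M2) at division {division}"
        if telomere_bp <= m1 and checkpoints_intact:
            # DDR signals from short telomeres engage p53/pRb.
            return f"replicative senescence (M1) at division {division}"
    return f"still dividing after {max_divisions} divisions (immortalized)"

print(fate(10_000))                                           # hits M1
print(fate(10_000, checkpoints_intact=False))                 # hits M2
print(fate(10_000, checkpoints_intact=False,
           telomerase_active=True))                           # immortalized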
74.2.5 DNA Damage Response, Repair and Mitotic Surveillance in Cancer

Cells are under constant stress because of damage to their DNA from a multitude of agents, both exogenous, like IR and UV irradiation, and endogenous, like reactive oxygen species (ROS). The rate of damage from endogenous sources ranges from 10⁴ to 10⁶ molecular lesions per cell per day [93, 94], and it is even higher for exogenous genotoxic agents. Moreover, the cell senses as damage the errors occurring during replication that lead to stalled or collapsed replication forks (replication stress). The integrity of the genome, which is vital for cell survival and prosperity, is protected by intricate surveillance and repair mechanisms; if a genomic lesion occurs during cell division, the cell cycle is halted until the damage is repaired; if the lesion is too extensive or cannot be repaired, there are mechanisms in place that eliminate the damaged cell. Generally, DNA lesions are divided into two major groups: (1) DNA modifications, such as oxidation,
alkylation, depurination/depyrimidination and bulky adduct formation of the DNA nucleotide bases, and (2) DNA single-strand breaks (SSBs) or double-strand breaks (DSBs). The repair mechanisms employed depend on the type of DNA damage and the phase of the cell cycle in which the damage occurs. Interestingly, there is substantial overlap between the various repair pathways, allowing different repair mechanisms to converge on the same type of damage [95, 96]. Although this redundancy increases the repair potential of the cell, in certain cases it can compromise DNA sequence fidelity [95, 96]. The progressive accumulation of genomic alterations with each cell cycle, which renders the genome increasingly unstable, is termed genomic instability. If genomic instability remains below a certain threshold, compatible with cell viability, the cell is at increased risk of neoplastic transformation. Generally, genomic instability is classified into two major types: microsatellite instability (MIN) and chromosomal instability (CIN) [97]. MIN involves changes in microsatellite sequences due to defects, mainly of the mismatch repair mechanism [98], whereas CIN features chromosomal abnormalities, both structural and numerical [99, 100]. Compared with MIN, the mechanisms underlying CIN are much more complicated and not fully elucidated. Structural abnormalities are probably the outcome of aberrations in the DNA damage response (DDR) machinery and the DSB repair process [101]. It should be noted that DSBs are considered the most catastrophic and mutagenic DNA insults, which may potentially lead to loss of thousands of base pairs [102–104]. DSBs may occur as a direct consequence of exposure to exogenous (e.g., IR and chemicals) or endogenous agents (e.g., ROS), or indirectly, as a result of collapsed replication forks [105]. Nevertheless, DSBs are exploited in many physiological processes, such as chiasmatypy (crossing-over) during meiosis or recombination events during B-cell differentiation. Unrepaired DSBs can lead to chromosome rearrangements (i.e., translocations, amplifications and deletions), which are likely to exert undesirable effects on normal cell physiology. On the other hand, numerical instability most likely occurs as a result of errors during mitotic segregation [99, 100]. Understanding the molecular basis of the DDR and the organization and function of the repair machinery is essential in understanding how the impairment of this pathway may contribute to cancer development. Any type of damage to the DNA elicits a DDR, which results
in cell cycle arrest at checkpoints. The first step in the DDR is detection of the lesion by multisubunit protein complexes and subsequent recruitment and activation of the major response kinases, ATM and/or ATM- and Rad3-related (ATR). These kinases belong to the phosphoinositide 3-kinase related family of protein kinases (PIKKs) and are vital components of the highly conserved pathways that govern checkpoint control in response to DNA damage [106–108]. Even though ATM and ATR share significant homology, they respond to different DNA lesions, although partial redundancy has been observed between their actions and they often supplement each other. ATM is mainly activated by DSBs, and is implicated in all cell cycle checkpoints, with the exception of the mitotic ones [106, 108]. ATR, on the other hand, responds to single-stranded DNA (ssDNA), such as that present at stalled replication forks [109], and contributes to the surveillance of the S and S/G2 checkpoints [106, 108]. Detection of DNA lesions employs different protein complexes, depending on the type of damage. Repair of a DSB (reviewed in [110, 111]) proceeds initially through the recruitment of the MRN complex (consisting of the meiotic recombination 11 homolog A (Mre11), Rad50 and Nbs1 proteins) at the lesion site. The MRN complex is considered the initial DSB sensor, and its Nbs1 component is responsible for the initial recruitment and activation of ATM. ATM phosphorylates the H2AX histone and, in turn, γ-H2AX recruits additional ATM complexes. Thus, a positive feedback loop is created, enhancing the spread of γ-H2AX along the chromatin. Essential to this positive feedback loop are proteins called DDR mediators, like MDC1, 53BP1 and breast cancer 1 (BRCA1). These proteins help stabilize the MRN-ATM-γ-H2AX complexes at the lesion site, provide a scaffold for recruitment of additional response proteins, and amplify and transduce the DNA damage signal to downstream effectors. Likewise, when ssDNA is generated, a similar response is initiated (reviewed in [109, 110, 112]). The ssDNA is rapidly coated by the ssDNA-binding replication protein A (RPA), a process that leads to recruitment of ATR at the lesion site through its regulatory subunit, ATR interacting protein (ATRIP) [113]. The molecular mechanism leading to activation of ATR is not fully clarified, but essential players in the process are a heterotrimeric complex similar to PCNA, composed of Rad9, Rad1 and Hus1 (called the "9-1-1" complex), topoisomerase II binding protein 1 (TopBP1), and claspin [112]. ATR
is believed to respond mainly to ssDNA lesions, but it may also be activated by DSBs, along with ATM. In the latter case, the endonuclease and/or exonuclease activity of MRN (specifically of the Mre11 component) resects the DSB, thus creating ssDNA, which is recognized by RPA, and this event brings ATR into play [114, 115]. Subsequently, the activated ATM and/or ATR phosphorylate and activate their main downstream targets, the effector kinases CHK2 and CHK1, respectively. These effectors of the DDR initiate several different p53-dependent and p53-independent pathways that arrest the cell cycle at various checkpoints (Fig. 74.4), allowing the repair machinery to resolve the lesion. If the DNA damage cannot be repaired, the DDR pathway will lead the cell to apoptosis or senescence [116]. It should be noted that the CHK1 and CHK2 kinases are not restricted to the DNA lesion site; upon activation they diffuse freely throughout the nucleoplasm, where they phosphorylate their targets and therefore propagate the DDR signal beyond the site of DNA damage [117, 118]. These phosphorylations initiate multiple signaling pathways, ultimately leading to checkpoint enforcement through key factors of the cell cycle machinery. Two major molecular pathways connect the DDR with cell cycle arrest at checkpoints, one rapid and transient and one delayed and sustained (reviewed in [119]; a coarse decision sketch of this routing is given at the end of this section). The first pathway involves the direct phosphorylation of the phosphatase CDC25 at multiple sites by CHK1 and CHK2, which ultimately leads to its proteasome-mediated degradation [75]. Consequently, CDKs fail to be activated by CDC25-mediated dephosphorylation and the cell cycle halts. Specifically, CDC25 targeting by CHK1/CHK2 arrests cells at the G1/S transition, S, or G2 phase due to inactivity of the cyclin E/CDK2, cyclin A/CDK2, or cyclin B/Cdc2 kinases, respectively. The second pathway is mediated by p53, and involves the phosphorylation of p53 by both ATM/ATR and CHK1/CHK2, and the concurrent phosphorylation by ATM of the negative regulator of p53, MDM2. These events lead to the stabilization and accumulation of p53, and also to enhancement of its transcriptional activity. The ensuing upregulation of the CDK inhibitor p21WAF1/CIP1 leads to inhibition of most of the cyclin-CDK complexes and to sustained and even permanent cell cycle arrest, although this happens much later than the CDC25-mediated arrest. As mentioned previously, cell cycle arrest is meant to allow for the repair of the DNA. The repair of a given DNA lesion depends on its type and the phase of
the cell cycle in which it occurs, and employs many different strategies (see Table 74.1). For example, DSBs are repaired mainly through two major mechanisms: homologous recombination (HR) and non-homologous end joining (NHEJ). The HR repair of the damaged strand requires an intact homologous sequence, and is thus error-free. In contrast, NHEJ processes the damaged DNA strands without using a homologous template; hence it is prone to errors [96, 120–122]. In cases of modified bases (e.g., oxidized, alkylated or deaminated bases) and SSBs, the base excision repair (BER) machinery is employed [123]. The lesion is recognized and subsequently excised, creating a SSB that can be processed by either short-patch BER (SP-BER), where a single nucleotide is replaced, or long-patch BER (LP-BER), where 2–10 new nucleotides are synthesized [98]. The gap created by the excision of the damaged base(s) is filled by polymerase-β (Pol β) using the complementary base as a substrate. Base mismatches arising as a result of replication errors or after exposure to several exogenous agents (e.g., substitution, insertion or deletion mismatches) trigger the mismatch repair (MMR) machinery, leading to the excision of the lesion and gap filling by DNA repair synthesis and ligation [124]. The cell's final level of surveillance occurs during mitosis, when its genetic material is divided equally between daughter cells. Mitosis is a highly accurate and orchestrated process which, when deregulated, leads to aneuploidy and CIN. Mitotic integrity is assured by multiple checkpoints that control timely progression from one phase to the next. Identified checkpoints include one between prophase and prometaphase, controlled by the checkpoint with forkhead and ring finger domains (CHFR) protein and implicated primarily in sensing microtubule poisons [125]; one between metaphase and anaphase, governed by the spindle assembly checkpoint (SAC); and the more recently described cytokinesis or "abscission" checkpoint, dependent on Aurora-B [126]. Perhaps the most important checkpoint controlling mitotic progression is the SAC, which ensures the correct alignment of the chromosomes before segregation [127]. In cells with an intact SAC, chromosome misalignment at the equatorial level delays anaphase onset until the defect is corrected. During prophase and prometaphase, the unattached kinetochores recruit the SAC machinery, which includes mitotic arrest deficient (MAD)1, MAD2, monopolar spindle (MPS)1, budding uninhibited by benzimidazole (BUB)1, BUB3, BUB receptor 1 (BUBR1), the ZW10-ZWINT-ZWILCH complex and centromere protein-E (CENPE). Briefly, BUBR1 kinase
is activated upon binding to CENPE, recruiting the MAD1-MAD2 heterodimer in a ZW10-ZWINT-ZWILCH-dependent manner. This complex sequesters CDC20, blocking the activation of the anaphase promoting complex (APC). Upon proper chromosome alignment, the MAD1-MAD2 heterodimer is released and the APC is activated, targeting securin (PTTG1) and cyclin-B for destruction. Destruction of securin releases separase, which in turn cleaves the cohesin rings, promoting chromosome segregation. Even though all the above mechanisms might appear to be favorable targets for cell transformation, during carcinogenesis the repair pathways and mitotic checkpoints do not seem to represent a primary target, possibly because their functional efficiency is essential for cell viability. Alternatively, targeting the signaling pathways that regulate cell death or senescence, such as p53 or p16INK4A (see previous sections), can provide the cells undergoing transformation with the time required for selecting the "fittest environment" for their progression and expansion.
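The checkpoint routing developed over the preceding paragraphs — lesion type selecting the ATM/CHK2 or ATR/CHK1 axis, a rapid CDC25-mediated arrest followed by a slower p53/p21-mediated one, with apoptosis or senescence as the fallback — can be condensed into a small decision function. This is the coarse sketch promised earlier: the event labels are invented shorthand for Fig. 74.4, not a kinetic model.

def ddr_outcome(lesion: str, repairable: bool,
                p53_functional: bool = True) -> list:
    """Coarse decision sketch of the DNA damage response.

    lesion: 'DSB' (double-strand break) or 'ssDNA' (e.g., a stalled
    replication fork). Returns the ordered list of events described
    in the text; real checkpoint signaling overlaps far more.
    """
    events = []
    # Sensor/transducer choice: ATM responds mainly to DSBs,
    # ATR to RPA-coated ssDNA; they partially overlap in vivo.
    if lesion == "DSB":
        events += ["MRN senses break", "ATM activated", "CHK2 activated"]
    elif lesion == "ssDNA":
        events += ["RPA coats ssDNA", "ATR-ATRIP recruited", "CHK1 activated"]
    else:
        raise ValueError("lesion must be 'DSB' or 'ssDNA'")

    # Rapid, transient arm: CHK1/CHK2 mark CDC25 for degradation,
    # so CDKs stay inactive and the cycle halts quickly.
    events.append("CDC25 degraded -> rapid cell cycle arrest")

    # Delayed, sustained arm: p53 stabilization and p21 induction.
    if p53_functional:
        events.append("p53 stabilized -> p21 -> sustained arrest")

    if repairable:
        events.append("lesion repaired -> arrest relieved")
    else:
        events.append("apoptosis or senescence" if p53_functional
                      else "unrepaired lesion propagates (genomic instability)")
    return events

for step in ddr_outcome("DSB", repairable=False):
    print(step)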
74.2.6 Sustained Angiogenesis

Neoangiogenesis is an essential trait for tumor development and metastasis [128]. Without the supply of nutrients and oxygen through the blood, tumors are unable to achieve a size larger than 1–2 mm³. This size reflects the limited distance that nutrients and oxygen may travel, via diffusion, following their penetration of capillary walls. In order to attain larger sizes, tumors must acquire angiogenic attributes, which are typically absent from newly transformed cells. It is believed that at some point of malignant development an "angiogenic switch" occurs, which allows the formation of new blood vessels and supports tumor growth. Angiogenesis is a complex procedure, tightly surveyed at all times, and strictly coordinated with normal tissue growth during organogenesis and development. It depends mainly on a fragile balance between a multitude of pro-angiogenic and anti-angiogenic factors. Tipping of this balance either way may effectively enhance or inhibit angiogenesis. The process evolves through a carefully orchestrated series of cellular and molecular events, promoting proliferation, migration and differentiation of endothelial cells into new capillaries that eventually develop into mature
vessels. Nevertheless, angiogenesis within the tumor context seems to proceed in an aberrant way, resulting in the formation of irregular and tortuous vessels with a defective basement membrane and an increased permeability. This feature is a clear sign of the disturbed balance between angiogenic and angiostatic factors in cancer. Stimulation of angiogenesis is the combined effect of the upregulation of pro-angiogenic factors, such as VEGF and bFGF, and the downregulation of angiostatic factors, such as thrombospondin-1 (TSP-1) and endostatin/angiostatin. VEGF is a potent angiogenic cytokine that is often expressed in human malignancies, playing an essential role in tumor "nourishment." It exerts its activity through binding to VEGF receptors (VEGFRs), located on vascular endothelial surfaces. Binding of VEGF-A to VEGFR-2 leads to the dimerization and activation of the latter, generating mitogenic, chemotactic and pro-survival signals. VEGF production is also enhanced by several GFs expressed by tumor cells, such as EGF, FGF and PDGF. FGFs also act via receptor binding to induce endothelial cell proliferation and differentiation of epiblasts into endothelial cells, and are commonly overexpressed in many malignant tumors. On the other hand, TSP-1 is a potent angiostatic factor, promoting the induction of endothelial cell apoptosis and cell cycle arrest. Expression of TSP-1 is positively regulated by p53, and since the latter is often inactivated in cancer, levels of TSP-1 are low in neoplastic cells. In this way, angiogenesis is favored during carcinogenesis.
74.2.7 Invasion and Metastasis

These processes represent the hallmarks of malignancy, being the main sources of cancer-associated morbidity and mortality. Together, they comprise a multistep route, which briefly involves detachment from the tumor mass, infiltration of adjacent tissues, entry into the circulation, migration to a remote anatomical site and foundation of a new malignant colony. Each step entails new barriers that restrict migratory cell survival and, in fact, only a very small number of the transformed cells released into the circulation are able to form distant metastases [129]. Moreover, there is evidence suggesting that the metastatic phenotype is not acquired as late in tumorigenesis as originally thought, but may be
This notion is consistent with the observation that some primary tumors already exhibit at diagnosis a more aggressive profile than advanced-stage ones [129]. Nevertheless, the metastatic phenotype does not guarantee successful colonization of distant sites. An important feature of effective invasion and metastasis appears to be the interaction of the malignant cell with its surrounding microenvironment, whether composed of cells or extracellular matrix (ECM). Several proteins constitute the connecting links between a cell and its environment, and these are often deranged in transformed cells. They include cell adhesion molecules (CAMs), which mediate cell-to-cell interactions, and integrins, which regulate cell-to-ECM interactions. E-cadherin, a homotypic cell-to-cell adhesion molecule, acts as a bridge interconnecting adjacent epithelial cells and is responsible, along with other proteins, for the tight structural arrangement of epithelial tissues. E-cadherin is often deregulated in human cancers (reviewed in [130]), including carcinomas of the esophagus, colon, breast, ovary and prostate, resulting in the loss of epithelial cell polarity and tissue compactness and thereby enhancing tumor progression and invasiveness. Downregulation of E-cadherin is accomplished via inactivating mutations, transcriptional repression or posttranslational truncation [130]. On the other hand, aberrant expression of diverse integrins and their ligands allows transformed cells to adapt to the many different microenvironments encountered during migration (reviewed in [131]). The process of invasion also involves modification of the surrounding ECM through altered expression of matrix proteases and protease inhibitors by tumor or stromal cells. A special role has been ascribed to the class of matrix metalloproteinases (MMPs) and, in particular, to the gelatinases (type IV collagenases) MMP-9 and MMP-2. Upregulation of MMPs has been associated with increased invasiveness and metastasis in various studies (reviewed in [132]). However, the role of matrix proteases is not limited to the obvious degradation of ECM; they are also implicated in the release and/or activation of GFs residing in the ECM, such as macrophage colony-stimulating factor (M-CSF), IGF and VEGF, thereby promoting tumor growth and angiogenesis. At the same time, degradation of collagen IV produces potent inhibitors of angiogenesis, such as endostatin and tumstatin. Invasion and metastasis are thus complex processes that warrant further study before they are fully elucidated.
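Metastatic inefficiency follows directly from the multiplicative structure of the cascade: the probability that a single cell completes every step is the product of the per-step probabilities. The sketch below restates this arithmetic; the step names follow the text, while the probabilities are invented purely for illustration.

from functools import reduce
from operator import mul

# The metastatic cascade as a chain of hurdles (toy model; the
# per-step probabilities are illustrative, not measured values).
steps = {
    "detachment from the tumor mass": 0.10,
    "infiltration of adjacent tissue": 0.30,
    "entry into the circulation": 0.20,
    "survival during migration": 0.01,
    "colonization of a remote site": 0.02,
}
p_cascade = reduce(mul, steps.values())
print(f"P(one cell completes all steps) ~ {p_cascade:.1e}")  # ~1.2e-06

Even with individually modest hurdles, the product is vanishingly small, which is why only a tiny fraction of circulating tumor cells ever founds a distant colony.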
74.3 Multistep Carcinogenesis

The molecular lesions and aberrant signaling pathways described above may effectively be integrated under the popular theory of multistep carcinogenesis [129, 133–135]. In general, cancer proceeds via distinct clinicopathological stages that reflect the genetic alterations progressively accumulating in transformed cells (Fig. 74.1). As previously mentioned, nonlethal genetic damage lies at the heart of carcinogenesis and confers a growth advantage. Gene alterations in each step are transferred to the next and complemented by novel ones that confer cumulative survival/growth advantages, while some of the earlier "hits" may already have been reversed or compensated for. Once again, it should be noted that any single genetic lesion may offer a proliferative advantage, yet is rarely sufficient to cause malignant transformation by itself, with the exception of certain hematological neoplasms. The malignant phenotype is typically the net effect of a plethora of molecular genetic abnormalities present simultaneously within a single cell. To date, the best-studied model of carcinogenesis is colorectal cancer [1, 136], which proceeds through the distinct stages of hyperplasia, dysplasia, early adenoma, intermediate adenoma, late adenoma and, finally, carcinoma. Pivotal in the series of aberrations accompanying this carcinogenic process are the alteration of the APC locus, the subsequent activation of RAS and, at later stages, the deregulation of additional genes, including TP53 and TGF-β receptor II. Generally, the following sequence of events may be observed during solid tumor development. Under the influence of an external or internal stimulus that activates an oncogene, some cells begin to hyperproliferate, giving rise to benign, potentially neoplastic foci. The deregulation of cell division eventually leads to DNA damage, invoking the DDR and challenging the physiological barriers of apoptosis and senescence. At this point, cells are considered precancerous. Under the pressure of natural selection, some cells acquire attributes (e.g., loss of tumor suppressor gene activity) that allow them to override these barriers, thereby attaining immortality. These properties promote genomic instability and permit the accumulation of additional genetic lesions, which confer new capabilities, such as growth independence, insensitivity to signals from the surrounding microenvironment and induction of angiogenesis. Therefore, from a mass of a few, potentially neoplastic or precancerous cells, in situ carcinomas develop.
Further genetic lesions, conferring invasive and migratory properties, lead to full-blown metastatic cancer and the colonization of remote sites. This simplified scheme represents a typical series of events; it neither applies to all tumor types nor is always followed in the exact order described. One should keep in mind that tumorigenesis is highly selective, with most of the candidate, potentially malignant cells being eliminated at some point in the process. This is exactly why cancer may be viewed as a chronic disease with a long latency period. On the other hand, progression to malignancy requires only one successfully transformed cell capable of overcoming the anti-tumor barriers it is about to encounter.
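A classical quantitative counterpart to this stepwise picture is the Armitage–Doll multistage model. It is a standard epidemiological result rather than part of this chapter, but it shows how the accumulation of k rate-limiting lesions predicts the steep rise of cancer incidence with age. If N susceptible cells each acquire the required alterations at rates \mu_1, \ldots, \mu_k per year, the age-specific incidence is approximately

I(t) \approx \frac{N \, \mu_1 \mu_2 \cdots \mu_k}{(k-1)!} \, t^{k-1}

so that a log-log plot of incidence against age is linear with slope k-1; the slopes of roughly 4–6 observed for common epithelial cancers were historically read as evidence for some five to seven rate-limiting steps.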
74.4 Carcinogens

74.4.1 Chemicals

Chemical carcinogenesis comprises three distinct stages: initiation, promotion and progression. Initiation involves at least a single nucleotide substitution caused by a mutagenic substance. Promotion takes place in the presence of a non-genotoxic factor that instigates cellular proliferation and thereby enables the propagation of the initially mutated cell (hyperplasia). Progression is associated with the gradual acquisition of a malignant phenotype (cancer). Most chemicals are believed to act at the stage of promotion, exerting only proliferative activities. However, certain genotoxic agents, termed "complete" carcinogens, have been shown to contribute to both initiation and promotion, while others have been implicated in progression as well. The majority of chemical carcinogens acquire their DNA-binding activity, and the resulting mutagenicity, upon metabolic activation (bioactivation) in the liver. For instance, benzo(a)pyrene (BP), a well-studied carcinogenic compound in cigarette smoke, is converted to diol-epoxides by cytochrome P450 enzymes, while aflatoxin B1 (AFB1), present in a wide range of tropical food commodities, is converted to epoxides. Diol-epoxides and epoxides are highly reactive electrophilic molecules that bind covalently to DNA, with deleterious effects on genome integrity.
The bulky BP-DNA and AFB1-DNA adducts that are formed render DNA replication prone to errors and eventually result in G-to-T transversions. Interestingly, mutation "hotspots" within the coding region of TP53 have been shown to be preferentially targeted by BP; these mutations are thought to inactivate p53 and contribute to carcinogenesis [137]. Another intriguing category of chemical carcinogens is that of the polycyclic aromatic hydrocarbons (PAHs). PAHs play a bimodal role in tumorigenesis: they may participate both in tumor initiation, acting as mutagens following bioactivation, and in tumor promotion. PAHs have been shown to elicit inflammatory responses that coincide with ROS formation, oxidative DNA damage and aberrant cellular proliferation. The picture is further complicated by reports suggesting an active role for oxidative stress in tumor progression; ROS overproduction has been correlated with aberrant activation of the AP-1 and NF-κB transcription factors in malignant cells [138]. In addition, ectopic expression of mitogenic oxidase (MOX1), a homolog of the phagocytic superoxide-generating NADPH oxidase, is sufficient to induce the transformation of NIH3T3 fibroblasts [139]. Hence, PAHs may also be involved in the phase of tumor progression. It is also noteworthy that carcinogenic metals, such as arsenic and nickel, have been shown to interfere with the methylation status of genes, resulting in the activation of oncogenes (e.g., CCND1) and the silencing of tumor suppressor genes (e.g., CDKN2A/INK4A), respectively. Among the plethora of mechanisms by which chemicals contribute to the carcinogenic process, interference with epigenetic factors thus represents a rather important one [140].
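The G-to-T signature left by bulky adducts can be caricatured in a few lines of code. The sequence and the mutation probability below are invented; the only point is that lesion chemistry targeting guanines leaves a recognizable fingerprint in the mutation spectrum.

import random

def mutagenize(seq: str, p_g_to_t: float, seed: int = 7) -> str:
    """Apply G-to-T transversions, the signature change left by bulky
    BP-DNA and AFB1-DNA adducts (toy model; the rate is arbitrary)."""
    rng = random.Random(seed)
    return "".join(
        "T" if base == "G" and rng.random() < p_g_to_t else base
        for base in seq
    )

wild_type = "ATGGCGTGGACGGGA"  # hypothetical stretch, standing in for a TP53 hotspot
print(wild_type)
print(mutagenize(wild_type, p_g_to_t=0.3))  # only G positions can change, and only to T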
74.4.2 Radiation

Ionizing Radiation. IR negatively affects the ability of cells to progress through the phases of the cell cycle. Exposure to IR causes DNA DSBs, thereby activating the ATM/p53 axis, which results in cell cycle arrest and either genomic repair or apoptosis. Nonetheless, the cell cycle blockage and the apoptotic response elicited by IR are also governed by p53-independent pathways [141] (Fig. 74.4). The IR-induced arrest exhibits distinct features during different phases of the cell cycle, and not all cell types halt at the same checkpoints.
For instance, all eukaryotic cells seem to experience a delay in G2 progression following exposure to IR, yet only some cell types demonstrate G1 arrest. The latter is mediated by p53 and its downstream transcriptional targets, such as growth arrest and DNA-damage-inducible alpha (GADD45) and p21WAF1/CIP1. The significance of p53 in G1 arrest is emphasized by the fact that mutations within the coding region of TP53 severely inhibit the establishment of the G1 blockage. S-phase delay takes place only after exposure of cells to high doses of IR. Notably, abolishing the ability to halt at G1 or G2 makes quite a difference for IR-exposed cells: the likelihood of survival decreases significantly when G2 arrest is compromised, whereas, surprisingly, inhibition of G1 arrest confers resistance to IR in some cell types [142, 143].

UV Radiation. One of the major environmental carcinogens participating in skin cancer development is solar radiation emitted at wavelengths of 240–290 nm, termed UVB and UVC radiation. Exposure of cells to UVB and UVC results in the formation of the highly mutagenic (6-4) pyrimidine-pyrimidone photoproducts ([6-4]PPs) and the bulky cyclobutane pyrimidine dimers (CPDs). During S phase, the synthesis of new DNA strands relies on tight coordination, or coupling, between the leading and lagging strands; upon UV-induced uncoupling, aberrant structures containing ssDNA arise. To avoid the deleterious effects of collapsed replication forks and to preserve genomic integrity, cells have evolved a number of strategies to cope with UV radiation. These involve nucleotide excision repair (NER) of photoproducts, as well as the recruitment of DNA polymerases eta and zeta, which are able to bypass CPDs (polymerase eta doing so in a largely error-free manner), in contrast to polymerases alpha and delta, which are blocked by these lesions. The significance of this mechanism, called translesion DNA synthesis, as a shield against skin cancer is highlighted in xeroderma pigmentosum (XP) patients, in whom polymerase eta harbors mutations and the propensity for skin cancer development is extremely high. Interestingly, an ATR-mediated intra-S checkpoint pathway has also been described that delays new DNA chain synthesis and replication bubble formation in response to UV exposure. In this molecular circuit, ATR signaling toward the transducer checkpoint kinase Chk1 is achieved with the aid of the mediator proteins claspin and human timeless (TIM).
Recently, it was shown that TIM forms a heterodimeric complex with the TIM-interacting protein (Tipin), which in turn associates with RPA-bound DNA. It has been suggested that TIM acts as a "supervisor" of normal replication even in the absence of genotoxic insults, while Tipin attenuates the process in UV-exposed human cells [144].
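The arrest behavior described in this section can be compressed into a small decision table. The function below merely restates the text's qualitative observations as toy logic; real checkpoint signaling is a quantitative network, and this is a didactic simplification, not a biological model.

# Toy summary of checkpoint outcomes after radiation exposure.
def checkpoint_outcomes(lesion: str, p53_functional: bool, high_dose: bool) -> list:
    outcomes = []
    if lesion == "DSB":  # ionizing radiation
        outcomes.append("G2 delay (seen in essentially all eukaryotic cells)")
        if p53_functional:
            outcomes.append("G1 arrest via p53 -> GADD45, p21WAF1/CIP1")
        if high_dose:
            outcomes.append("S-phase delay")
    elif lesion == "UV photoproducts":  # CPDs and (6-4)PPs
        outcomes.append("ATR/Chk1 intra-S slowdown (claspin, TIM/Tipin)")
        outcomes.append("translesion synthesis by polymerase eta at CPDs")
    return outcomes

print(checkpoint_outcomes("DSB", p53_functional=False, high_dose=True))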
74.4.3 Viral Agents

Viral infections have been implicated in a diverse spectrum of malignant manifestations. This section provides an overview of the carcinogenic action of viruses that have been established as the etiological agents of at least one solid tumor type.
74.4.3.1 DNA Viruses

Human Papillomaviruses (HPVs). These small, epitheliotropic, non-enveloped DNA viruses belong to the papovavirus family. During the course of infection, the viral genome may either remain in an episomal state or integrate into the host's DNA. To date, more than 100 HPV types have been identified; they are characterized as "low-risk" or "high-risk" according to their oncogenic potential (reviewed in [145]). Members of the low-risk group, particularly HPVs 6 and 11, are frequently detected in anogenital condylomas, papillomas of the respiratory tract and other benign mucocutaneous lesions at various anatomical sites. High-risk types, such as HPVs 16, 18, 33 and 58, are the etiological agents of cervical cancer and may also be implicated in the pathogenesis of anal, vulvar, penile, oropharyngeal, laryngeal, lung and skin tumors. The high-risk types employ the integrative pattern of infection, which allows persistent expression of viral proteins, at least three of which possess growth-stimulating and transforming properties. Two products of the early high-risk genomic region are capable of forming specific complexes with vital CCR: E6, which binds p53 and induces its degradation, and E7, which interacts with pRB and blocks its downstream activity. The functional deregulation of these key tumor suppressors results in uncontrolled DNA replication and apoptotic impairment, which explains the increased carcinogenic potential of the high-risk types. Another protein, E5, has been shown to bind EGFR and other membrane receptors, suppress p21WAF1/CIP1 activity and induce c-JUN expression in vitro, yet its contribution to naturally occurring infections is still poorly understood [145].
Notably, prophylactic vaccines against the L1 capsid protein of HPVs 6, 11, 16 and 18 – which together account for 90% of genital warts and 70% of cervical carcinomas – have been commercially available since 2006. Vaccination is safe, highly immunogenic and confers complete type-specific protection against persistent infection and associated lesions in properly vaccinated women [146].

Epstein–Barr Virus (EBV). EBV is a γ1-herpesvirus that infects >90% of the human population. Upon primary infection, a few individuals develop infectious mononucleosis, whereas most remain life-long asymptomatic carriers of the virus. During acute infection, EBV colonizes the stratified squamous epithelium of the oropharynx, where it assumes an active replicative state. This stage is followed by latent infection of B lymphocytes in the oropharyngeal lymphoid compartment, after which the virus persists in circulating memory B cells. Reactivation from latency may occur at any site where B cells reside. Its dual tropism for epithelial cells and lymphocytes, together with its ability to maintain a latent state for long periods, are the fundamental features that allow EBV to participate in the pathogenesis of several diverse malignancies. Indeed, the virus is the main etiological agent of undifferentiated nasopharyngeal carcinoma and endemic BL, and has been strongly associated with Hodgkin's disease, non-Hodgkin's lymphoma, lymphoepithelioma-like gastric carcinoma and a number of lymphoproliferative conditions in immunocompromised patients [147]. EBV-induced tumorigenesis is based on "molecular mimicry," a strategy employed by many oncogenic viruses: EBV proteins imitate the actions of membrane receptors, TFs and anti-apoptotic mediators in order to gain control of various homeostatic cellular pathways. For example, in undifferentiated nasopharyngeal cancer, where EBV infects the epithelium of the posterior nasopharynx (Waldeyer's ring), a "latency II" expression profile has been demonstrated. In this state, the virus produces – among other proteins – EBV nuclear antigen 1 (EBNA1), latent membrane protein 1 (LMP1) and BCRF1. EBNA1 is a sequence-specific DNA-binding phosphoprotein required for the maintenance of EBV latency. LMP1 functions as a constitutively active form of CD40, mimicking the cellular growth signal that results from receptor–ligand binding.
It also triggers overexpression of BCL-2, thus protecting infected cells from p53-mediated apoptosis. Finally, BCRF1 bears considerable structural and functional homology to IL-10 and is thought to contribute to tumor growth and immune evasion [147]. Although therapy for EBV-associated tumors remains unsatisfactory, encouraging signs are emerging from modern antiviral and immunological approaches: recent studies suggest that selected antiviral agents, as well as immunotherapy with EBV-specific cytotoxic T lymphocytes and targeted monoclonal antibodies, may hold promise for the treatment of EBV-related malignancies.

Human Herpesvirus 8 (HHV-8). The recently identified Kaposi's sarcoma-associated herpesvirus, or HHV-8, is a γ2-herpesvirus belonging to the rhadinovirus genus. HHV-8 exhibits significant sequence homology to EBV – the only other known human γ-herpesvirus [148]. Like EBV, HHV-8 is capable of (i) infecting cells of both epithelial and lymphoblastoid origin, (ii) persisting in latency for many years before resuming a lytic state and (iii) expressing proteins that mimic the functions of cellular regulators involved in cell cycle progression, apoptosis, angiogenesis and cytokine immunomodulation. The virus has been established as the cause of all clinical variants of Kaposi's sarcoma, primary effusion lymphoma and multicentric Castleman's disease, while mounting evidence points toward its implication in further lymphoproliferative diseases, such as multiple myeloma and AIDS-related lymphomas [148]. The etiological link between HHV-8 and Kaposi's sarcoma has been confirmed by a plethora of seroepidemiological and molecular studies, as well as by functional assays in human cell lines and animal models. Most tumor cells demonstrate latent infection with a strikingly narrow gene expression profile. This includes the latency-associated nuclear antigen (LANA-1), a protein that is functionally equivalent to the E6/E7 proteins of papillomaviruses: LANA-1 interacts with p53 and suppresses its downstream activity, and it also binds the "pocket region" of pRb, thereby releasing E2F and allowing transcription of genes involved in cell cycle progression. Only a minority of sarcoma cells express lytic genes, which may, however, play a critical role in tumor growth via paracrine mechanisms. The lytic-cycle gene expression cascade includes a viral G-protein-coupled receptor (vGPCR) and a BCL-2 homolog (vBcl-2).
The former, like its human counterpart, binds IL-8 and stimulates tumor growth and angiogenesis. The latter is expressed in late-stage Kaposi's sarcoma and exerts its anti-apoptotic activity in a somewhat different manner than its human homolog. An in-depth understanding of these molecular mechanisms is expected to provide the rationale for targeted therapies against HHV-8-associated diseases in the future [148].

Hepatitis B Virus (HBV). This small, partially double-stranded DNA virus is a member of the hepadnavirus family. The virus is hepatotropic and causes acute or chronic liver disease. Patients in whom HBV surface antigen (HBsAg) persists in the serum for more than 6 months are referred to as "chronic HBsAg carriers"; among these, the risk of developing HCC – the predominant type of primary liver cancer – has been shown to increase approximately 100-fold. The viral genome contains four open reading frames, coding for the core, surface, X and polymerase proteins. Interestingly, although HBV is a DNA virus, it replicates through an RNA intermediate and requires an active reverse transcriptase (RT). In the context of hepatocellular carcinogenesis, the integration of HBV DNA into the host genome and the functional attributes of the X protein (HBxAg) are of pivotal significance. Viral integration is random and typically occurs at multiple genomic sites; proviral DNA inserts may therefore act as mutagenic agents, causing genomic instability and secondary chromosomal rearrangements. Loss of tumor suppressor loci and amplification of GF genes have been observed following HBV DNA integration in vitro. In addition, the HBx protein acts as a transcriptional transactivator of several host genes engaged in cell growth control, such as the proto-oncogenes c-JUN, c-FOS and c-MYC, possibly through the protein kinase C (PKC) and NF-κB pathways. HBxAg may also interact with p53 and pRb, thus interfering with cell cycle regulation, DNA repair and apoptosis [149].
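The 100-fold figure for chronic carriers is a relative risk; its absolute meaning depends on a baseline the chapter does not state. A quick sanity check with an assumed baseline (the 0.04% value is invented for illustration):

# Relative vs absolute risk of HCC in chronic HBsAg carriers.
baseline_risk = 0.0004  # assumed lifetime HCC risk in non-carriers (0.04%); not from the text
relative_risk = 100     # the ~100-fold increase cited for chronic carriers
print(f"Implied carrier lifetime HCC risk: {baseline_risk * relative_risk:.1%}")  # 4.0%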
74.4.3.2 RNA Viruses

Hepatitis C Virus (HCV). HCV is a member of the flavivirus family. The virus causes acute and chronic liver disease and has been strongly linked to the development of HCC. Globally, the number of HCV-infected individuals is smaller than that of HBV-infected ones, yet the rate of chronicity is considerably higher, reaching up to 85%. The 9,600-bp single-stranded RNA genome of HCV encodes a single polyprotein precursor, which is cleaved into smaller structural (core, envelope 1 and 2) and nonstructural (NS1, NS3–5) proteins.
The virus possesses no RT activity and does not integrate into the host genome. However, at some stage of the viral cycle, the core protein subunits p19 and p21 enter the nucleus and trigger the transcriptional activity of key regulatory molecules, such as NF-κB and STAT-3. Comparative studies have demonstrated that the changes brought about in the cellular microenvironment by cirrhosis and chronic liver inflammation play a critical role in HCV-associated carcinogenesis; HBV, in contrast, is capable of transforming hepatocytes in a more direct manner [149].

Other RNA Viruses. Most of the RNA tumor viruses identified to date exert their transforming activities via two common mechanisms: (i) expression of viral oncoproteins and (ii) insertional mutagenesis. For example, simian sarcoma virus encodes a homolog of human PDGF, which establishes a powerful autocrine growth-stimulatory loop and eventually leads to the malignant transformation of infected cells [150]. Avian leukosis viruses integrate into the host genome adjacent to the c-MYC oncogene; consequently, the latter is constitutively expressed owing to the enhancer activity of the viral 3' long terminal repeat (LTR) sequences [151]. An additional, though less frequent, mechanism employed by RNA tumor viruses relies on the transactivation of host genes associated with cell proliferation by viral proteins, as in the case of the Tax transforming protein produced by human T-cell lymphotropic virus type 1 (HTLV-1) [152].
74.4.4 Genetic Predisposition

It is currently estimated that 5–10% of all cancers are hereditary or familial. These tumors are predominantly of early onset, carry distinct morphological, immunophenotypic and prognostic features, and originate from autosomal dominant mutations in high-penetrance susceptibility genes. Typical examples of genes responsible for hereditary cancer syndromes are BRCA1/2 in familial breast and ovarian cancer, RET in FMTC, APC in familial adenomatous polyposis, the DNA mismatch repair genes hMSH2, hMLH1, hPMS1, hPMS2, hMSH3 and hMSH6 in hereditary nonpolyposis colorectal cancer, and E-cadherin/CDH1 in hereditary diffuse gastric cancer [153].
The molecular pathogenesis of sporadic tumors may also involve a hereditary component, although a less straightforward one. Numerous groups have performed association studies to assess candidate cancer-susceptibility genes and have produced an extensive directory of identified suspects. Among these, genes encoding metabolic enzymes engaged in the detoxification of carcinogenic agents, as well as proteins implicated in the DNA repair machinery, have received the most attention. Glutathione S-transferases (GSTs) comprise a family of isoenzymes that increase the water solubility of xenobiotic and endogenous substances, thus facilitating their cellular excretion. The most important genes appear to be GSTM1 and GSTT1, coding for enzyme classes μ and θ, for which 40% and 15% of Caucasians, respectively, are homozygous for the nonfunctional (null) alleles. Each of the GSTM1 and GSTT1 null/null genotypes confers an increased risk of epithelial carcinogenesis, while their combination exerts a synergistic effect [154]. The isoenzymes N-acetyltransferase (NAT) 1 and 2 are encoded by highly polymorphic genes and participate in the neutralization of tobacco-specific aromatic amines via N- and O-acetylation. Certain NAT1/2 genotypes have been strongly associated with an increased susceptibility to lung and upper aerodigestive tract carcinogenesis, whereas weaker trends have been reported for tumors of the breast, colon and rectum, prostate and bladder [155]. A prominent predisposing role has been ascribed to the genes encoding the cytochrome P450 (CYP) superfamily of enzymes, which are involved in the early metabolism of polycyclic aromatic hydrocarbons, nitroaromatics and arylamines. Polymorphisms in CYP genes have been linked to an elevated risk of epithelial neoplasia in a variety of organs and tissues, the best-documented relations being those of CYP1A1 with NSCLC and CYP2E1 with colorectal cancer [156]. It is noteworthy that a number of CYP enzymes partake in the inactivation pathway of cancer chemotherapeutic agents, an aspect that has recently rendered them promising targets for novel treatment strategies [157]. To date, studies of familial cancer syndromes have identified further genes, typically tumor suppressors, that seem to convey an increased hereditary risk of malignancy. However, germline mutations and polymorphisms in such genes pertain to a small number of cases and to specific tumor types.
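The quoted null-allele frequencies have a simple arithmetic consequence: if the two loci segregate independently (an assumption added here for illustration, not a claim made in the text), roughly 6% of Caucasians should carry both null genotypes. The snippet also shows what a multiplicative synergy would look like with invented per-genotype odds ratios.

# Joint prevalence of GSTM1/GSTT1 double-null genotypes, assuming
# independent segregation of the two loci (an illustrative assumption).
p_m1_null, p_t1_null = 0.40, 0.15
print(f"Expected double-null frequency: {p_m1_null * p_t1_null:.0%}")  # 6%

# Toy multiplicative synergy with invented per-genotype odds ratios.
or_m1, or_t1 = 1.5, 1.4
print(f"Joint odds ratio if effects multiply: {or_m1 * or_t1:.2f}")  # 2.10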
References

1. Fearon ER, Vogelstein B (1990) A genetic model for colorectal tumorigenesis. Cell 61:759–767
2. Hanahan D, Weinberg RA (2000) The hallmarks of cancer. Cell 100:57–70
3. Goustin AS, Leof EB, Shipley GD et al (1986) Growth factors and cancer. Cancer Res 46:1015–1029
4. Lesko E, Majka M (2008) The biological role of HGF-MET axis in tumor growth and development of metastasis. Front Biosci 13:1271–1280
5. Jones AV, Cross NC (2004) Oncogenic derivatives of platelet-derived growth factor receptors. Cell Mol Life Sci 61:2912–2923
6. Simon MP, Pedeutour F, Sirvent N et al (1997) Deregulation of the platelet-derived growth factor B-chain gene via fusion with collagen gene COL1A1 in dermatofibrosarcoma protuberans and giant-cell fibroblastoma. Nat Genet 15:95–98
7. Mattila MM, Harkonen PL (2007) Role of fibroblast growth factor 8 in growth and progression of hormonal cancer. Cytokine Growth Factor Rev 18:257–266
8. Dvorak P, Dvorakova D, Hampl A (2006) Fibroblast growth factor signaling in embryonic and cancer stem cells. FEBS Lett 580:2869–2874
9. Desiderio MA (2007) Hepatocyte growth factor in invasive growth of carcinomas. Cell Mol Life Sci 64:1341–1354
10. Normanno N, Bianco C, De Luca A et al (2001) The role of EGF-related peptides in tumor growth. Front Biosci 6:D685–707
11. Nguyen DM, Schrump DS (2004) Growth factor receptors as targets for lung cancer therapy. Semin Thorac Cardiovasc Surg 16:3–12
12. Ruco LP, Stoppacciaro A, Ballarini F et al (2001) Met protein and hepatocyte growth factor (HGF) in papillary carcinoma of the thyroid: evidence for a pathogenetic role in tumourigenesis. J Pathol 194:4–8
13. Maulik G, Shrikhande A, Kijima T et al (2002) Role of the hepatocyte growth factor receptor, c-Met, in oncogenesis and potential for therapeutic inhibition. Cytokine Growth Factor Rev 13:41–59
14. Abd El-Rehim DM, Pinder SE, Paish CE et al (2004) Expression and co-expression of the members of the epidermal growth factor receptor (EGFR) family in invasive breast carcinoma. Br J Cancer 91:1532–1542
15. Normanno N, Bianco C, De Luca A et al (2003) Target-based agents against ERBB receptors and their ligands: a novel approach to cancer treatment. Endocr Relat Cancer 10:1–21
16. Salomon DS, Brandt R, Ciardiello F et al (1995) Epidermal growth factor-related peptides and their receptors in human malignancies. Crit Rev Oncol Hematol 19:183–232
17. Normanno N, De Luca A, Bianco C et al (2006) Epidermal growth factor receptor (EGFR) signaling in cancer. Gene 366:2–16
18. Stern DF (2000) Tyrosine kinase signalling in breast cancer: ErbB family receptor tyrosine kinases. Breast Cancer Res 2:176–183
19. Ménard S, Casalini P, Campiglio M et al (2004) Role of HER2/neu in tumor progression and therapy. Cell Mol Life Sci 61:2965–2978
20. Fleming TP, Saxena A, Clark WC et al (1992) Amplification and/or overexpression of platelet-derived growth factor receptors and epidermal growth factor receptor in human glial tumors. Cancer Res 52:4550–4553
21. Smith JS, Wang XY, Qian J et al (2000) Amplification of the platelet-derived growth factor receptor-A (PDGFRA) gene occurs in oligodendrogliomas with grade IV anaplastic features. J Neuropathol Exp Neurol 59:495–503
22. MacDonald TJ, Brown KM, LaFleur B et al (2001) Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nat Genet 29:143–152
23. Santoro M, Carlomagno F, Melillo RM et al (2004) Dysfunction of the RET receptor in human cancer. Cell Mol Life Sci 61:2954–2964
24. Arighi E, Borrello MG, Sariola H (2005) RET tyrosine kinase signaling in development and cancer. Cytokine Growth Factor Rev 16:441–467
25. Asai N, Jijiwa M, Enomoto A et al (2006) RET receptor signaling: dysfunction in thyroid cancer and Hirschsprung's disease. Pathol Int 56:164–172
26. Stephens P, Hunter C, Bignell G et al (2004) Lung cancer: intragenic ERBB2 kinase mutations in tumours. Nature 431:525–526
27. Shigematsu H, Takahashi T, Nomura M et al (2005) Somatic mutations of the HER2 kinase domain in lung adenocarcinomas. Cancer Res 65:1642–1646
28. Fletcher JA, Rubin BP (2007) KIT mutations in GIST. Curr Opin Genet Dev 17:3–7
29. Miettinen M, Lasota J (2005) KIT (CD117): a review on expression in normal and neoplastic tissues, and mutations and their clinicopathologic correlation. Appl Immunohistochem Mol Morphol 13:205–220
30. Kitamura Y, Hirota S (2004) Kit as a human oncogenic tyrosine kinase. Cell Mol Life Sci 61:2924–2931
31. Rajalingam K, Schreck R, Rapp UR et al (2007) Ras oncogenes and their downstream targets. Biochim Biophys Acta 1773:1177–1195
32. McCubrey JA, Steelman LS, Chappell WH et al (2007) Roles of the Raf/MEK/ERK pathway in cell growth, malignant transformation and drug resistance. Biochim Biophys Acta 1773:1263–1284
33. Molina JR, Adjei AA (2006) The Ras/Raf/MAPK pathway. J Thorac Oncol 1:7–9
34. Chang F, Lee JT, Navolanic PM et al (2003) Involvement of PI3K/Akt pathway in cell cycle progression, apoptosis, and neoplastic transformation: a target for cancer chemotherapy. Leukemia 17:590–603
35. Fresno Vara JA, Casado E, de Castro J et al (2004) PI3K/Akt signalling pathway and cancer. Cancer Treat Rev 30:193–204
36. Liu W, Bagaitkar J, Watabe K (2007) Roles of AKT signal in breast cancer. Front Biosci 12:4011–4019
37. Martelli AM, Nyakern M, Tabellini G et al (2006) Phosphoinositide 3-kinase/Akt signaling pathway and its therapeutical implications for human acute myeloid leukemia. Leukemia 20:911–928
38. Verma A, Kambhampati S, Parmar S et al (2003) Jak family of kinases in cancer. Cancer Metastasis Rev 22:423–434
39. Rawlings JS, Rosler KM, Harrison DA (2004) The JAK/STAT signaling pathway. J Cell Sci 117:1281–1283
40. Khwaja A (2006) The role of Janus kinases in haemopoiesis and haematological malignancy. Br J Haematol 134:366–384
41. Valentino L, Pierre J (2006) JAK/STAT signal transduction: regulators and implication in hematological malignancies. Biochem Pharmacol 71:713–721
42. Amati B, Frank SR, Donjerkovic D et al (2001) Function of the c-Myc oncoprotein in chromatin remodeling and transcription. Biochim Biophys Acta 1471:M135–145
43. Lutz W, Leon J, Eilers M (2002) Contributions of Myc to tumorigenesis. Biochim Biophys Acta 1602:61–71
44. Thomas WD, Raif A, Hansford L et al (2004) N-myc transcription molecule and oncoprotein. Int J Biochem Cell Biol 36:771–775
45. Cowling VH, Cole MD (2006) Mechanism of transcriptional activation by the Myc oncoproteins. Semin Cancer Biol 16:242–252
46. Sala A (2005) B-MYB, a transcription factor implicated in regulating cell cycle, apoptosis and cancer. Eur J Cancer 41:2479–2484
47. Milde-Langosch K (2005) The Fos family of transcription factors and their role in tumourigenesis. Eur J Cancer 41:2449–2461
48. Vogt PK (2001) Jun, the oncoprotein. Oncogene 20:2365–2377
49. Safe S, Abdelrahim M (2005) Sp transcription factor family and its role in cancer. Eur J Cancer 41:2438–2448
50. Seth A, Watson DK (2005) ETS transcription factors and their emerging roles in human cancer. Eur J Cancer 41:2462–2478
51. Musgrove EA (2006) Cyclins: roles in mitogenic signaling and oncogenic transformation. Growth Factors 24:13–19
52. Lee MH, Yang HY (2003) Regulators of G1 cyclin-dependent kinases and cancers. Cancer Metastasis Rev 22:435–449
53. Barton MC, Akli S, Keyomarsi K (2006) Deregulation of cyclin E meets dysfunction in p53: closing the escape hatch on breast cancer. J Cell Physiol 209:686–694
54. Yam CH, Fung TK, Poon RY (2002) Cyclin A in cell cycle control and cancer. Cell Mol Life Sci 59:1317–1326
55. Tashiro E, Tsuchiya A, Imoto M (2007) Functions of cyclin D1 as an oncogene and regulation of cyclin D1 expression. Cancer Sci 98:629–635
56. Gladden AB, Diehl JA (2005) Location, location, location: the role of cyclin D1 nuclear localization in cancer. J Cell Biochem 96:906–913
57. Tsantoulis PK, Gorgoulis VG (2005) Involvement of E2F transcription factor family in cancer. Eur J Cancer 41:2403–2414
58. Johnson DG, Degregori J (2006) Putting the oncogenic and tumor suppressive activities of E2F into context. Curr Mol Med 6:731–738
59. Semple JW, Duncker BP (2004) ORC-associated replication factors as biomarkers for cancer. Biotechnol Adv 22:621–631
60. Karakaidos P, Taraviras S, Vassiliou LV et al (2004) Overexpression of the replication licensing regulators hCdt1 and hCdc6 characterizes a subset of non-small-cell lung carcinomas: synergistic effect with mutant p53 on tumor growth and chromosomal instability – evidence of E2F-1 transcriptional control over hCdt1. Am J Pathol 165:1351–1365
61. Liontos M, Koutsami M, Sideridou M et al (2007) Deregulated overexpression of hCdt1 and hCdc6 promotes malignant behavior. Cancer Res 67:10899–10909
62. Levine AJ (1997) p53, the cellular gatekeeper for growth and division. Cell 88:323–331
63. Bartkova J, Horejsi Z, Koed K et al (2005) DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 434:864–870
64. Gorgoulis VG, Vassiliou LV, Karakaidos P et al (2005) Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 434:907–913
65. Sherr CJ, Roberts JM (1999) CDK inhibitors: positive and negative regulators of G1-phase progression. Genes Dev 13:1501–1512
66. Giacinti C, Giordano A (2006) RB and cell cycle progression. Oncogene 25:5220–5227
67. Sharpless NE (2005) INK4a/ARF: a multifunctional tumor suppressor locus. Mutat Res 576:22–38
68. Gallagher SJ, Kefford RF, Rizos H (2006) The ARF tumour suppressor. Int J Biochem Cell Biol 38:1637–1641
69. Kim WY, Sharpless NE (2006) The regulation of INK4/ARF in cancer and aging. Cell 127:265–275
70. Kinzler KW, Vogelstein B (1997) Cancer-susceptibility genes. Gatekeepers and caretakers. Nature 386:761–763
71. Bloom J, Pagano M (2003) Deregulated degradation of the cdk inhibitor p27 and malignant transformation. Semin Cancer Biol 13:41–47
72. Alkarain A, Slingerland J (2004) Deregulation of p27 by oncogenic signaling and its prognostic significance in breast cancer. Breast Cancer Res 6:13–21
73. Kudo Y, Kitajima S, Ogawa I et al (2005) Down-regulation of Cdk inhibitor p27 in oral squamous cell carcinoma. Oral Oncol 41:105–116
74. Sicinski P, Zacharek S, Kim C (2007) Duality of p27Kip1 function in tumorigenesis. Genes Dev 21:1703–1706
75. Cardozo T, Pagano M (2007) Wrenches in the works: drug discovery targeting the SCF ubiquitin ligase and APC/C complexes. BMC Biochem 8:S9
76. Chu EC, Tarnawski AS (2004) PTEN regulatory functions in tumor suppression and cell biology. Med Sci Monit 10:RA235–241
77. Leslie NR, Downes CP (2004) PTEN function: how normal cells control it and tumour cells lose it. Biochem J 382:1–11
78. Chow LM, Baker SJ (2006) PTEN function in normal and neoplastic growth. Cancer Lett 241:184–196
79. Maehama T (2007) PTEN: its deregulation and tumorigenesis. Biol Pharm Bull 30:1624–1627
80. Adams JM (2003) Ways of dying: multiple pathways to apoptosis. Genes Dev 17:2481–2495
81. Danial NN, Korsmeyer SJ (2004) Cell death: critical control points. Cell 116:205–219
82. Hinds MG, Day CL (2005) Regulation of apoptosis: uncovering the binding determinants. Curr Opin Struct Biol 15:690–699
83. Reichmann E (2002) The biological role of the Fas/FasL system during tumor formation and progression. Semin Cancer Biol 12:309–315
84. Adams JM, Cory S (2007) Bcl-2-regulated apoptosis: mechanism and therapeutic potential. Curr Opin Immunol 19:488–496
85. Adams JM, Cory S (2007) The Bcl-2 apoptotic switch in cancer development and therapy. Oncogene 26:1324–1337
86. Dimri GP (2005) What has senescence got to do with cancer? Cancer Cell 7:505–512
87. Di Micco R, Fumagalli M, d'Adda di Fagagna F (2007) Breaking news: high-speed race ends in arrest – how oncogenes induce senescence. Trends Cell Biol 17:529–536
88. Schmitt CA (2007) Cellular senescence and cancer treatment. Biochim Biophys Acta 1775:5–20
89. Schmitt E, Paquet C, Beauchemin M et al (2007) DNA-damage response network at the crossroads of cell-cycle checkpoints, cellular senescence and apoptosis. J Zhejiang Univ Sci B 8:377–397
90. Itahana K, Campisi J, Dimri GP (2004) Mechanisms of cellular senescence in human and mouse cells. Biogerontology 5:1–10
91. Cheung AL, Deng W (2008) Telomere dysfunction, genome instability and cancer. Front Biosci 13:2075–2090
92. Schinzel AC, Hahn WC (2008) Oncogenic transformation and experimental models of human cancer. Front Biosci 13:71–84
93. Lindahl T (1993) Instability and decay of the primary structure of DNA. Nature 362:709–715
94. Holmquist GP (1998) Endogenous lesions, S-phase-independent spontaneous mutations, and evolutionary strategies for base excision repair. Mutat Res 400:59–68
95. Kovtun IV, McMurray CT (2007) Crosstalk of DNA glycosylases with pathways other than base excision repair. DNA Repair 6:517–519
96. Branzei D, Foiani M (2008) Regulation of DNA repair throughout the cell cycle. Nat Rev Mol Cell Biol 9:297–308
97. Rajagopalan H, Nowak MA, Vogelstein B et al (2003) The significance of unstable chromosomes in colorectal cancer. Nat Rev Cancer 3:695–701
98. Hsieh P, Yamane K (2008) DNA mismatch repair: molecular mechanism, cancer, and ageing. Mech Ageing Dev 129:391–407
99. Bayani J, Selvarajah S, Maire G et al (2007) Genomic mechanisms and measurement of structural and numerical instability in cancer cells. Semin Cancer Biol 17:5–18
100. Holland AJ, Cleveland DW (2009) Boveri revisited: chromosomal instability, aneuploidy and tumorigenesis. Nat Rev Mol Cell Biol 10:478–487
101. Wyman C, Kanaar R (2006) DNA double-strand break repair: all's well that ends well. Annu Rev Genet 40:363–383
102. Cahill D, Connor B, Carney JP (2006) Mechanisms of eukaryotic DNA double strand break repair. Front Biosci 11:1958–1976
103. Scott SP, Pandita TK (2006) The cellular control of DNA double-strand breaks. J Cell Biochem 99:1463–1475
104. Helleday T, Lo J, van Gent DC et al (2007) DNA double strand break repair: from mechanistic understanding to cancer treatment. DNA Repair (Amst) 6:923–935
105. van Gent DC, Hoeijmakers JHJ, Kanaar R (2001) Chromosomal stability and the DNA double-stranded break connection. Nat Rev Genet 2:196–206
106. Abraham RT (2001) Cell cycle checkpoint signaling through the ATM and ATR kinases. Genes Dev 15:2177–2196
107. Durocher D, Jackson SP (2001) DNA-PK, ATM and ATR as sensors of DNA damage: variations on a theme? Curr Opin Cell Biol 13:225–231
108. Shiloh Y (2001) ATM and ATR: networking cellular responses to DNA damage. Curr Opin Genet Dev 11:71–77
109. Paulsen RD, Cimprich KA (2007) The ATR pathway: fine tuning the fork. DNA Repair (Amst) 6:953–966
110. Riches LC, Lynch AM, Gooderham NJ (2008) Early events in the mammalian response to DNA double-strand breaks. Mutagenesis 23:331–339
111. Reinhardt CH, Yaffe MB (2009) Kinases that control the cell cycle in response to DNA damage: Chk1, Chk2, and MK2. Curr Opin Cell Biol 21:245–255
112. Cimprich KA, Cortez D (2008) ATR: an essential regulator of genome integrity. Nat Rev Mol Cell Biol 9:616–627
113. Zou L, Elledge SJ (2003) Sensing DNA damage through ATRIP recognition of RPA-ssDNA complexes. Science 300:1542–1548
114. Jackson SP (2006) ATM- and cell cycle-dependent regulation of ATR in response to DNA double-strand breaks. Nat Cell Biol 8:37–45
115. Sartori AA, Lukas C, Coates J et al (2007) Human CtIP promotes DNA end resection. Nature 450:509–514
116. Vousden KH, Prives C (2009) Blinded by the light: the growing complexity of p53. Cell 137:413–431
117. Lukas C, Falck J, Bartkova J et al (2003) Distinct spatiotemporal dynamics of mammalian checkpoint regulators induced by DNA damage. Nat Cell Biol 5:255–260
118. Bekker-Jensen S, Lukas C, Kitagawa R et al (2006) Spatial organization of the mammalian genome surveillance machinery in response to DNA strand breaks. J Cell Biol 173:195–206
119. Lukas J, Lukas C, Bartek J (2004) Mammalian cell cycle checkpoints: signaling pathways and their organization in space and time. DNA Repair 3:997–1007
120. Bernstein C, Bernstein H, Payne CM et al (2002) DNA repair/pro-apoptotic dual-role proteins in five major DNA repair pathways: fail-safe protection against carcinogenesis. Mutat Res 511:145–178
121. Friedberg EC (2003) DNA damage and repair. Nature 421:436–440
122. Misteli T, Soutoglou E (2009) The emerging role of nuclear architecture in DNA repair and genome maintenance. Nat Rev Mol Cell Biol 10:243–254
123. Donigan K, Sweasy JB (2009) Sequence context-specific mutagenesis and base excision repair. Mol Carcinog 48:362–368
124. Hegde ML, Hazra TK, Mitra S (2008) Early steps in the DNA base excision/single-strand interruption repair pathway in mammalian cells. Cell Res 18:27–47
125. Scolnick DM, Halazonetis TD (2000) Chfr defines a mitotic stress checkpoint that delays entry into metaphase. Nature 406:430–435
126. Steigemann P, Wurzenberger C, Schmitz MH et al (2009) Aurora B-mediated abscission checkpoint protects against tetraploidization. Cell 136:473–484
127. Kops GJPL, Weaver BAA, Cleveland DW (2005) On the road to cancer: aneuploidy and the mitotic checkpoint. Nat Rev Cancer 5:773–785
128. Eichhorn ME, Kleespies A, Angele MK et al (2007) Angiogenesis in cancer: molecular mechanisms, clinical impact. Langenbecks Arch Surg 392:371–379
129. Schedin P, Elias A (2004) Multistep tumorigenesis and the microenvironment. Breast Cancer Res 6:93–101
130. Masterson J, O'Dea S (2007) Posttranslational truncation of E-cadherin and significance for tumour progression. Cells Tissues Organs 185:175–179
131. Moschos SJ, Drogowski LM, Reppert SL et al (2007) Integrins and cancer. Oncology (Williston Park) 21:13–20
132. Klein G, Vellenga E, Fraaije MW et al (2004) The possible role of matrix metalloproteinase (MMP)-2 and MMP-9 in cancer, e.g. acute leukemia. Crit Rev Oncol Hematol 50:87–100
133. Boyd JA, Barrett JC (1990) Genetic and cellular basis of multistep carcinogenesis. Pharmacol Ther 46:469–486
134. Barrett JC (1993) Mechanisms of multistep carcinogenesis and carcinogen risk assessment. Environ Health Perspect 100:9–20
135. Beckmann MW, Niederacher D, Schnurch HG et al (1997) Multistep carcinogenesis of breast cancer and tumour heterogeneity. J Mol Med 75:429–439
136. Cho KR, Vogelstein B (1992) Genetic alterations in the adenoma–carcinoma sequence. Cancer 70:1727–1731
137. Pfeifer GP, Denissenko MF, Olivier M et al (2002) Tobacco smoke carcinogens, DNA damage and p53 mutations in smoking-associated cancers. Oncogene 21:7435–7451
138. Gupta A, Rosenberger SF, Bowden GT (1999) Increased ROS levels contribute to elevated transcription factor and MAP kinase activities in malignantly progressed mouse keratinocyte cell lines. Carcinogenesis 20:2063–2073
139. Suh YA, Arnold RS, Lassegue B et al (1999) Cell transformation by the superoxide-generating oxidase Mox1. Nature 401:79–82
140. Luch A (2005) Nature and nurture – lessons from chemical carcinogenesis. Nat Rev Cancer 5:113–125
141. Verheyde J, de Saint-Georges L, Leyns L et al (2006) The role of Trp53 in the transcriptional response to ionizing radiation in the developing brain. DNA Res 13:65–75
142. Maity A, McKenna WG, Muschel RJ (1994) The molecular basis for cell cycle delays following ionizing radiation: a review. Radiother Oncol 31:1–13
143. Bernhard EJ, Maity A, Muschel RJ et al (1995) Effects of ionizing radiation on cell cycle progression. A review. Radiat Environ Biophys 34:79–83
144. Unsal-Kacmaz K, Chastain PD, Qu PP et al (2007) The human Tim/Tipin complex coordinates an intra-S checkpoint response to UV that slows replication fork displacement. Mol Cell Biol 27:3131–3142
145. zur Hausen H (2000) Papillomaviruses causing cancer: evasion from host-cell control in early events in carcinogenesis. J Natl Cancer Inst 92:690–698
146. Lowy DR, Schiller JT (2006) Prophylactic human papillomavirus vaccines. J Clin Invest 116:1167–1173
147. Thompson MP, Kurzrock R (2004) Epstein-Barr virus and cancer. Clin Cancer Res 10:803–821
148. Cathomas G (2003) Kaposi's sarcoma-associated herpesvirus (KSHV)/human herpesvirus 8 (HHV-8) as a tumour virus. Herpes 10:72–77
149. Szabo E, Paska C, Kaposi Novak P et al (2004) Similarities and differences in hepatitis B and C virus induced hepatocarcinogenesis. Pathol Oncol Res 10:5–11
150. Fry DG, Milam LD, Maher VM et al (1986) Transformation of diploid human fibroblasts by DNA transfection with the v-sis oncogene. J Cell Physiol 128:313–321
151. Gong M, Semus HL, Bird KJ et al (1998) Differential selection of cells with proviral c-myc and c-erbB integrations after avian leukosis virus infection. J Virol 72:5517–5525
152. Sun SC, Ballard DW (1999) Persistent activation of NF-kappaB by the Tax transforming protein of HTLV-1: hijacking cellular IkappaB kinases. Oncogene 18:6948–6958
153. Marsh D, Zori R (2002) Genetic insights into familial cancers – update and recent discoveries. Cancer Lett 181:125–164
154. Parl FF (2005) Glutathione S-transferase genotypes and cancer risk. Cancer Lett 221:123–129
155. Hein DW (2000) N-Acetyltransferase genetics and their role in predisposition to aromatic and heterocyclic amine-induced carcinogenesis. Toxicol Lett 112–113:349–356
156. Turesky RJ (2004) The role of genetic polymorphisms in metabolism of carcinogenic heterocyclic aromatic amines. Curr Drug Metab 5:169–180
157. Sissung TM, Price DK, Sparreboom A et al (2006) Pharmacogenetics and regulation of human cytochrome P450 1B1: implications in hormone-mediated tumor metabolism and a novel target for therapeutic intervention. Mol Cancer Res 4:135–150
Index
A

Abdominal aortic aneurysms (AAAs), 889
  branched concepts, 890–891
  device fabric fatigue, 891
  fixation, 891
  metal fatigue, 891
  migration, 891
  neck angle, 890
  neck quality, 890
  repair, 876
Academic department, business structure, 665–667
Academic Health Sciences Centre (AHSC), 5, 698
Academic institutions, 710
Academic lecture
  definition, 605–606
  delivery, 606–608
  introduction, methods, results and discussion (IMRaD), 607
  lexical density, 608
  live link, 609
  modern aspects, 609
  planning, 606–608
  powerpoint, 608
  practicalities, 608–609
  principles, 606
  signposts, 608
  types, 606
Academic surgeon, 5–6
Access to health care, 177–179
Achalasia, 802
Acute physiology and chronic health evaluation (APACHE)
  AAA, 515
  score, 510–512
Adjuvant treatment, 804–805
Administration
  age, 756
  ethics, 763–764
  ethnicity, 756
  faculty, 768
    compensation, 766–767
    development, 765
    fellowships, 765
  financial pressures, 755
  fundraising, 763
  gender, 756
  interdisciplinary
    clinical delivery, 761–762
    collaboration, 761
    model, 759
    surgical education, 762
    training programs, 761–762
  leadership, 758
  medical students, 764
  partnership in industry, 763
  philanthropy, 763
  productivity metrics, 768
  professionalism, 763–764
  promotion, 768
  quality, 763–764
  recruitment, 765
  residency, 764–765
  retention, 765
  strategy, 757–762
  surgical department, 754–757
  surgical divisions, 758–759
  surgical innovation, 763
  working hours, 756–757
Adverse outcomes, 256
Aesthetic surgery
  face, 928–929
  scalp, 928–929
Aggregate studies, 33–34
Aligned resources, 711–713
Aligned vision, 711–713
Alpha-antagonists, 834
Alternative analysis, 405
Alternative models of organizational governance, 711
American Society of Anaesthesiology (ASA) grade, 510
Analysis of health-related quality of life data, 138–139
Aneurysmal disease, 877
Angiogenesis, 993–994
Angiography, 536
Animal experimentation, 231–233
Animal husbandry, 217–218
Animal husbandry, disease recognition, 218–219
Animal research
  acts of government, 213
  anaesthesia (inhalational), 222–223
  anaesthesia (non-inhalational), 222–223
  analgesia, 220
  ethics, 230–231
  genetic modification, 216
  history, 208–209
  humane killing of animals, 219–220
  legal requirements, 212–214
  licence, 214
  personal health, 216–217
  postoperative care, 225–226
  surgery, 225
Animal rights groups, 211
Annexin V, 972
Aortic debranching, 877–879
Aortocaval cardiopulmonary bypass, minimally invasive, 858
Apoptosis, 987–990
Area under the curve (AUC), 92–93
Art
  advantages for the surgeon, 750
  aesthetic surgery, 748–749
  anatomical, 742
  clinical practice, 746
  depth perception, 746
  digital images and manipulation, 747–748
  fine motor control, 744
  form and shape, 745
  hospitals, 749
  illustration and research, 750
  innovation, 750
  interpretation of laparoscopic images, 746
  interpreting visual information, 745–746
  Leonardo Da Vinci, 742–743
  memory and revision, 746
  methods of illustration, 744
  observational skills, 745
  practical uses of art in surgery, 746–749
  recording surgical history, 743
  recording surgical technique, 743–744
  spatial awareness, 744–745
  surgeons, 750
  surgical simulation, 750
  teaching, 746–747
  therapy, 749
Artificial neural networks (ANNs), 344–345, 521–523
Artificial organs, 788–789
Art in surgery, 746–749
Arts and humanities research council (AHRC), 679
ASPECT study, 806–807
Assessment of technical skills, 118
Assessment tool comparison, 122
Association of medical research charities (AMRC), 679
Atrial fibrillation, 3
  catheter ablation, 852–853
  surgical ablation
    Cox-maze III, 853
    cryoablation, 855
    energy sources, 854
    microwave, 855
    mini-maze procedures, 854
    pulmonary vein isolation, 853–854
    radiofrequency ablation, 854–855
    ultrasound, 855
Attitudes in safety, 266
Attributable risk, 45
Attrition bias, 432
Augmented reality (AR), 537
Average, 445–446

B

Back-propagation, 522
Balancing scores, 520–521
Bar chart, 451, 452
Bariatric surgery
  biliopancreatic bypass, 799
  endoluminal procedures, 799
  laparoscopic adjustable gastric banding, 798
  malabsorptive operations, 799
  restrictive procedures, 798
  Roux-en-Y gastric bypass, 800–801
  sleeve gastrectomy, 798–799
  vertical banded gastroplasty, 798
Barrett's disease progression, 805–806
Barrett's esophagus, 804–805
Bayesian methods, 345–346, 520
Bayesian networks, 351–359
  cost distribution, 368
  handling evidences, 356
  incorporating expert judgments, 356
  making predictions, 354
  measuring model performance, 357–358
  post-trial analysis, 369–371
  pre-trial analysis, 366–369
  probabilities, 366–367
  survival, 368
  trial analysis, 365–366
Bayesian techniques, 17–18
Bayes theorem, 352–353
Begg and Mazumdar method, 434
Behavior in safety, 266
Benefits of research, 698–700
Bias, 249–250
Bibliographic databases, 334, 629
  cochrane, 630
  embase, 630
  medline, 630
  PubMed, 630
Bimodal distribution, 448
Binomial distribution, 454
Biological hazards, 273–274
Biological hazards classification, 273
Biomarkers, 838–840
  quality control, 339
  reproducibility, 340
Biosciences business park, 709
Biosciences cluster, 708–709
Biotechnology and Biological Sciences Research Council (BBSRC), 678
Bivariate summary receiver operating characteristic analysis, 96
Bladder cancer
  endoscopic diagnostic techniques, 837
  surgical approaches, 837–838
Blinding of treatment allocations, 63
Book
  authors, 612
  competitors, 613
  cover letter, 613
  finding a publisher, 613–614
  following accepted proposal, 614
  market analysis, 612–613
  overview, 612
  proposal, 611–613
  synopsis, 612
  table of contents, 613
Bootstrapping, 461–462
Botulinum toxin, 803
Brain tumours, 943–945
Breast cancer
  age, 899
  diagnosis, 896
  ductal carcinoma in situ (DCIS), 899–900
  recurrence, local, 898–899
  surgery
    loco-regional resection, 897–898
    management of the axilla, 900–901
    margins, 898
    oncoplastic resections, 900
    post-operative radiotherapy, 898–899
    reconstruction after total mastectomy, 900
  systemic therapies, 901–903
    hormonal manipulation, 901–902
    neo-adjuvant chemotherapy, 902–903
Bridge to recovery, 863–864
Bridge to transplantation, 861–863
Briefing in surgery, 265–266

C

Calcification scores, 539
Calibration, 524
Cancer, 975
  clearance, 801–802
  genetic predisposition, 999
  molecular basis, 976–994
Carcinogen
  chemical, 995–996
  radiation, 996–997
  viral agents
    DNA, 997–998
    RNA, 998–999
Carcinogenesis, multistep model, 976, 995
Cardiac surgery
  acute myocardial infarction, 865
  aortic valve replacement, 869–870
  cardiac magnetic resonance imaging, 867–868
  cell delivery, 865
  cellular engineering, 864
  clinical research network, 871
  diagnostics, 867–868
  donor cells, 864–865
  heart failure, 866
  imaging, 867–868
  ischemic cardiomyopathy, 865–866
  mitral valve repair, 869
  multi-slice CT coronary angiography (MSCTA), 868
  percutaneous valve technology, 868–869
  pulmonary valve replacement, 870
  robotic, 870–871
  single positron emission computed tomography (SPECT), 868
  three-dimensional echocardiography, 867
  tissue engineering, 864, 866–867
Cardiac valve surgery, minimally invasive, 859–860
Carotid disease, 892
Carotid stenting, 885
Case–control studies, 35
Causal inference, 41–44
Causality criteria, 43–44
Cell cultures and functional assays, 969–972
Cell cycle regulators, 983–985
Cellular independence, 976–985
Censoring
  administration, 498
  interval, 497
  left, 496
  right, 496
  types, 496–498
Central tendency, 444
Charities, 679–680
Checking for errors, 444
Checklists in safety, 265–266
Chemicals, 275–278
Chi-squared distribution, 458
Chromogenic in situ hybridization (CISH), 953
Citation report, 334–335
Clinical appointments, 713
Clinical epidemiology, 713
Clinical judgement analysis, 18–20
Clinical practice, 28
Clinical practice guidelines (CPGs)
  assessing study quality, 640–641
  defining scope, 639–640
  definition, 638–639
  development, 639–641
  effectivity, 642–644
  external review, 641
  formulating recommendations, 641
  implementation of guidelines, 642
  levels of evidence, 640–641
  literature review, 640
  rating guidelines, 642
  updating guidelines, 641–642
  users, 642–644
Clinical registries, national, 783
Clinical research, 713
Clinical research audit, 310
Clinical strategy, 713
Clinical trial registry, 301
Clinical trials
  expertise, 30–31
  partnerships, 785
Clonal expansion, 975
Clonogenic assay, 972
Closed claim studies, 262–263
Coalition, 710
Cohort/longitudinal studies, 35
Co-investigators (Co-I), 684
Collaboration
  advances to education, 706
  advances to health service delivery, 706–707
  advantages to biotech sector, 709
  aligned capital, 707–708
  allied professionals, 706
  alternative models, 709–711
  benefits, 705–706
  commercial opportunities, 708
  complications, 697
  economic spin-offs, 708
  evidence of value, 701–704
  funding opportunities, 707–708
  implications, 704–705
  improved resource utilisation, 707
  integration, 696–698, 702
  intellectual property exploitation, 707
  opportunity, 697–698
  proximity, 707
  recruitment of staff, 707
Collaboration within surgical research, 705–706
Colon cancer
  diagnosis, 816–817
  staging, 817
Colorectal cancer
  chemotherapy, 820–821
  imaging, 819–820
  sentinel lymph node mapping, 820
  surgery, 821–824
  tumour outcomes and characteristics, 819–820
Colorectal, hospital outcomes, 824
Colorectal surgeon, outcomes, 823–824
Colorectal surgery
  enhanced recovery, 825–826
  individualisation of treatment, 829
  laparoscopic, 824–825
  natural orifice transluminal endoscopic surgery (NOTES), 828–829
  real-time intra-operative anatomy, 828
  real-time intra-operative histology, 828
  remote telepresence surgery, 826–828
  robotic, 828
  telementoring, 826–828
Commission of the European communities (CEC), 680
Communication, 771
  assessment, 776–777
  clinical units, 777
  errors, 774
  failure identification, 776–777
  failures, 775–776
  how, 774
  impact, 777
  improvement, 777–779
  interventions, 777–779
  models, 772–774
  one way, 772
  operating theatres, 776
  outcomes, 777
  post-operative handover, 776
  shift handover, 776–777
  standardizing, 777–778
  team changing, 778
  technology innovations, 778–779
  two-way, 772, 773
  verbal, 773
  what, 774
  who, 774
  why, 774
  written, 773
Community networks, 711
Comparative genomic hybridization (CGH) analysis, 968–969
Competing risks, 502–503
Components of psychological satisfaction, 168–169
Composite tissue allotransplantation, 927–928
Computed tomography (CT), 530
Computers
  backup, 327
  bedside teaching, 330
  communications, 327–328
  desktop, 322–323
  education, 330
  external hard drives, 327
  file management, 328
  hardware, 322–323
  laptop, 323
  logbook, 333
  mindmaps, 325
  network drives, 327
  operating systems, 323
  picture archiving, 332–333
  presentations, 325, 330–332
  programming, 328–329
  programming languages, 323
  robotics, 336
  software, 323–324
  spreadsheets, 325
  video editing, 331
Computing skills, 324–332
Computing skills, needs based, 328–329
Concentration index, 191
Conducting qualitative studies, 245
Confidence intervals, 91, 460–462
Conflicts of interest, 288
Congress
  budget, 618–619
  business meeting, 621
  computing, 622
  continuing medical education (CME), 623
  evaluation, 623
  exhibitions, 621–622
  location, 617–618
  organisation, 616–617
  organising committee (OC), 616–617
  organising sub-committees, 619
  panel discussion, 620
  planning, 620
  programme, 619
  publicity, 618
  scheduling, 619
  secretariat, 621
  seminars, 620–621
  telecast meetings, 622–623
  timing, 617
  trade, 621–622
  workshops, 620–621
CONSORT statement, 560
Continuous distributions, 454–458
Contract and industry research, 708
Coronary artery surgery
  endoscopic atraumatic coronary artery bypass grafting (EndoACAB), 858
  minimally invasive, 858–859
  minimally invasive direct coronary artery bypass (MIDCAB), 858, 859
  totally endoscopic coronary artery bypass grafting (TECAB), 858, 859
Correlation, 468–469
Cost-effectiveness analysis, 411, 415, 416, 420
  discounting, 414
  limitations, 419
  measures of effect, 412–413
  quality, 417–419
  value, 416
Cost-effectiveness modelling, 31–32
Cost-effectiveness ratios, ranking, 415
Cost-effectiveness threshold (CET), 415–416
Cost-minimisation analysis, 413
Costs
  biomedical services, 685
  consumables, 685
  directly allocated, 683
  directly allocated investigator, 684–687
  directly incurred, 683
  directly incurred staff, 683–684
  equipment, 685
  estate, 687
  indirect, 683
  materials, 687
  NHS trust, 685
  postgraduate fees, 687
  postgraduate stipends, 687
  professional services, 685
  project dedicated, 687
  recruitment, 685
  research facility, 688
  research partners, 685
  research staff, 687
  subsistence, 685
  travel, 685
Cost-utility analysis, 413
Covariables, evolutionary, 504
COX-2, 806
Cox regression, 498
Criteria for a useful diagnostic test, 84
Cross-sectional surveys, 34–35, 481
Cross-validation, 346–347
Curriculum development, 124
D
Data
  abstraction, 381
  acquisition, 110–113
  analysis, 110–113, 463
  categorising, 442–444, 450
  coding, 444
  collection, 306–310
    blinding, 310
    methods, 483
    tools, construction, 479
  continuous, 450
  graphical display, 449–451
  numerical, 465–466
  sampling, 462–463
  security, 319
  shape, 448–449
  types, 442–443
Database
  foreign keys, 315
  indices, 315
  look-up tables, 316–317
  models, 311–314
  referential integrity, 316
  relationships, 315–316
  software, 310–311
  systems, 310–319
Data handling, 442
  software, 443–444
  techniques, 443–444
Data monitoring committee, 67, 71–73
  examples, 70
  members, 69–70
  roles, 68–69
Data ownership, 250
Data Protection Act, 672–673
Decision analysis, 399–406
  critical appraisal, 407
  healthcare, 400
  limitations, 407–408
Decision model parameters, 401
Decision trees, 342–343
Defining outcome measures, 381
Delphi technique, 252
Delta learning rule, 522
Deoxyribonucleic acid (DNA), 951
  arrays, 966–968
  arrays, variation, 967–968
  repair failure, 991–993
Department for Employment and Learning, Northern Ireland (DELNI), 677
Department of Health (DH), 679
Describing data, 444–449
Design work, 478–479
Destination therapy, 863
Detection bias, 432
Development of surgical research, 696
Device engineering, 788–789
Dexterity analysis in surgery, 118–119
Diagnostics, 807–809
  accuracy
    estimates, 88–91
    studies, 560–564
  endpoints, 84–85
  meta-analysis, 93–97
  odds ratio, 91
  test data, 85
  test definition, 84
Difference between dependent groups, 112–113
Difference between independent groups, 112
Directly allocated shared laboratory technicians, 688
Discrimination, 524
Disease-specific instruments to measure health-related quality of life, 132
Documentation, 306–308
Document revisions, 308
Dot plot, 450, 451
Double entry, 444
Duplication, 285–286
Dynamic Bayesian networks, 358
Dysplasia, 806

E
E-cadherin, 806
Ecological studies, 33–34
Economic and Social Research Council (ESRC), 679
Educational research
  academic support, 602–603
  audits of practice, 600
  collaboration, 603
  credibility, 603
  data analysis, 601
  data collection methods, 600–601
  designing, 600–601
  factors influencing, 602–603
  fundamentals, 598–599
  instruments for data collection, 601
  literature reviews, 600
  steps, 600–601
  surveys, 600
  time, 603
  writing a project proposal, 601–602
Effect assessment methods, 434–435
Egger test, 435
Electromagnetic tracking markers, 536
Electron-beam tomography (EBT), 540
Electronic journals, 334
Elements of surgical innovation, 241
Email, 325–326
  IMAP4, 326
  MAPI, 326
  POP3, 326
  SMTP, 326
Embryo research, 210–211
Endoleak, 879
Endoluminal therapy, 796–797
Endoscopes, 532
Endoscopic mucosal resection (EMR), 797
Endoscopic ultrasound, 535
Endoscopy, white light, 536
Endovascular repair (EVAR), 877
Endovascular treatment of thoracic aneurysmal disease (TEVAR), 877
Endpoints of surgical trials, 62–63
Engineering and Physical Sciences Research Council (EPSRC), 678
Equity in healthcare, 178, 179
Ergonomics, 785
Errors, 256
Esophageal cancer, 804–805
Esophageal cancer, tumor markers, 807
Esophagectomy, minimal access, 802
Esophagus, adenocarcinoma, 805–806
Establishment of cell lines, 969–970
Ethical permission, 250
Ethics
  amendments, 674–675
  approval, 673–674
  minor "non-substantial" amendment, 674–675
  substantial amendment, 674
Ethnicity, 181
Evidence
  hierarchy, 10–11, 548
  hierarchy (traditional), 558
  quality, 548–549
  synthesis, 17
Evidence-based medicine (EBM), 557
Evidence-based surgery (definition), 10
Expected value of partial perfect information (EVPPI), 425–426
Expected value of perfect information (EVPI), 363, 422–423
  limitations, 426–427
Expected value of sample information (EVSI), 426
Experimental research measures, 106–110
Exporting data for analysis, 318
Extracellular domain (ECD), 977
Extracorporeal lithotripsy, 834
Eye-tracking technologies, 125

F
Fabrication, 285
Facial allotransplantation, 927–928
Factor analysis, 491–492
Factors to ensure study participation, 78–80
Factors to increase recruitment, 78
Fail-safe N, 435
Failure
  communication, 263–264
  time, 495
False negative, 89
False positive, 89
FDA, device classification, 653
FDA regulation of medical devices, 652–654
F-distribution, 458
Feasibility, 107
Fiducial markers, 536
Fixed effects, 94
Flat file database, 311–312
Fluorescence endoscopy, 540
Fluorescence lifetime images, 541
Fluorescent in situ hybridization (FISH), 953
Fluoroscopy, 536
Fraud, authorship, 286–287
Fraud in research
  history, 284
  prevalence, 284
  reasons, 284–285
Free flap monitoring, 937
Full economic cost (FEC), 683
Functional neuro-imaging technology, 125
Funding, 784–785
Fundoplication
  partial, 802–803
  total, 802–803
Funnel plot, 433
Future of surgical research, 7

G
Galbraith plot, 433–434
Gastro-esophageal reflux disease (GERD), 802–804
Generalised linear models, 474
Generic instruments to measure health-related quality of life, 131–132
Generic inverse variance, 383
Gene therapy, 805
Genetic algorithms, 345
Genetic modification, 275
Genome analysis techniques, 966–969
Genomic instability, 991
Geography, 181–182
Gini coefficient, 190
Glasgow Aneurysm Score, 516
Good database design, 314–317
Grant funding
  basic science proposals, 594
  budget, 593–594
  clinical proposals, 594–595
  costing, 584
  criteria, 589
  grant format, 593–594
  grant structure, 593–594
  innovation, 589
  investigation plan, 594
  investigator, 593–594
  originality, 589
  practicalities, 582–584
  process, 584–585
  programme grants, 595
  reasons for application, 580
  reasons for failure, 585
  research environment, 593–594
  review process, 592
  sources, 580–581
  surgical residents, 590–592
  surgical sources, 590
  surgical success rate, 591
  who should apply, 582
  writing, 592–593
Grants
  application, 663–665
  approval, 682
  collaborative, 681
  conference, 681
  fellowship, 681
  financing, 682
  research, 681
  resourcing, 682
  travel, 681
Graphical methods, 433–434
Graphical User Interface (GUI), 317–318
Great vessel reconstruction, 879–882
Grey literature, 439
Growth factors (GFs), 977
Growth factor receptors (GFRs), 977–979
Growth signals, 976–985

H
Harbord test, 435
Health and safety law, 272–273
Health care
  assessment, 27–32
  economics, 157–158
  quality improvement, 162
  quality reforms, 161–162
Health economics, 31, 713
Health gap analysis, 189–190
Health outcomes, 44
Health-related quality of life (HRQL), 129–136
  domains, sample size, 136
  measurement, 130–131
  missing data, 137–138
  patient preference, 32
  timing, 136–137
  validation, 132–134
Health services research, 713
Health technology, 30–32
Health technology assessment (HTA), 362–364, 679
Heterogeneity assessment, 93–94
Hierarchical clustering, 341
Hierarchical database, 312
Hierarchical (multilevel) logistic regression analysis modelling, 517–518
Hierarchical summary receiver operating characteristic analysis, 96–97
Higher Education Funding Councils (HEFC), 677
Higher Education Institutions (HEIs), UK, 677
High intensity focused ultrasound (HIFU), 835
Histogram, 450, 451
Hospital activity statistics, 188
Hospital funding, 755
Hospital information system, 333
Hospital-owned academic institutions, 710
Human reliability analysis, 261–262
Human Tissue Act, 671–672
Human Tissue Act, consent, 672
Hybrid procedure, 879
Hypothesis
  alternative, 462
  null, 462
  testing, 39, 462–468

I
Imaging, 807–809
  operative, 530
  post-operative, 530
  pre-operative, 530
Immunofluorescence (IF), 956
Immunohistochemistry, 954–956
Impact factors and misconduct, 287–288
Impact of stress, 146
Impact of the iterative framework, 371
Implementation of evidence-based surgery, 23, 24
Impossible values, 444
IMRD system, 550
Incidence rate, 443
Incident analysis, 260–261
Incremental cost-effectiveness ratio (ICER), 414–415
Incubators, 709
Inequality in health care delivery, 180–184
Inequality research, 183–184
Inference diagrams, 358
Information overload, 376
Information technology, 782–783
Information technology and safety, 265
Informed consent, 238
Ingredients of a clinical trial, 57
Innovations
  Abernathy–Utterback model, 849
    'fluid phase', 850
    'specific phase', 850
    'transitional phase', 850
  cardiothoracic surgery, 849–851
  conflict of interest, 655
  cost effectiveness, 654
  determining value, 648–650
  factors in surgery, 649
  grants, 649
  key questions, 650
  mentoring, 654–655
  recent, 785–790
  risk, 649
  surgery, 240–241
  teaching, 654–655
  vascular surgery, 875
In situ detection of apoptotic cells, 954
In situ detection of nucleic acids, 953–954
In situ detection of protein expression, 954–956
In situ techniques, 952–956
Institutional Review Board, 662, 784
Institutional technology transfer, 652–654
Instruments to measure health-related quality of life, 131–132
Integration, 713
Intellectual property, 650–651
International organisations, 680–681
Internet, 629–635, 783
  administration in surgery, 633–634
  anatomy resources, 631–633
  audio-visual resources, 631
  backup solutions, 327
  browsing, 326–327
  interview and exam preparation, 633
  journal collections
    Elsevier, 630
    ScienceDirect, 630
  resources, 329
  search engines, Google, 631
  social networking, 634
  training, 633
Intra-operative guidance, 530
Introducing time dependence, 405–407
Invasion, 994
Investigating uncertainty, 402–405
Investigation, 658
Investigative diversity, 658–663
Investigative facilities, 662–663
Investigative topics, 661–662
Investigators, 658–661
Isolation of cells, 969–970
Items in health-related quality of life domains, 133

J
Joint governance alliances, 710
Joint ventures, 704
Journal, research, 546

K
Kaplan–Meier, 498
Key stages of randomized controlled clinical trial, 58
Kurtosis, 445

L
Laparoscopy, diagnostic, 807
Leadership
  academic leadership traits, 736
  academic surgery, 736–737
  clinical, 736
  definition, 728–732
  delivery development, 739–740
  departmental, 658
  developing, 737–738
  four framework approach, 734–735
  human resource framework, 735
  improving clinical care, 736–737
  Lewin's styles, 733
  management, 730–731
  managerial grid, 734
  managerial models, 733–734
  political framework, 735
  situational model, 733–734
  structural framework, 734–735
  surgical, 658
  symbolic framework, 735
  teams, 735
  trait theory, 728
  transactional, 729–730
  transformational, 729–730
Learning curve of a surgical procedure, 60
Lecture
  named, 658
  professorship, 658
Likelihood ratios, 90–91
Linear discriminant analysis, 344
Logical checks, 444
Logistic regression, 343
Lognormal distribution, 458
Long-term survivors, 505
Lorenz curve, 177, 179
Lymphadenectomy, extent, 801–802

M
Magnetic resonance angiogram, 878
Magnetic resonance imaging (MRI), 529
Maintenance of certification, 116
Malignant melanoma, sentinel node biopsy, 937–938
Managing research misconduct, 288–289
Manual of operations and procedures, 308
Manuscript preparation, 328
Markers, reference, 537
Market-based healthcare, 755
Markov model
  analyzing, 407
  construction, 406–407
Mass spectrometry, 338
Matching, 519–520
Mean, 444
  arithmetic, 445
  geometric, 445
  harmonic, 445
  weighted, 445–446
Measures of effects, 44, 45
Measures of non-technical performance, 145
Measures of performance in surgery, 144–145
Measures of spread, 446–448
Measures of stress, combined, 144
Measures of stress in surgery, 143–144
Measures of stress, objective, 144
Measures of stress, subjective, 144
Measures of technical performance, 145
Measures of variance, 91
Measuring inequality in health care delivery, 184–189
Median, 444
Medical device regulations, 671
Medical education, 597
Medical imaging, 529–542
Medical imaging in surgery (MIS), 529–542
Medical Research Council (MRC), 678
Medical software, 333–334
Mental Capacity Act, 673
Mentees, 722–723
Mentoring, 2
  administration, 723–724
  benefits, 717
  commitments, 723
  contract, 723
  costs, 720–721
  creating a formal scheme, 718
  delivery, 718–719
  evaluation, 723–724
  formal, 718
  identifying mentors, 722–723
  informal, 718
  matching mentors and mentees, 723
  peer, 719
  process, 720
  relationship, 720–724
  support, 720
  training, 723
Mentors
  distance (external), 719
  meaning, 716–717
  origin, 175–176
  senior (internal), 719
Merger, 704
Meta-analysis, 38–39, 375–377, 380, 382, 392, 396, 505–506, 559–560
  advantages, 378–379
  biased inclusion criteria, 391
  calculating overall effects, 383
  cumulative, 394–395
  external validity, 390–391
  fixed effect model, 383–386
  forest plot, 386–387
  heterogeneity, 386–387
  individual patient data, 393–394
  internal validity, 390
  learning curve, 388
  meta-regression, 387
  mixed-treatment comparison (MTC), 395
  outcome measures, 381
  quality assessment, 388–389
  quality reporting, 390–391
  random effect model, 383–386
  sensitivity analysis, 387
  software, 395
  sub-group analysis, 387
  survival data, 394
Metabolic surgery, 794
Metastasis, 994
Methods to detect publication bias, 433–438
Michigan Surgical Collaborative for Outcomes Research and Evaluation (M-SCORE), 662
Microarrays, 338–339
Micrometastases, bone marrow, 807
Microvascular surgery, 924–925
Minimizing threats to research design validity, 104
Missing data, 444, 504
Mode, 444
Modelling, 537–538
Molecular techniques, animal models
  embryonic stem cell gene delivery, 972–973
  gene delivery, 972–973
  microinjection, 972
  retrovirus-mediated gene transfer, 973
Monte Carlo simulation, 363
Moore's law, 323
Mortality prediction model (MPM) methodology, 512
MR angiography, 531
MTT assay, 971
Multi-coded variables, 444
Multiple variables, 450–451
Multi-stage cluster sampling, 480
Multivariate sensitivity analysis, 403–404
Musculoskeletal disorders, 913

N
Narrative review limitations, 376
Narrow band imaging (NBI), 540
National Cancer Research Network (NCRN), UK–BOLERO, 837
National Institute for Health Research (NIHR), 679
National Institutes of Health, USA (NIH), 681
National Patient Safety Agency (NPSA), 674
National patient surveys, 170
National Research Ethics Service (NRES), 672
Natural Environment Research Council (NERC), 679
Natural orifice transluminal endoscopic surgery (NOTES), 536, 787–788, 795–796
Negative predictive values, 90
Negative pressure therapy, 927
Neoadjuvant chemotherapy, 804, 805
Net monetary benefit (NMB), 416
Neurosurgery, 941
  computed tomography (CT), 942
  functional
    deep brain stimulation (DBS), 946, 947
    spinal cord stimulation, 946
  gamma knife, 944, 945
  magnetic resonance imaging (MRI), 942
Neurotrauma, 942–943
Neurovascular surgery, 946
New and Emerging Applications of Technology (NEAT) Programme, 679
Nonlethal genetic damage, 975
Non-parametric analysis, 112
Non-randomised trials, 560
Non-technical skill (NOTECHS), 145
Non-technical skills for surgeons (NOTSS), 145
Normal quantile plot, 434
Northern blot, 959
Nuclear transcription factors, 982–983
Nucleic acid sequence analysis (sequencing), 964–966
Number needed to treat, 45

O
Obesity surgery, 797–801
Objectivity, 249–250
Observational teamwork assessment for surgery (OTAS), 145
Observed/expected (O/E) ratios, 515
Odds ratio, 443
Off-pump cardiac surgery, 855
  evidence, 857–858
  rationale, 856–857
Off-pump coronary artery bypass (OPCAB), 850
Oncogenes, 976–985
On-pump surgery, minimally invasive, 858
Open access online journals, 301–302
Ordered forest plot, 434
Organizational accident model, 259, 260
Originality, 547–548
Orthopaedics
  autologous chondrocyte implantation (ACI), 917
  biomarkers, early osteoarthritis, 919
  biomechanical testing, 917
  diagnostics, 918
  epidemiology, osteoarthritis, 914–915
  imaging
    biochemical, 919
    computed tomography (CT), 918
    magnetic resonance imaging (MRI), 918
    pain, 919–920
  late disease, 915–916
  matrix-induced autologous chondrocyte implantation (MACI), 917
  microscopy, 917
  pre-disease, genetic markers, 914
  surgery
    delivery of treatment, 916
    hip resurfacing, 915
    tendon repair, 915–916
  training, orthopaedic competence assessment project (OCAP), 920–921
  unicompartmental knee, 915
OSATS-based global rating scales, 145
Other Government Departments (OGDs), 689
Outcome measures, 156–157
Outliers, 444
Overall satisfaction score, 170–171
Overview quality assessment questionnaire (OQAQ), 559

P
Palliative management, 805
Patient dissatisfaction, 169–170
Patient expectations, 167
Patient reported outcome (PRO), 130
Patient reported outcome measures (PROMs), 157
Patient safety, 782–783
Patient satisfaction, 167–170
Pay for performance strategies, 159–160
Peer regulated innovation, 241
Penile cancer, surgery, 838
Percentage, 443
Performance bias, 432
Peripheral arterial disease, 892
Personal digital assistants (PDA), 323
PET, 532
Phosphodiesterase-type 5 (PDE5) inhibitors, 834
Photodynamic therapy, 797
Physiological and operative severity score for the enumeration of mortality and morbidity (POSSUM) methodology, 512–515
Plagiarism, 286
Planning patient recruitment, 72
Plasmid transfections, 970–971
Plastic surgery
  breast, 929
  cleft lip, 930
  cleft palate, 930
  diagnostics, 936–938
  flap survival, 936
  gene therapy, 935–936
  imaging, 936–937
  minimally invasive techniques, 928
  multidisciplinary team, 930
  musculoskeletal substitute, 933
  nerve substitute, 933–935
  obesity surgery, 929
  regenerative medicine, 932–935
  robotics, 938
  scar-free healing, 930–932
  skin substitute, 932–933
  soft tissue substitute, 933
  tissue engineering, 932–935
  tissue healing, 936
  vascularisation, 935
  wound repair, 930–932
  wound repair modulation
    corticosteroid injection, 931
    recombinant platelet-derived growth factor (PDGF), 931
    transforming growth factor-beta (TGF-β), 931
Pneumatic dilation, 803
Poisson distribution, 454
Polymerase chain reaction (PCR) analysis, 961–964
Population expected value of partial perfect information (PEVPPI), 425, 426
Population expected value of perfect information (PEVPI), 423–425
Population selection, 479
Positive predictive value, 90
POSSUM-AAA, 515
POSSUM-P, 515
POSSUM-RAAA, 515
POSSUM-vascular, 515
Predictive supervised methods, 343–346
Premarket approval application (PMA), 653
Pre-operative data, co-registration, 536
Preparation for data acquisition, 309
Pre-requisites for planning educational research, 600
Presentation
  act, 625
  audio-visual tools, 627
  content, 626
  feedback, 628
  other tools, 628
  slides, 626–627
  verbal style, 626
Prevention of publication bias, 438–439
Pricing, principles, 689–690
Primary care characteristics, 182–183
Primary keys, 315
Principal components analysis, 341
Principal investigator (PI), 684
Privacy legislation, 784
Probabilistic sensitivity analysis, 404–405
Probabilistic sensitivity analysis, interpreting, 417
Probability density function, 454
Probability distributions, 451–458
Probability distributions, discrete, 454
Probability sampling, 479–480
Process of surgical evaluation, 59–60
Proportion, 443
Prospective meta-analyses, 438–439
Prostate cancer, 835
  biomarkers, 837–838
  chemoprevention, 844–845
  events, REDUCE prevention study, 844
  screening, 844
  surgery, radical prostatectomy, 835–837
Prostate specific antigen (PSA), 34
Proto-oncogenes, 976, 977
Prototyping, 651–652
Psychological factors of satisfaction, 168
Psychological stress, the Lazarus theory, 142–143
Publication bias, 429, 432–439
  editorial, 300–302
  evidence, 294
  evidence-based medicine, 430–431
  framework for reduction, 297–302
  implications, 296–297
  individual investigator, 298
  institutional monitoring, 298–300
  meta-analysis, 302, 430–431
  reviewer, 300–302
  sources, 294–296
P-values, 463–464

Q
Qualitative comparative analysis, 18
Qualitative data, 442
Qualitative methods, 247–249
Qualitative research, 101–102, 243–245
Quality
  assessment tools, 562
  definition, 558
  measurement, 559
Quality-adjusted life year (QALY), 413–414
Quality assessment (univariate analysis), 93–94
Quality assessment of diagnostic accuracy studies (QUADAS)
  systematic reviews, 562
  tool, 86, 88
Quality control and data integrity, 318–319
Quality of care
  assessment, 152–154
  benchmarking, 158–160
  definition, 152
  key elements, 152, 153
  measurement, 154–158
  process measures, 155–156
  public health implications, 160–161
  structural variables, 154–155
Quality of life (QOL), 130
Quality of reporting of meta-analyses (QUOROM), 560
Quality-related (QR) funding, 678
Quantitative data, 442
Quantitative research, 102–103
Question, formulating, 570
Questionnaire distribution, 487

R
Radiation, 278, 804, 805
Radio-frequency (RF), 535
Radiofrequency ablation (RFA), 835
Random effects, 94
Randomised controlled trials (RCTs), 36–38, 560
  limitations, 376–377
Range, 444
Rank correlation tests, 434–435
Rate, 443
Ratio, 442
Reading, psychology, 546–547
Real-time adaptive imaging, 534–536
Recalibration, 525
Receiver operating characteristics (ROCs)
  analysis, 91–93
  curves, 562
Reconstructive surgery, 923–924
Reconstructive techniques, 924–926
Recruitment exclusion criteria, 77–78
Recruitment inclusion criteria, 77–78
Recruitment skills, 76–77
Rectal cancer
  endorectal ultrasound (ERUS)
    detection of recurrence, 819
    restaging, 818
    staging, 818
  magnetic resonance imaging (MRI), 819
  staging, 817
  surgery
    local transanal excision, 822
    neoadjuvant therapy, 822
    radical, 822
    total mesorectal excision (TME), 821
    transanal endoscopic microsurgery (TEMS), 822–823
Reference management, 328
2D Registration, 532
3D Registration, 532
Regression analysis
  linear, 470–472
  nonlinear, 473–474
Regression tests, 435
Relationship between variables, 112
Relative diagnostic odds ratio, 94
Relative index of inequality, 191
Relative rate, 443
Relative risk, 45, 443
Reliability of research, 109
Renal cancer, advanced, 835
Renal surgery, 834–835
Replicative potential, 990–991
Reporting in diagnostic tests, 86
Reporting requirements, 783–784
Research
  administrative infrastructure, 665
  approvals, 493, 673–675
  awards, 755
  capability funding, 678
  clinical, 705
  councils, 678
  design, 103–106
  design construction, 104–106
  discovery, 705
  dissemination, 113
  funding, 113, 681
  funding (administrative), 682
  funds policy, 667
  governance, 669–675
  health care delivery, 699–700
  health determinants, 699
  health services, 706
  investigator responsibilities, 675
  medical, 699
  NHS trust approval, 675
  population epidemiology, 706
  promotion, 113
  question, 13, 600
  question features, 101, 102
  regulation, 672–673
    clinical trials regulation, 670–671
    medicines for human use, 670–671
  sponsor approval, 675
  sponsor responsibilities, 675
  surgery, 657
  tool development, 107
  translational, 700
Research Ethics Committee (REC), 673
Research Framework Programmes, 680
Reverse causality, 42–43
Ribonucleic acid (RNA), 951, 952
Risk, 250, 443
Risk assessment, 279–281
Risk stratification modelling (RSM), 508
  logistic regression, 516–521
RNA interference, 971
Robot-assisted laparoscopic fundoplication (RALF), 795
Robotic-assisted laparoscopic prostatectomy (RALP), 786
Robotics, 785–787
  esophagectomy, 795
  upper gastro-intestinal surgery, 794–795
Ruptured AAA (rAAA), 882–885

S
Safety
  animal material, 274
  control measures, 280–281
  control mechanisms, 279–281
  healthcare, 256–258
  health surveillance, 281
  human material, 274–275
  improvement, 267
  reporting of injuries, 281
Sample selection, 479
Sample size, 480
Sampling, 479–480
Sampling error, 481
Satisfaction survey design, 171
Scales, 488
  attitude measurement, 489–490
  other, 490
  semantic differential, 490
Scales in health-related quality of life domains, 133
Scattered light endoscopy, 540
Scatter plot, 451, 453
Schwarzer method, 434–435
Science and Technology Facilities Council (STFC), 679
Science Research Investment Fund (SRIF), 678
Scientific paper
  abstract, 551
  acknowledgements, 553
  authorship, 550–551
  background, 551
  bibliography, 553
  case reports, 576
  choosing, 547
  conclusions, 552–553
  conference discussion, 553
  core components, 549–550
  declarations, 553
  diagrams, 573
  discussion, 552–553
  editorial, 553
  figures, 552
  flowcharts, 573
  formatting, 573–575
  images, 573
  images in surgery, 576
  importance, 553–554
  introduction, 551
  letters, 576
  manuscript submission, 575
  materials, 551–552
  methods, 551–552
  references, 553, 573–575
  results, 552
  reviewer comments, 575–576
  rewriting, 572–573
  structure, 571–572
  summary, 551
  supplementary files, 553
  tables, 552, 573
  techniques, 576
  title, 550
  types, 548–549
  understanding, 547
  writing, 571
Searching databases, 46–53
Secondary care characteristics, 183
Secondary research, 38
Selection modelling, 436–438
Selection of health-related quality of life domains, 133
Self-organising maps, 342
Senescence, 987–990
Sensitivity, 89–90
  analysis, 417
  analysis methods, 435–438
    sub-group analysis approach, 437–438
Service Delivery and Organisation R&D Programme (SDO), 679
Seventh Framework Programme (FP7), 680
Signal transducers, 979–982
Simplified acute physiology score (SAPS) methodology, 512
Simplified imputation methods, 435–436
Simulated operating theatre, 122–124
Simulation-based training, 266–267
Single-coded variables, 444
Single molecule analysis, 952–966
Skewness, 445
Skill acquisition, 116
Skin cancer, 936
Smart phones, 323
Socio-demographic characteristics, 181–182
Socio-economic characteristics, 180–181
Sources of bias in diagnostic studies, 86–88
Sources of evidence, 14–15
Sources of recruitment, 77
Southern blot, 957–959
Spearman's rank coefficient, 470
Specificity, 89–90
SPECT, 532
Spinal surgery, 947–948
Spinal tumours, 943–945
Spiral computed tomography angiogram (CTA), 878
Spread, 445
Standard deviation, 444
Standard error, 448
Standardization of clinical processes, 264–265
Standard of quality in health-related quality of life, 134–139
Standards for reporting of diagnostic accuracy (STARD), 86, 87, 562
Standards of publication, 300–301
Start-up investments, 713
Statistical knowledge, 31
Statistical packages, 335, 444
Statistical power, 40, 480
Statistical simulation techniques, 520
Statistical technique categories, 111–112
Statistical terms, 442
Statistical tests, 464–468
Statistics, 442
Stem-and-leaf plot, 450, 451
3D Stereoscopic displays, 537
Stereotactic, 536
Stereotactic brain biopsy, 944
Stratified random sampling, 480
Stress and surgical performance, 146
Stress and surgical training, 147–148
Stress in surgery, 142–143
Structural equation modelling, 492–493
Structure of PubMed, 46–53
Study design, 33–39
  cohort, 482
  longitudinal, 481–482
  panel, 482
  questionnaire, 484–486
    advantages, 487–488
    disadvantages, 487–488
  trend, 482
Study population, 39
Study types, 33, 34
Sub-group analysis, 524
Subjectivity, 249–250
Subject selection bias, 431–432
Summarising data, 444–445
Summary receiver operating characteristic (SROC) analysis, 95–96
Support vector machines, 345
Surgical certification, 116
Surgical competence, 116
Surgical Education Research Programme, 599–602
Surgical knowledge, 11
Surgical meeting purpose, 615–616
Surgical mortality score (SMS), 512
Surgical performance taxonomy, 117–118
Surgical proficiency, 116
Surgical research
  contribution, 588–589
  degrees, 589
Surgical research papers catalogue, 548
Surgical safety, 122
Surgical skill assessment, 118
Surgical smoke, 275
Surveys
  advantages, 483
  building stages, 478–481
  disadvantages, 483
  research, errors, 483–484
  types, 481–483
Survival analysis
  non-parametric, 498–500
  parametric, 501–502
  semi-parametric, 500–501
  standard, 498
Survival data features, 496–498
Survival, empirical function, 499
Systematic reviews, 38–39, 375–396, 559–560
Systematic reviews, observational studies, 391–393
Systematic sampling, 479
Systemic stress, Selye's theory, 142

T
t-distribution, 458
Techniques for molecular analysis in solution, 956–966
Technology convergence, 706
Technology licence agreement, 652
Technology transfer, 713
Telerobotically assisted laparoscopic cholecystectomy (TALC), 794–795
Tests of normality, 110–111
Theatre safety, 275
The Hardman index, 516
The prostate cancer prevention trial (PCPT), 844
The recruitment team, 76
The Trim and Fill method, 435–436
The Yerkes–Dodson law, 143
Thoracic vascular disease, 891–892
Thoraco-abdominal aortic aneurysm repair, 860
Thoracoscopy, diagnostic, 807
Thyroid cancer
  multiple endocrine neoplasia, 909
  RAF proteins, 908–909
  RET proto-oncogene, 908
Thyroid disease
  diagnostics, 909–910
  imaging, 909–910
  PET CT scanning, 910
Thyroid surgery
  instrumentation, harmonic scalpel, 907–908
  minimally-invasive video-assisted thyroidectomy (MIVAT), 907
  robotics, 906–907
  training, 910–911
Time-dependent covariates, 503–504
Time lead bias, 432
Timescale, 496
Tissue distraction, 925–926
Tissue expansion, 925
Tissue microarrays (TMAs), 956
Title, 570
Tracking, 537
Training, 789–790
  certification, 308–309
  programme, 28
Transformations, 458–460
Translational discovery research, 713
Translational surgical research, 3–5
Transmission
  biological, 279
  chemicals, 279
  radiation, 279
Transparent approach to costing full economic cost methodology (TRAC FEC), 681
Treatment networks, 16–18
Trial costs, 64
Trial design, 56–57
Trial eligibility criteria, 63
Trial enrollment, 63–64
Trial process of randomisation, 63
Trial registries, 438
Trial sample size, 63–64
Triangulation in qualitative research, 253
True negative, 88, 89
True positive, 88, 89
Tumor suppressor genes, 976
Tumour, invasion, 806
TUNEL assay, 954
Type 1 error, 39–40
Type 2 error, 39–40
Types of publication bias, 431–432

U
UK Research Office (UKRO), 680
Ultrasound, 536
Unimodal distribution, 448
Univariate sensitivity analysis, 403
University clinical partnership (UCP), 701
University-owned health science centres, 710
Upper gastrointestinal (UGI)
  barium examination, 807
  computed tomography, 808
  debates, 801–802
  endoscopic ultrasound, 808
  global molecular profiling, 810
  magnetic resonance (MR), 808
  positron emission tomography (PET), 808–809
  prognostic indicators, 808–809
  staging, 808–809
  surgical skills training, 809
  survival, 810
  training, 809
  volume outcome, 809
Urology
  diagnostic imaging
    dynamic contrast-enhanced MRI (DCE-MRI), 842
    magnetic resonance spectroscopic imaging (MRSI), 842–843
    PET, 841–842
    PET-CT, 841–842
    point-resolved spectroscopy (PRESS) voxel excitation, 842
    ultra-small super-paramagnetic iron oxide (USPIO), 843
    ultrasound, 843–844
  metabonomics, 840
  tissue engineering, 840–841

V
Validation, 108, 490–491
Value of information (VOI), 421–427
Variables
  external, 503
  internal, 503
Variance, 444
Vascular biochemistry and haematology outcome models (VBHOM), 515
Vascular disease prevention, 892
Vascular surgery
  diagnostics, 888
  growth factor therapies, 886
  imaging, 888
  living dermal substitutes, 886–888
  stem cell therapies, 888
  tissue repair, 886
  training, 888–889
Venous disease, 885, 892
Ventricular assist devices, 860–861
Video-based assessment in surgery, 119–120
Viral infections, 970–971
Virtual organization, 704
Virtual reality (VR), 335–336, 532
Virtual reality simulators as assessment devices, 120–122
Virtual worlds, 327, 328
Visualisation, 537
Voice over Internet Protocol (VoIP), 327, 328
Volume of evidence, 15
Volume-outcome assessment
  ethics, 205
  funnel plot, 198
  institution, 203
  methodological framework, 197–201
  outcome measures, 201–202
  policy, 204–205
  public health, 204–205
  quality of life, 202
  readmission rates, 201
  research, 205
  surgeon, 202–203

W
Waste disposal, 281
Web 2.0, 335
Web-based clinical resources, 333
Weibull hazard function, 501, 502
Weight models, 436–437
Wellcome Trust, 680
Western blot, 959–961
White light reflection imaging, 541
Willingness-to-pay (WTP), 416, 422, 423
Word processing, 324–325
Workforce challenges, 755–757
Wound care, 892
Wound debridement, 926–927
Wound management, 926–927

X
X-ray, 530