EVIDENCE BASED DENTISTRY
WHAT IS EVIDENCE BASED DENTISTRY?

Gary R. Goldstein, DDS

From the Department of Prosthodontics, New York University College of Dentistry; and Department of Dental Material Science, New York University Graduate School of Arts and Sciences, New York, New York

The volume of literature and lectures directed at the modern dental practitioner has created some problems. How does one resolve the often contradictory information? How does one determine what is a cutting-edge technique and what is useless? In resolving a clinical decision, evidence rather than empiricism should dictate treatment. Evidence based dentistry (EBD), based on the concepts developed at McMaster University,13, 14, 17–22 presents guidelines to determine the validity of study results and whether they can be applied to clinical practice. The foundation for evidence based practice was laid by David Sackett, who has defined it as ''integrating individual clinical expertise with the best available external clinical evidence from systematic research.''23

Evidence based dentistry supplies guidelines to help the clinician make an intelligent decision. In and of itself, EBD does not give definitive answers. It does not exchange the tyranny of the expert for the tyranny of the literature. As Sackett's definition states, EBD relies first on clinical expertise. This expertise is especially critical in dentistry, where the number of randomized, controlled clinical trials and prospective cohort studies is limited. In a perfect world, full of quality prospective studies, one would only have to pull up a well-performed meta-analysis or systematic review of the evidence on the clinical question to solve the problem at hand. Unfortunately, these studies are too few, and clinicians must apply the best available evidence to make a decision.

The Cochrane Collaboration, an international nonprofit organization whose goal is to make up-to-date, accurate information on the effects of health care available worldwide, has an Oral Health Group that has produced some systematic reviews. Their web site (http://hiru.mcmaster.ca/cochrane/default/htm) is an excellent place to see what evidence based dental practice will be like in the future.
The internet has made it easy to initiate an evidence based practice (see article by Felton on page 45 of this issue). Guidelines for EBD are applicable to peer-reviewed literature and also to publications and lectures that provide a case report or, at best, a case series done under conditions that may not be similar to those seen in the average dental office. Armed with the tools of EBD, the clinician can readily evaluate the mass of data and choose, in an educated manner, what to use and what to discard.

Unfortunately, most of what is seen in dentistry is product testing done in laboratories, not operatories. The studies are usually univariate analyses, because the researcher has been trained to homogenize the study so that only one variable is tested. Clinicians, however, live in a multivariate environment. For example, an in vitro study on a dental cement might deal with retention of castings on extracted teeth. Retention, however, is not the only variable that a clinician evaluates in choosing a cement. A clinician must also be concerned with postoperative sensitivity, film thickness, setting time, working time, longevity, ability to clean up, setting expansion, and so forth. One might also wonder how good the retention would be in a clinical milieu where isolation, crevicular fluid, saliva, and intraoral humidity become confounding variables. Clinicians, seeing only one variable tested, should be reluctant to change their cement based on the limited laboratory study. Needed instead are controlled, long-term clinical trials to help clinicians make decisions, but such studies are expensive and require a long time to supply the information. Chambers questioned whether ''there is clinical evidence showing that this restorative material will last longer in patients' mouths than it will be on the market'' (see article by Chambers on page 29 of this issue).

Using EBD is quite simple:3
1. Create an answerable question.
2. Track down the best evidence to answer the question.
3. Critically appraise the information.
4. Apply the results to one's patients.
5. Evaluate one's performance.
The Journal of Prosthetic Dentistry has published a series similar to the Users' Guides to the Medical Literature,13, 14, 17–22 specific to dentistry, to help appraise the information.1, 2, 5, 6, 8, 11, 12, 15, 16 Although the guidelines differ for the different clinical questions being asked, certain characteristics pertain to all studies.

THE USE OF EVIDENCE BASED DENTISTRY IN DETERMINING THERAPY

Was the Assignment of Patients to Treatment Randomized?

Randomization eliminates allocation bias. In theory, randomization ensures that variables, over which the study has control and the
unknown variables that come into play in all studies, are equally distributed among the test groups. To ensure equal distribution, the study population (N) must be sufficiently large. A randomized controlled trial (RCT) is considered the optimal research design and is the reference standard for most clinical questions. Not all RCTs, however, are properly planned and carried out. The reader must still examine the methodology. Also, as Sackett concluded, ''some questions about therapy do not require randomized trials (successful interventions for otherwise fatal conditions) or cannot wait for the trials to be conducted. And if no randomized trial has been carried out for our patient's predicament, we follow the trail to the next best external evidence and work from there.''23

Feinstein9 has questioned the blind faith often put in randomized trials and has suggested that prognostic stratification is critical to the utilization of the data. He maintains that if data are to be evaluated in prognostic subgroups, those subgroups should be identified, where possible, before the study starts, and that subjects should be allocated to those subgroups before they are randomly allocated to treatment.10 For example, in a study on implants in which the site (anterior mandible versus posterior maxilla) is a major variable, it would be sensible to identify the site before randomizing to ensure that chance alone does not place most of the anterior mandibles in one group and most of the posterior maxillae in the other. Another potential confounder would be smoking. Although it would be unwieldy, if not impossible, to identify every possible variable, certain dominant ones known to affect the outcome of the therapy should be identified at the start of the project.

Were All Patients Who Entered the Trial Properly Accounted For and Attributed at Its Conclusion?

It is critical that all patients who enter a trial are properly accounted for at its conclusion. It is not enough to say that a certain number of patients dropped out. One must include the dropouts in the statistical analysis (see article by Clive on page 137 of this issue). The most common reason patients drop out of a therapy trial is that they are unhappy with the therapy. Some subjects die, and some move out of the area, but the number in these categories should be relatively equal in the control and test groups. If the drop-out rate exceeds 20%, the clinician should be concerned about the external validity or generalizability of the project.
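The effect of unaccounted-for dropouts can be made concrete with a simple best-case/worst-case calculation. The sketch below is only an illustration, written in Python with invented numbers; it is not drawn from any study cited in this article.

def itt_bounds(successes, completers, enrolled):
    # successes:  completing patients with a good outcome
    # completers: patients who finished the trial
    # enrolled:   patients originally randomized
    dropouts = enrolled - completers
    reported = successes / completers         # rate if dropouts are ignored
    worst = successes / enrolled              # every dropout counted as a failure
    best = (successes + dropouts) / enrolled  # every dropout counted as a success
    return reported, worst, best, dropouts / enrolled

# Hypothetical arm: 100 patients randomized, 75 finish, 68 of those succeed.
reported, worst, best, dropout_rate = itt_bounds(68, 75, 100)
print(f"dropout rate {dropout_rate:.0%}")  # 25%, above the 20% threshold
print(f"reported {reported:.0%}, true rate between {worst:.0%} and {best:.0%}")

When a quarter of the subjects are unaccounted for, the reported 91% success could honestly lie anywhere between 68% and 93%, which is why a drop-out rate above 20% should trouble the reader.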
Were Patients, Their Clinicians, and Study Personnel Blinded to Treatment?

Blinding means that someone was not aware of the treatment being rendered. Double-blinded means that both the evaluators and the patients were unaware of the therapy being rendered. Blinding is easily done in a drug trial in which the pills look and taste the same and the patient is identified only by a code number unknown to the evaluator looking at the outcome. Blinding can also be easily done in a study of toothpastes or mouthwashes. It is not always possible to blind a clinical trial. For example, in a study comparing implant-retained overdentures with either two or four fixtures in place, it would be impossible to blind the patient or the researcher if intraoral examinations were necessary. Although a nonblinded trial is not ideal, it can still be an excellent experiment that can generate usable, reliable data.
Were the Groups Similar at the Start of the Trial?

To ensure validity, it is critical that the cohorts (groups) be similar in all pertinent demographic, medical, and dental factors. Although in a large study randomization should ensure equivalence, it is the investigator's responsibility to assess equivalence among cohorts in detail.
Aside From the Experimental Intervention, Were the Groups Treated Equally?

Anything one studies, one alters. Patients who agree to participate in a study tend to be more compliant than the average. Knowing they are to be examined may cause them to exercise better home care before presenting in an effort to please the investigator. It is tempting for investigators to recall a test group more often when the outcome is uncertain or side effects are suspected. Co-interventions, such as an extra prophylaxis, can affect the primary outcome being examined and the validity of the study. All groups need to be treated equally.
Were All Clinically Important Outcomes Considered?

The reader must decide whether all clinically important outcomes have been considered. If, for example, in evaluating a new cement for ceramic restorations, the investigator reports only that the restoration was still in place at the end of the study, it is obvious that other important considerations have been ignored. If the investigator also evaluates postoperative sensitivity, film thickness, setting time, working time, longevity, ability to clean up, setting expansion, and so forth, the important clinical factors have been evaluated. More commonly, the investigation might evaluate only two of the factors. Some clinicians would find the study adequate; other readers might not. An implant study, for example, might speak of prosthesis stability and neglect the number of implants
remaining. If six implants were placed and three were lost, the prosthesis might be stable, but the clinician has cause to question the data.
Was Follow-up Sufficiently Long and Complete?

Too often a study is not long enough to be valid to the clinician (chronology bias). Although a 1-year follow-up may be sufficient in a study of the efficacy of tetracycline-impregnated cord, the same follow-up time is not adequate in a study on a new composite resin restoration. For restorative procedures, a minimum of 3 to 5 years may be necessary to convince a dentist to change therapy.
Were Objective and Unbiased Outcome Criteria Used?

Outcome criteria are chosen by the investigator, and it is easy to err by choosing an assessment that best serves the theory of the investigator. The adage, ''I would not have seen it if I didn't believe it,'' readily comes into play. Picture a study that compares a Lexus with a Yugo and chooses the following criteria for the study:

Does it have an engine?
Does it have a radio?
Does it have four wheels?
Does it have a windshield?
Does it have seat belts?
Using these criteria, one concludes that the Lexus and Yugo are similar. Any rational person, however, clearly sees that results based on questionable outcome assessments are useless. In more sophisticated studies, such a flaw may not be so obvious.
Will the Results Help Clinicians in Caring for Their Patients?

The critical question for clinicians is whether the results will help them provide better care for their patients, because that question involves all the others. If the methodology is good, if the statistically significant results have clinical relevance, and if the data interpretation is rational, one would lean towards accepting the study. If, however, the population is not representative of a clinician's practice or if the inclusion and exclusion criteria do not match the practice population, clinicians should be hesitant about applying the results to the population they are treating.3
USING EVIDENCE BASED DENTISTRY TO EVALUATE THE NEED FOR A DIAGNOSTIC TEST

Was There an Independent, Blind Comparison with a Reference Standard?

A gold (reference) standard is important. In histopathology, the biopsy is considered the gold standard, but even the biopsy does not result in 100% agreement among pathologists. The disagreement is magnified when the pathologists are deprived of the clinical findings supplied by the surgeon. If a reference standard exists, one might question the need for the new test. If the test cannot offer the advantages of being less expensive, less invasive, or easier to perform, one should question its use. Unfortunately, often there is no reference standard, or the reference may be controversial. Lack of a suitable reference standard does not mean that the new test is not useful, but a heavier burden of proof is demanded from the investigator, and the clinician must exercise more caution.

Were the Methods for Performing the Test Described in Sufficient Detail to Permit Replication?

If the reader cannot perform the test, it is of no use.

Were Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and Likelihood Ratios Presented?

It is not the reader's responsibility to undertake statistical analysis when reading an article. Rather, it is the researchers' obligation to supply the appropriate data (see article by Brunette on page 87 of this issue). Because EBD puts the onus of decision making on the clinician, readers must be familiar with the terms so they can determine if the new test would have merit in their practices (a brief worked sketch of these measures follows this section).

Will the Patient Be Better Off as a Result of the Test?

Routine testing, if it does not affect the diagnosis, prognosis, or treatment, has questionable value. If the results do not potentially change the course of treatment, the test is unnecessary. A patient who fell and knocked out the coronal portion of a tooth would benefit from a radiograph to determine the extent of the fracture but not from a pulp test to determine vitality. An adolescent with an ulceration from biting the cheek would be better served by a reexamination in a week rather than by a biopsy.
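For readers who want to see how those terms are derived, the following is a minimal sketch in Python using an invented 2 x 2 table for a hypothetical diagnostic test judged against a reference standard; the numbers are illustrative only and do not come from any study discussed here.

# Hypothetical 2 x 2 table: test result versus reference standard.
tp, fp = 45, 10   # test positive: disease present / disease absent
fn, tn = 5, 140   # test negative: disease present / disease absent

sensitivity = tp / (tp + fn)                    # 0.90: diseased patients the test detects
specificity = tn / (tn + fp)                    # 0.93: healthy patients the test clears
ppv = tp / (tp + fp)                            # 0.82: chance of disease given a positive result
npv = tn / (tn + fn)                            # 0.97: chance of health given a negative result
lr_positive = sensitivity / (1 - specificity)   # 13.5: how much a positive result raises suspicion
lr_negative = (1 - sensitivity) / specificity   # 0.11: how much a negative result lowers it

print(sensitivity, specificity, ppv, npv, lr_positive, lr_negative)

Note that the predictive values, unlike sensitivity and specificity, depend on how common the condition is in the population sampled, which is one more reason the reader must judge whether the study population resembles the patients in his or her own practice.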
Evidence based dentistry will surely be abused.4 Insurance companies have already developed evidence based care policies that require dentists to prove that patients need the services.3 The possibility of abuse does not mean that dentistry should reject EBD. Indeed, dentists have been practicing EBD, in part, for many years. When clinicians tell patients to brush and floss, they do so because the evidence supports the efficacy of these interventions. When dentists advocate fluoride, they do so because the evidence supports its efficacy. Although many areas of dental practice are supported by numerous high-quality research projects, many more areas are supported only by anecdotal data. Hence, the validity of the data and who evaluates it become critical.

Aurbach4 has questioned: ''Who will be the anointed one or group that determines which evidence is valid? Who will set the research agenda and determine where the results will be maintained? Who will validate the research? Who will maintain the data base to make sure that it is up to date? How will the results be used?''3
It is obvious that to control the data, clinicians need to own it. If clinicians are not sophisticated enough to force good research practices by their ability to evaluate and reject poor science, they will be at the mercy of third parties who can use dubious research as justification to control clinicians' practices. The sooner dentistry as a profession universally embraces EBD, the sooner the profession will command the use of research and prevent its misuse.

WHAT EVIDENCE BASED DENTISTRY IS NOT

Evidence based dentistry is not a veil to mask the same old, inadequate research. It is disturbing to see lecturers invoke EBD and present the same anecdotal lectures they gave before, with different slide titles. As the profession of dentistry becomes more sophisticated, researchers and lecturers will be forced to grow also.

Evidence based dentistry does not take the clinical decisions out of clinicians' hands and put them into the hands of the literature. In fact, the opposite is true. Evidence based dentistry gives guidelines for the clinician and relies first on clinical expertise.

Evidence based dentistry does not mean that third parties will control dental practices. In fact, educated dentists, understanding the literature, will be able to prevent the misrepresentation of data by commercial interests.

Evidence based dentistry does not mean the clinician need not study basic and dental material sciences. In fact, the opposite is true. To evaluate the research presented, clinicians need a solid background on which to base their evaluations and decisions.

Evidence based dentistry does not mean clinicians abandon everything they learned in dental school. It does not force clinicians to go backwards to justify things the profession universally accepts.
WHO BENEFITS FROM EVIDENCE BASED DENTISTRY

• The ultimate beneficiaries of EBD are members of the public, who will reap the rewards of better care. The internet allows patients, as well as professionals, access to health care information. The public, however, does not have the tools to evaluate the data adequately and must rely on their educated dentists to help sort fact from fiction. Patients will be more educated, more involved in their treatment decisions, and more appreciative of quality care.
• Dentists will also benefit from EBD. Instead of conducting free product testing for dental product manufacturers, practitioners will have at their disposal more valid research on which to predicate their clinical decisions.
• Researchers will benefit by being called upon to do the clinical testing necessary before new products are placed on the market.

References

1. Anderson JD: Need for evidence based practice in prosthodontics. J Prosthet Dent 83:58–65, 2000
2. Anderson JD, Zarb GA: Evidence based dentistry: Prognosis. J Prosthet Dent 83:495–500, 2000
3. Anderson V: Evidence based care, is the defense ready? Dental Economics 28–32, 2000
4. Aurbach FE: Evidence based dentistry: A practitioner's perspective. J Am Coll Dent 66:17–20, 1999
5. Carr AB, McGivney GP: Measurement in dentistry. J Prosthet Dent 83:266–271, 2000
6. Carr AB, McGivney GP: Users' guides to the dental literature: How to get started. J Prosthet Dent 83:13–20, 2000
7. Chambers D: Research for practitioners or research for researchers? J Am Coll Dent 65:2–4, 1998
8. Eckert SE, Goldstein GR, Koka S: How to evaluate a diagnostic test. J Prosthet Dent 83:386–391, 2000
9. Feinstein AR: An additional basic science for clinical medicine: II. The limitations of randomized trials. Ann Intern Med 99:544–550, 1983
10. Feinstein AR: An additional basic science for clinical medicine: III. The challenges of comparison and measurement. Ann Intern Med 99:705–712, 1983
11. Felton DA, Lang BR: The overview: An article that interrogates the literature. J Prosthet Dent 84:17–21, 2000
12. Goldstein GR, Preston JD: How to evaluate an article about therapy. J Prosthet Dent 83:599–603, 2000
13. Guyatt GH, Sackett DL, Cook DJ: Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 270:2598–2601, 1993
14. Guyatt GH, Sackett DL, Cook DJ: Users' guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA 271:59–63, 1994
15. Jacob RF, Carr AB: Hierarchy of research design used to categorize the ''strength of evidence'' in answering clinical dental questions. J Prosthet Dent 83:137–152, 2000
16. Jacob RF, Lloyd PM: How to evaluate a dental article about harm. J Prosthet Dent 84:8–16, 2000
17. Jaeschke R, Guyatt G, Sackett DL: Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 271:389–391, 1994
18. Jaeschke R, Guyatt GH, Sackett DL: Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA 271:703–707, 1994
19. Laupacis A, Wells GA, Richardson S, et al: Users' guides to the medical literature. V. How to use an article about prognosis. JAMA 272:234–237, 1994
20. Levine M, Walter S, Lee H, et al: Users' guides to the medical literature. IV. How to use an article about harm. JAMA 271:1615–1619, 1994
21. Oxman AD, Cook DJ, Guyatt G: Users' guides to the medical literature. VI. How to use an overview. JAMA 272:1367–1371, 1994
22. Oxman AD, Sackett DL, Guyatt GH: Users' guides to the medical literature. I. How to get started. The Evidence-Based Medicine Working Group. JAMA 270:2093–2095, 1993
23. Sackett D, Richardson WS, Rosenberg W, et al: Evidence based Medicine: How to Practice and Teach EBM. New York, Churchill Livingstone, 1997

Address reprint requests to
Gary R. Goldstein, DDS
NYU College of Dentistry
Division of Restorative and Prosthodontic Sciences
345 East 24th Street, Clinic 5W
New York, NY 10010–4086
e-mail:
[email protected]
THE QUESTION

James D. Anderson, BSc, DDS, MScD

From the Faculty of Dentistry, University of Toronto; and the Craniofacial Prosthetic Unit, Toronto-Sunnybrook Regional Cancer Centre, Toronto, Ontario, Canada
HOW QUESTIONS ARISE

There are two aspects to the clinical practice of dentistry. The surgical component includes all the manipulation of hard and soft tissue that is performed every day in dental practice. Examples are tooth preparation and restoration, scaling, orthodontics, and prosthesis fabrication. The other element involves decision making. The diagnosis of unlocalized dental pain, the prognosis for a periodontally compromised tooth, the choice of posterior restorative materials, and the risk/benefit assessment of third molar extractions are examples.

Early in a career, decision making may be the most difficult aspect of clinical practice. There is an overwhelming array of choices with little or no structure on which to build an approach to solving the problems. As a practitioner gains experience, he or she acquires the advantage of having seen the results of previous decisions, good and bad, and can recall how a problem was dealt with previously. The practitioner also develops habits that make each task easier. Habits, too, are the result of decisions made but not re-examined.

As a start, the thoughtful practitioner will ask first if there is a compelling reason to intervene for a patient, and second if there is a compelling reason to intervene at this time. The answers to these questions can be obvious or elusive. The patient who has severe, throbbing pain and tender swelling over the apex of a heavily carious lateral incisor with a large periapical radiolucency clearly needs treatment and needs it promptly. On the other hand, whether or when to treat the young patient with impacted but asymptomatic third molars is less
obvious. With experience, practitioners build up a mental library of circumstances that can be recognized when next encountered. This is practice by pattern recognition. Because of the infinite variety in the combinations of circumstances encountered every day, the choices made are commonly extensions of previous experiences. For example, the extension of resin-bonded prosthesis designs from the front of the mouth to the posterior segments is logical, provided provision is made for the extra occlusal load. When no previous experience is available as a guide, a knowledge of basic biologic principles can guide decision making. For example, for an edentulous patient who has had a maxillectomy, ensuring that the design for a denture includes bilateral support will guide the impression procedures.

Decision making in clinical practice thus is supported by pattern recognition when experience exists. When experience does not exist, the practitioner falls back on extensions from previous experiences or inferences from basic biologic principles. Continuing education guides and reinforces these strategies. A comfort level develops, which is the confidence one gains with years in practice. All these approaches are molded by the single practitioner's clinical and educational exposure, that is, by one person's sample of the profession's accumulated knowledge and judgment. Because the practice behaviors of dentists are highly divergent, there is clearly great variation in each practitioner's sample of knowledge and experience. Hence, the decisions reflect different biases and knowledge gaps among different clinicians. This consequence is the problem that evidence based practice (EBP) is intended to address. The first step in EBP is to acknowledge that such gaps exist in one's personal knowledge and experience. Or as Will Rogers put it, ''Everybody is ignorant, only on different subjects.''

WHICH QUESTIONS?

In the flow of daily practice, virtually no decisions are made in a complete information vacuum. (Such decisions would best be made with the flip of a coin.) When there is no definitive information on a given problem, there is nearly always some influence, whether it be patient preference, the practitioner's knowledge of basic biologic principles, or the practitioner's habits. Decisions are made, therefore, without empiric information about the consequences of the decision. For example, is endodontic treatment and full-coverage restoration of a nonvital molar more cost-effective than extraction and replacement with an implant-supported prosthesis? If practitioners recognize that they do not have empiric evidence on a current problem, suddenly the practice day becomes filled with uncertainty, even for the experienced practitioner. As in medicine, this uncertainty is, in fact, the nature of dental practice. The practitioner must decide which questions to pursue in the limited time available.

Clearly, the thoughtful practitioner will seek evidence to answer
questions that directly affect patient management. Doing so is ethical practice: it puts the patient's perspective on the problem ahead of the practitioner's. The patient may want to know if chewing will be easier with a fixed implant-supported prosthesis than with an implant-supported overdenture. The practitioner, on the other hand, may be more concerned with implant survival. So, the first criterion in selecting which questions to pursue is to choose questions from the patient's perspective. The fact that the question has arisen means that it can arise again, so the second criterion suggests that practitioners seek evidence on questions that assist in staying current and in preparing for the next occasion. Often in the pursuit of this information, however, the literature does not provide a definitive answer. To ration time effectively, the third criterion suggests choosing the questions that are most likely to yield a clear answer. Of course the searcher cannot know in advance whether the answer is available to be found. Common problems, however, are more likely to have a better body of literature than rare problems. Finally, of course, the searcher should choose interesting questions that spark the learning process.

WHY BOTHER?

For the Patient

As noted previously, the patient's questions and the practitioner's questions are not always the same. Articulating the question makes it more likely that the practitioner's quest for scientific information will correspond with the patient's perception of what is important. Thus, there is better opportunity to include in the question issues that balance the potential for good with the risk of harm. Similarly, the question should reflect the patient's wishes and priorities, concerns about costs, and cultural issues. An implant-supported fixed reconstruction cannot be done for an edentulous patient without significant surgical procedures and considerable discomfort and cost over a prolonged period of time. The patient should expect that the additional discomfort, costs, and time taken will yield a worthwhile extra benefit in terms of comfort, chewing efficiency, and appearance beyond conventional dentures. In addressing these concerns, the practitioner can easily be sidetracked into surrogate outcomes that do not provide a direct measure of success for the patient. For example, in the landmark 15-year report of implant success by Adell and others,1 the authors reported rates of continuously stable prostheses as high as 100%. Significant numbers of patients, however, had to be reoperated on three or more times to maintain continuous prosthesis stability. Although reoperation is less common now, implant treatment is still not without such risks, and they may be of primary concern to the patient. A clearly articulated question that probes such issues focuses the treatment priorities for the patient and assists the provider in offering appropriate counsel on the potential for harm.
For the Searcher

The most direct approach to finding the answer to a clinical question is to telephone a colleague and ask. Doing so doubles the sample of knowledge and experience that is brought to bear on the problem. Given the variety of practice decisions that are made worldwide, however, this sample is still unimpressive. There remains, also, the specter of the blind leading the blind. With the availability of easy access to the worldwide literature, there is now no reason why that vast resource of information cannot be applied to the individual clinician's patient problem, other than the clinician's inability to use it effectively. So perhaps the clinician should waste no time in getting to the literature to hunt down the evidence. The problem with this approach is that numerous articles will likely be found that seem to address the clinical issues. As a result, time will be wasted going through them to find the one that deals most directly with the issues and provides the strongest evidence.

Thus, for finding the best evidence, there are two advantages to taking the trouble to articulate a carefully crafted clinical question. One relates to efficiency in constructing a search, and the other relates to reviewing the found titles as quickly as possible. By carefully crafting a question, the searcher learns to be more specific. The search terms selected for the search become more specific and thus are more likely to exclude concepts that are peripheral to the central point. More precise selection is likely to influence the choice of outcome measure, that is, the result desired by the patient or the outcome the patient seeks to avoid. When these issues are articulated carefully, the search terms will yield a smaller number of articles whose titles and abstracts must be reviewed individually.

Similarly, a carefully crafted question provides criteria against which found articles can be reviewed for closer inspection. As the titles and abstracts of articles are scanned, the searcher is asking, ''Do I want to read this article in detail?'' If the answer is no, the searcher wants that answer quickly, to be able to proceed to the next article. Having the criteria enunciated clearly in the question facilitates a quick judgment. Here again, the choice of outcome measures is often critical. Articles that address the same problems as those being researched, using the same interventions but recording different outcomes, are of general interest but are not necessarily relevant. Being able to ascertain quickly that the outcome reported is not the outcome of interest allows the searcher to move on to the next article more quickly.

Another advantage of articulating a clearly defined question can be found in the communication between cooperating providers. In referring patients to specialists, general practitioners can focus the attention of the specialist and at the same time circumscribe the specialist's responsibility. It therefore is easier for the general practitioner to fulfill the duty to coordinate specialist services. Finally, a significant benefit of taking the trouble to frame clinical questions is the opportunity to organize the questions for later reference.
Lee et al6 suggest the development of critically appraised topics (CATs) that form a personal library of answers to clinical questions that have arisen. Of course, such a library needs to be updated from time to time, but it serves as a starting point for future searches and at the very least provides a compendium of accumulated best evidence on issues already encountered.

TYPES OF QUESTIONS

To fill the knowledge gaps, the busy practitioner needs a strategy to yield the greatest return in information in the least amount of time. The earlier questions relating to unlocalized pain, periodontally involved teeth, posterior restorative materials, and third molars are vague. They do not define what the practitioner really wants to know about those issues. Sackett et al9 suggest that a searcher might want to obtain either background information or foreground information.

Background information relates to a general understanding of a disorder, test, treatment, product, or other matter. For example, questions such as, ''What is the wear rate of this posterior composite material?'' or ''What are the nerve pathways responsible for unlocalized pain?'' are background questions. These questions usually have two components. They start with who, what, where, when, why, or how and a verb that connects them to the item of interest.

Foreground questions, on the other hand, are more specific and relate to the management of the patient. For example, ''In patients with unlocalized dental pain, is a cold test more sensitive than an electric pulp test in identifying a pulpitis?'' or, ''In patients with asymptomatic impacted third molars, will removing the teeth cause greater loss of bone support at the distal of the second molars than not removing them?'' are foreground questions. These questions usually have four components: (1) a population; (2) an intervention; (3) an alternative intervention; and (4) an outcome (the result of the test, treatment, or exposure). The patient is a member of a population that is usually described by demographics, diagnosis, symptom, or exposure. The patient, for example, may be a man in his fifties, who is a smoker, with a complaint of loose teeth. Some of these factors may be irrelevant, but the relevant factors are the features that define the population of interest. An intervention describes the action being considered, which usually is a diagnostic test, a treatment, or an exposure. The alternative intervention serves as a reference against which the test or treatment of interest is compared. One might, for example, compare fixed implant-supported prostheses against implant-supported overdentures. Finally, the outcome is the result sought from the test or treatment or the unhappy event one wishes to avoid, such as a diagnosis of apical periodontitis, or chewing efficiency, or implant failure.
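The four components give a foreground question a fixed anatomy that can be written down before searching. The sketch below is merely one illustrative way of recording those components, in Python, using the third-molar question from the preceding paragraph; it is not a tool described in this article.

from dataclasses import dataclass

@dataclass
class ForegroundQuestion:
    population: str    # who the patient represents
    intervention: str  # the test, treatment, or exposure being considered
    alternative: str   # the comparison maneuver
    outcome: str       # the result sought, or the event to be avoided

third_molar_question = ForegroundQuestion(
    population="patients with asymptomatic impacted third molars",
    intervention="removal of the third molars",
    alternative="leaving the third molars in place",
    outcome="loss of bone support at the distal of the second molars",
)
print(third_molar_question)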
At any time the searcher may need answers to both background and foreground questions. As students, practitioners asked many background questions to learn the biologic principles, disease processes, and properties of materials. Experienced practitioners, dealing with all the combinations of circumstances encountered in practice, are more interested in practical management issues that need to be specifically defined.

Framing a Question

An example illustrates the usefulness of framing a clinical question as an aid to retrieving an answer quickly. A dentist saw his edentulous patient on annual follow-up 2 years after inserting fixed, implant-supported prostheses. The patient complained of discomfort at one of the implants in the mandible. On examination, the implant was found to be loose and had to be removed. The clinician now is unsure whether the prosthesis can be expected to continue to function on the four remaining well-distributed implants. The alternative is to tell the patient that the remaining implants are too few to support the prosthesis. Preservation of the remaining implants may require that new implants be inserted and the prosthesis be remade or at least heavily modified. This alternative is an invasive, costly, and time-consuming solution that the patient seeks to avoid. The patient asks the dentist if he is more likely to lose his prosthesis if he continues to function with just four implants.
The clinician converts the patient's problem into a question: ''In edentulous patients with fixed implant-supported prostheses, is the risk of implant failure greater when the prosthesis is supported by only four implants than when it is supported by more implants?'' The population is made up of edentulous patients who have implant-supported prostheses. The intervention in this case is an exposure to the use of just four implants. The alternative is the use of more implants (with the obvious implications for surgery, cost, time, discomfort, and so forth). The outcome is implant failure, which could be defined in many different ways. These phrases of the question will directly steer the choice of terms in the search strategy and the assessment of the found titles.

HOW A QUESTION STEERS A SEARCH

Specific Definition of Search Terms

Using the concepts defined in the question, the clinician searches MEDLINE by first entering the term edentulous as a descriptor of the patient population. The software maps the term to ''jaw, edentulous'' and ''mouth, edentulous,'' both of which describe the situation of concern. The next term to enter describes the population in more detail, that is, those having an implant-supported prosthesis. This term maps
to several Medical Subject Heading (MeSH) terms that describe this patient, including ‘‘dental prosthesis, implant-supported’’ and ‘‘dental implants.’’ These terms are relevant to the problem, so both are selected. Exposure to four implants sounds like a narrow circumstance that could not be easily generalized in a search of the literature. Because the situation of four implants is of interest, however, the number four could be entered as a text word. The search software will then look for all occurrences of the word four (and words containing ‘‘four’’) in the titles and abstracts. Finally, the searcher enters a term describing the outcome measure, which is ‘‘implant failure.’’ The software maps this term to ‘‘prosthesis failure.’’ Combining all these terms yields no information in the current database. Repeating the search in the 1993–1996 database yields five articles that may answer the patient’s question.
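A similar strategy can be approximated today against PubMed. The sketch below is only an assumption about how one might reproduce it with Biopython's Entrez wrapper and current field tags; it is not the MEDLINE interface described above, the headings are those named in the text, and the number of articles returned will differ from the five found in the 1993–1996 database.

from Bio import Entrez  # Biopython; the e-mail address is a placeholder required by NCBI

Entrez.email = "searcher@example.org"

# Boolean strategy mirroring the question: population (edentulous, implant-supported
# prosthesis), exposure (four implants as a text word), and outcome (prosthesis failure).
query = (
    '("Jaw, Edentulous"[MeSH] OR "Mouth, Edentulous"[MeSH]) '
    'AND ("Dental Prosthesis, Implant-Supported"[MeSH] OR "Dental Implants"[MeSH]) '
    'AND four[Title/Abstract] '
    'AND "Prosthesis Failure"[MeSH]'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
result = Entrez.read(handle)
handle.close()
print(result["Count"], result["IdList"])  # number of hits and their PubMed identifiers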
Skimming Titles and Abstracts in Found Literature

The clinician now wants to scan the found titles and abstracts quickly to identify the best one or two articles that are most likely to answer the patient's question. Here, again, the details of the question facilitate the process. Each title (and abstract, if necessary) is scanned, and the content is compared with the population, maneuvers, and outcomes articulated in the question.

Of the five titles found in the search, the first is a case series by Leimola-Virtanen7 that followed four implants in the mandibles of 39 patients for 3 to 10 years. Implant and prosthesis success rates are provided. This article thus seems to address the patient's question quite closely, except that the prostheses used were denture prostheses, not fixed prostheses. In addition, being a case series, the article offers no control against which to compare the success rates found in the patients with only four implants. This article therefore is not a strong piece of evidence to use in answering the patient's question.

The title of the next article, by Jemt and Lekholm,5 seems to deal more with varying amounts of remaining bone. Nothing is said about the number of implants or prosthesis type used. A quick check of the abstract against the criteria in the question confirms that this article will not help answer the question.

The title of the third article describes a study by Brånemark and others2 that compares the use of four implants against six implants in edentulous patients. By the title alone, this article seems to satisfy two of the criteria specified in the question. A check of the abstract reveals it to be a study with a cross-sectional design that provides a control group, permitting the success rates in the four-implant group to be assessed against those in a group with more implants. This article thus provides much stronger and more focused evidence of the implant and prosthesis success rates that could be expected when only four implants are available.

The fourth article by Zarb and Schmitt10 provides a title and abstract
that are too vague to identify the details of either the maneuvers or the outcomes. With the relatively focused article by Brånemark et al available, there seems little value in retrieving this article and reading it in detail. Finally, the title of the fifth article, by Jemt and others,4 suggests that the article deals with overdentures exclusively and thus is not relevant to the patient's problem.

This review of the found titles has revealed an article that seems to address the practitioner's question directly and provides a study design that permits useful comparisons of success rates to support an answer to the patient. Although the evidence is not compelling (the study is not a randomized trial), it is the best available evidence that bears directly on the question. The patient can thus be informed that leaving his prosthesis to function on four implants is unlikely to pose greater risk of implant or prosthesis failure than there was when there were more implants. The patient is thus spared the time, cost, and discomfort of further implant surgery while avoiding any extra risk of failure.

SUMMARY

This exercise of isolating the strongest article from the found titles should take no more than 1 to 2 minutes. Thus, the whole process of searching for the best evidence should take no more than 5 minutes. In medical practices where evidence based practice is done routinely, this process can be completed in less than 1 minute.8 Obviously, the evaluation could not have been made as expeditiously without the benefit of the specific details articulated in the question. The question focused the search terms and expedited the identification of the strongest evidence that directly addressed the patient's problem from among the found titles. It provided the dentist with good (but not compelling) evidence to support an answer to the patient. It also provided the dentist with a new piece of information to use the next time the problem of reduced implant support comes up. The dentist has thus enjoyed the satisfaction of quickly identifying new knowledge and the confidence that comes with its use. In addition, the information has provided the dentist with a small but important block against the deterioration of clinical judgment skills.

References

1. Adell R, Lekholm B, Rockler B, et al: A 15-year study of osseointegrated implants in the treatment of the edentulous jaw. Int J Oral Surg 6:387–416, 1981
2. Brånemark PI, Svensson B, van Steenberghe D: Ten-year survival rates of fixed prostheses on four or six implants ad modum Brånemark in full edentulism. Clin Oral Implants Res 6:227–231, 1995
3. Elderton RJ: Variation among dentists in planning treatment. Br Dent J 154:201–206, 1983
4. Jemt T, Chai J, Harnett J, et al: A 5-year prospective multicenter follow-up report on overdentures supported by osseointegrated implants. Int J Oral Maxillofac Implants 11:291–298, 1996
5. Jemt T, Lekholm U: Implant treatment in edentulous maxillae: A 5-year follow-up report on patients with different degrees of jaw resorption. Int J Oral Maxillofac Implants 10:303–311, 1995
6. Lee H, Sauve J, Faroukh M, et al: The critically appraised topic–A standardized aid for the presentation and storage of evidence based medicine. Clin Res 41:A543, 1993
7. Leimola-Virtanen R, Peltola J, Oksala E, et al: ITI titanium plasma-sprayed screw implants in the treatment of edentulous mandibles: A follow-up study of 39 patients. Int J Oral Maxillofac Implants 10:373–378, 1995
8. Sackett DL, Straus SE: Finding and applying evidence during clinical rounds: The ''evidence cart''. JAMA 280:1336–1338, 1998
9. Sackett DL, Straus SE, Richardson WS, et al: Evidence based Medicine. How to Practice and Teach EBM, ed 2. Edinburgh, Churchill Livingstone, 2000
10. Zarb GA, Schmitt A: The edentulous predicament. I: A prospective study of the effectiveness of implant-supported fixed prostheses. J Am Dent Assoc 127:59–65, 1996

Address reprint requests to
James D. Anderson, BSc, DDS, MScD
Faculty of Dentistry
University of Toronto
124 Edward Street
Toronto, Ontario M5G 1G6
Canada
e-mail:
[email protected]
THERAPY
Anecdote, Experience, or Evidence?

Gary R. Goldstein, DDS, and Jack D. Preston, DDS

From the Department of Prosthodontics, New York University College of Dentistry; Department of Dental Material Science, New York University Graduate School of Arts and Sciences, New York, New York (GRG); and The University of Southern California School of Dentistry, Los Angeles, California (JDP)
How does a practitioner determine what therapy to use? Often, the decision depends on the age of the practitioner and the experiences gained in practice. The younger practitioner depends mainly on what was taught in dental school. All dental schools have a core technique, usually derived by faculty consensus, that allows a student to develop competency in one approach to a therapeutic problem. Trying to teach a novice multiple techniques usually results in the student's mastering none. Educators have agreed that teaching one technique well allows the student to enter practice and satisfy the needs of the public. Unfortunately, dental schools have been unfairly criticized as teaching outdated and often unrealistic techniques. This criticism is unfounded. Dental school faculty almost universally teach time-tested and scientifically sound procedures. Ethics dictate that patients in dental schools be protected and not subjected to whimsical trends in treatment. Institutional review boards mandate that research be structured to ensure the patient's rights are preserved. The clinician, unencumbered by such constraints, often makes forays into other treatment modalities, some successful, others disappointing.

Once in practice, the clinician is influenced by observations based on experience. Such observations, however, are often flawed, and associations thought to be causal are, instead, only casual. Anecdotal evidence from colleagues may mold decision making. With the broad communication now possible using the internet, such anecdotes may come from a continent away and from a completely unknown
person. Conversely, upon graduation some clinicians become comfortable with a particular procedure and may be wary of change. As clinicians expand their knowledge through lectures and by reading journals, they constantly modify their clinical methods. There is always a new restorative material on the market, a new surgical technique, a new piece of equipment, a new toothbrush, and a new toothpaste. Detailers joke that dentists are gadget enthusiasts who buy a product, use it once or twice, and store it in some cabinet, finding it years later and not remembering when, where, or why it was purchased.

How do practitioners decide which treatments to use? Often, they are influenced by the prestige of the professor giving the lecture or writing the article. All too often, however, they are seduced by the show rather than by the science. Multiple projectors, enhanced digital presentations, or the glitz of the advertising become the main reasons for change. Companies market directly to the public who, with inadequate ability to evaluate the hype, pressure the practitioner to change therapy, often with inadequate research to justify the change. An example is a patient with an edentulous area who presents with the request for implants. What the patient is really saying is, ''I want teeth.'' It is the practitioner's responsibility to understand that the patient is requesting the ability to chew better, speak better, or look better. It is the practitioner's responsibility to determine the best therapy for that individual patient and to advise the patient of that therapy and any other suitable options. Another example is a patient who, having heard all the hype on tooth bleaching, requests the procedure when the problem is really recurrent decay around old, severely stained composite restorations that need replacement.

Any procedure involves some risk, and increasing risk usually accompanies more complex therapy. The practitioner should decrease that risk as much as possible without unduly burdening the patient. Patients have a moral, ethical, and legal right to know the risks and benefits of any therapy that is recommended.

Today, information may be obtained from a variety of sources. There are often newer procedures to supplant the approaches documented in textbooks. Reports in peer-reviewed journals are more current, depending upon the source and the publication delays. Today, many practitioners obtain information over the internet, through conversations with other practitioners, and through newsletters and non–peer-reviewed periodicals and journals. These less scientific sources can be useful. For example, the problems of root fracture when cementing dowels and the fracture of porcelain complete-coverage restorations when using the first-generation resin/ionomer cements were first made public in these forums. Regardless of how information is obtained, anyone seeking newer approaches to improve the delivery of dental service must apply the rules of evidence in evaluating a suggested technique. Failure to consider all aspects of a therapy has sometimes proven disastrous (e.g., the teratogenic effects of thalidomide) or merely ineffective after encouraging initial results (e.g., early treatments for
AIDS). Therefore, the alert practitioner walks a tightrope between endangering patients with a therapy that has an undetected accompanying risk and failing to provide optimal therapy that would be of substantial merit. Anyone considering a new course of care must invoke the rules of evidence and evaluate the strengths of a report against its inadequacies. A technique or regimen may have statistical significance but lack clinical merit. How, then, does the practitioner maintain balance on that tightrope and best serve the patient?

The rules of evidence have been well established; their benefit lies in their knowledgeable application. A report may not furnish all the information desired, or the data may be reported in such a manner that they are difficult to evaluate. Bias from well-meaning researchers is common, and dentistry is filled with volumes of pseudoscientific reports in which results have been derived from a false premise or a flawed research design. Unfortunately, some established dental procedures have gained acceptance because a charismatic champion of the technique was a convincing advocate. Often, procedures that had merit were based on a falsely attributed cause and have been successful for reasons other than those to which their success has been ascribed.

When considering the merits of a report or lecture, the practitioner must clearly understand the purpose of the study and how the investigators sought to establish their premise. The results of the study must relate directly to this purpose statement. Anything not established as a purpose of the study should not be given primary consideration. Subjects enrolled in the study must all have an equal chance to obtain the study parameter (e.g., drug, treatment regimen, material) rather than the alternative approach (e.g., placebo, previously accepted technique or regimen, no therapy). Those in the treatment and alternative groups must be equivalent in all pertinent respects.

Before being enrolled in a study, a person should go through a complex screening process that places them in the appropriate cohort. Patients have a dental or medical problem and choose a treatment facility. They may enter that facility at different stages of the disease process and hence may have a different prognosis. They may be motivated by cost, location, or the reputation of the facility or the treating doctor. After screening, the patient is referred to the researcher, whose study population is further filtered by the informed-consent or volunteer process. Investigators also tend to include persons perceived to be compliant to ensure their continuance in the study and to rule out apparently less-compliant or difficult candidates. An additional series of eligibility decisions is then made to reduce the population further. Inclusion and exclusion criteria must be clearly established. They are necessary but must be pragmatic and relevant. As investigators cull the potential population using appropriate demographic criteria, they also rule out persons with potentially confounding comorbidities. Ultimately, clinicians must ask if the results are applicable to their patient population. If the sample group is excessively homogenized, the study population may
24
GOLDSTEIN & PRESTON
not be representative of the clinicians' patients, and the study may have decreased validity. For example, in a well-done study on IPS Empress inlays and onlays, the population (N = 130) consisted of 27 one-surface inlays, 38 two-surface inlays, 40 three-surface inlays,8 and 25 onlays.1 A significant percentage of the population consisted of Class I restorations; therefore, the data may not be pertinent to a clinician who does not normally perform Class I restorations.

Exactly what a study is to measure must be determined in advance, and the methods of measurement of the effects must be clearly and specifically stated. The precision of the measurements (or the converse, the error of the study) must be established before the study is initiated. It is not enough to know that the microscope used had a precision of 5 µm. The ability of investigators to repeat their measurement is crucial. How many persons were involved in making the measurements? Was their equipment calibrated to ensure that the measurements were equivalent? The method by which the study is to be analyzed must also be established a priori. Too often, investigators gather data only to find that statistical analysis is compromised by the procedures used.

The outcome assessment must be relevant. Investigators sometimes are encumbered by the dogma that the only legitimate way to do an experiment is to vary one factor at a time.1 This univariate approach is at odds with the multivariate climate in which the clinician functions. For example, in reviewing the current literature for dental luting cements, Rosenstiel et al6 listed 10 different clinically important parameters. A study that concentrates on only one factor may not supply enough information to warrant a change in material.

Readers must be acutely aware of the structure of the study before trying to ascertain its applicability to their patients. The design is determined by the direction of inquiry, who determines the therapy, and the presence of a control group. Prospective studies are those in which the therapy is initiated at the start of the study. The advantage of a prospective trial is that, theoretically, the investigator can control all aspects of the treatment and minimize the effect of confounding variables. Retrospective studies are those in which the therapy was initiated before the beginning of the study. The disadvantage of this study design is the inability of the investigator to control inadvertent confounding variables. Studies can be further divided into comparative studies (also called analytical studies), which have a control group, and descriptive studies, which have no control group. The hierarchy of evidence can be listed as

1. Comparative studies
   • Prospective studies
     Randomized, controlled trials (RCTs) – assignment to therapy is under the control of the investigator
     Cohort study – two matched study groups (cohorts) are assembled and followed. Because the patient self-selects the treatment, the assignment to therapy is not under the control of the investigator.
THERAPY: ANECDOTE, EXPERIENCE, OR EVIDENCE?
25
bled and followed. Because the patient self-selects the treatment, the assignment to therapy is not under the control of the investigator. • Retrospective studies Case-control study – similar in design to the cohort study except that the outcome was present at the time the study began 2. Descriptive studies • Case series • Case report The hierarchy of evidence gives the reader a primer to use when comparing conflicting evidence. The RCT study is the standard for questions regarding therapy. Because the study is prospective, and the therapy is under the command of the investigator, an RTC minimizes the bias inherent in other designs. The control, usually the standard of care, allows the reader to make direct comparisons; hence, this design provides the best evidence. Understanding the terminology is more important than simply recognizing the terminology. Randomization, however, cannot make up for a poorly planned and implemented study. Just because the design is an RCT does not relieve the reader from the responsibility of examining the methodology. Randomization is effective only when the study population is of compelling size. The appropriate study population size in a clinical trial varies for different questions. Determining the size of the study population requires a knowledgeable best guess by the researcher and consultation with a statistician to determine the power of the study. Randomized, controlled trials are not always possible because of cost, time, or ethics. For example, it would not be ethical, in performing a study on the hazards of smoking, to randomize a cohort to a regimen that forced a participant to smoke two packs of cigarettes a day to see if there was a harmful result. Rather, matched cohort studies, although not ideal, are accepted as the norm to answer a question of harm.4 Sackett et al have concluded that ‘‘Evidence-based medicine is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence (from systematic reviews when they exist; otherwise from primary studies) with which to answer our clinical questions.’’7 The determination of an acceptable control refers back to the question the study is trying to answer. In a study to determine the efficacy of a new drug, a placebo could be an acceptable control. The pharmaceutic industry, however, is concerned about the extent of the placebo effect, which can be as high as 30%. A current trend is to have a three-group (instead of the typical two-group) RCT. One group receives the new drug, one group receives the placebo, and the third group (the control)
receives no treatment. The difference between the no-treatment group and the placebo group is the placebo effect, whereas the difference between the placebo group and the treatment group is called the therapeutic effect.2 In therapy trials, the most useful control for the clinician is the current standard of care. The usefulness of a study on a new headache medication is enhanced if the control is aspirin, ibuprofen, or acetaminophen rather than a placebo, because most clinicians would not prescribe or take a placebo for a headache.

Unfortunately, most dental therapy articles are descriptive rather than comparative. In the typical case study or case series, practitioners evaluate their own work. Despite the integrity of the clinician, the study cannot have the same validity as one in which an independent, blinded observer assesses the outcome. Another problem of descriptive studies is that authors sometimes want to project their data beyond the scope of their project. For example, it is easy when doing a case series on implant product X to compare it with another case series done on implant product Y. This comparison is dangerous, because the two studies had different populations, in different settings, receiving different therapy from different investigators. The groups are almost always dissimilar, and treatment regimens are almost always different. Although such a comparison is acceptable in the discussion section, it should never appear in the conclusions. Conclusions can report only the results of the present study in answer to the initial question or hypothesis.

Journals and authors usually express results in positive numbers, a practice that can be misleading to the clinician. For example, an 85% success rate might sound impressive, but viewing the same results as a negative (a 15% failure rate) may have more impact on the decision-making process.3 An example is a recent article in the Journal of the American Dental Association evaluating Class V restorations with and without mechanical retention.5 The authors claimed that "restoration of Class V lesions without using mechanical retention could be expected to succeed in seven of 10 restorations over a three year period," but clinicians must determine if a 30% failure rate after 3 years is an acceptable result in their practices.

Results are also presented in terms of statistical significance, and unfortunately statistical significance does not always relate to clinical significance. For example, an investigator may use an extremely accurate measurement device that can report attachment loss around teeth in tenths of a millimeter. After 2 years of treatment with drug X, the study shows a statistically significant attachment loss of 0.01 mm when compared with scaling and root planing. But are the results clinically significant, especially if the drug therapy causes an after-effect? The clinical relevance of a statistically significant finding is best determined by the clinicians reading the report and determining if the results are applicable to their patients.

The clinician must also understand the difference between a biologic response and a clinical response. A new mouthwash may demonstrate the ability to kill more bacteria or viruses (a biologic response), which
may have no clinical relevance. If, however, a clinical response such as a decrease in periodontal disease, caries, or malodor can be demonstrated with the use of the mouthwash, the data have relevance for the clinician. Another example is the family of glass ionomer restorations and cements. Fluoride release (a biologic response) is meaningless to the practitioner unless a documented decrease in caries (a clinical response) can be demonstrated.

When confronted with evidence that conflicts with the current standard of care, an experienced clinician can be biased in evaluating data. "I've been doing this for years and it works in my hands," may not be an acceptable excuse to disregard compelling studies.

The definition of success is controversial, and consensus is often difficult to achieve. How long should a restoration last? If it is still in place, but staining compromises the esthetics, is the procedure a success or failure? How long should an implant last? If there is a loss of osseointegration along one wall of the implant, but the fixture is rigid and there is no pain, is the implant a success or a failure? What is an acceptable success rate for molar endodontics? If there is a small periapical radiolucency, but the patient has no pain, is it a success or failure? If the patient has intermittent recurrent pain, but the radiograph demonstrates a perfect fill, is it a success or a failure?

Success is also tempered by the cause of the failure. Recurrent decay or periodontal problems that compromise a full-coverage restoration and are caused by poor home care are different from the same problems caused by defective margins. A patient fracturing a restoration by biting into an olive pit is different from a failure caused by an overlooked occlusal prematurity. Also, changes in the clinician's advice to patients can cause embarrassing moments if the dentist is not willing to admit that current good research has caused a change in thinking.

Should a clinician restore a patient's lost molars? Patients need teeth for mastication, phonation, and esthetics. If the lost molars are not in the esthetic zone, the patient has no problem eating or speaking, and extrusion of the opposing dentition has not occurred or is not a concern (the opposing molars are also missing), why restore? This argument has intensified with an article by Witter et al9 in the Journal of Dental Research that showed 9-year prospective evidence questioning the rationale for restoring the missing molars, and the controversy will persist as researchers supply more data.

SUMMARY

In dentistry, most changes in therapy come from new techniques and products that are introduced to the market. Clinicians (and patients) can be overwhelmed by advertisements and marketing, some obvious and some (e.g., paid clinical reports in non–peer-reviewed journals) not so obvious. Because most advances are made with small case studies, which are at a lower level of evidence, it is imperative that the data clinicians
read or see have the greatest validity possible. This validity is imperative to achieve evidence-based dentistry that uses relevant, high-quality, clinically oriented research that provides better information for the clinician and better treatment for the patient.

References

1. Brunette DM: Critical Thinking: Understanding and Evaluating Dental Research. Chicago, Quintessence Publishing Co, 1996
2. Chambers D: The big placebo. J Am Coll Dent 66:2–5, 1999
3. Goldstein GR, Preston JD: How to evaluate an article about therapy. J Prosthet Dent 83:599–603, 2000
4. Jacob RF, Lloyd PM: How to evaluate a dental article about harm. J Prosthet Dent 84:8–16, 2000
5. McCoy RB, Anderson MH, Lepe X, et al: Clinical success of class V composite resin restorations without mechanical retention. J Am Dent Assoc 129:593–599, 1998
6. Rosenstiel SF, Land MF, Crispin BJ: Dental luting agents: A review of the current literature. J Prosthet Dent 80:280–301, 1998
7. Sackett D, Richardson WS, Rosenberg W, et al: Evidence-based Medicine: How to Practice and Teach EBM. New York, Churchill Livingstone, 1997
8. Studer S, Lehner C, Brodbeck U, et al: Short-term results of IPS-Empress onlays and inlays. J Prosthodont 5:277–287, 1996
9. Witter DJ, Creugers NHJ, Kreulen CM, et al: Occlusal stability in shortened dental arches. J Dent Res 80:432–436, 2001

Address reprint requests to
Gary R. Goldstein, DDS
Division of Restorative and Prosthodontic Sciences
New York University College of Dentistry
345 East 24th Street
Clinic 5W
New York, NY 10010–4086
e-mail:
[email protected]
0011–8532/02 $15.00 + .00
EVIDENCE BASED DENTISTRY
THE ETHICS OF EXPERIMENTING IN DENTAL PRACTICE
David W. Chambers, EdM, MBA, PhD
This article is not about the experiments dental researchers conduct in laboratories or controlled clinical trials. It is about the far more common experiments dentists conduct in their offices—for example, the first time a new procedure is performed following a continuing education course, using a material ordered as a sample, performing endodontics on a molar more complex than any attempted in recent years, or proceeding with a large case in which several alternatives look equally attractive.

There is a very simple and well-known rule of ethics for performing procedures in which there is some attendant risk: Primum non nocere—above all, cause no harm. This injunction is often attributed to the Hippocratic Oath, and it has become famous among malpractice attorneys and writers of editorials. The truth is that primum non nocere does not appear in the Hippocratic Oath, and it is doubtful advice.6 It is a Latin gloss on the older Hippocratic admonition that might better be translated, "You have been given great power as a doctor; use it for good and not for evil." It would be unwise to make avoiding harm the ultimate standard for a care provider. The only certain way to avoid harm would be to avoid undertaking treatment altogether. Attempting to do good for patients entails risk. This article addresses the problem of treating patients in an ethical fashion when there is no way of guaranteeing success. Such situations are common and unavoidable in dental practice.
From the Office of Academic Affairs and Scholarship, School of Dentistry, University of the Pacific, San Francisco, California
DENTAL CLINICS OF NORTH AMERICA VOLUME 46 • NUMBER 1 • JANUARY 2002
29
GENERAL APPROACHES TO ETHICS

The recent interest in ethics in medicine and dentistry reflects the growing range of choices in the professions. One hundred years ago when dentists primarily treated pain caused by advanced caries, fast forceps were the measure of quality. As dentists began to understand caries and periodontal diseases, diagnostic acumen assumed importance, and a range of treatment skills was required. Still, the number of procedures available per condition was small, and patients were both unaware of alternatives and usually quite willing to follow the judgment of the dentist.

Today, patients visit the dentist in the complete absence of symptoms for preventive reasons and to seek cosmetic enhancements. They often bring their own opinions with them. Disease entities have expanded to include malocclusions, temporomandibular joint considerations, and oral cancers, and the options for treating even the most basic of conditions—caries—have become bewilderingly vast. Once a condition needing intervention is identified, there are frequently many choices of methods and materials for treatment. Industry and continuing education speakers pressure dentists to consider the merits of the alternatives they favor. As choices multiply, the opportunities for making right and wrong choices expand.

The profession has recognized this situation and has turned to the field of ethics for guidance. The basic texts in dental ethics are those by Ozar and Sokol12 and Rule and Veatch.15 An organization known as PEDNET—Professional Ethics in Dentistry Network—is devoted to promoting awareness and discussion of dental ethics, and its members welcome contact at
[email protected]. Dental schools across the country are adding courses in ethics to their curricula. A national Alliance for Oral Health has been created, embracing 61 organizations involved in health care such as the American Dental Association (ADA), insurers, specialty groups, the military, public health groups, allied dental health professionals, examiners, schools, and so forth. The American College of Dentists, long concerned with ethics and professionalism, has an excellent, small handbook (available at www.facd.org). The Winter 1996 issue of the Journal of the American College of Dentists contrasts multiple approaches to ethical analysis of a single case involving managed care.10

Of the many approaches to ethics, the most basic is grounded on ethical principles. Principles animated the revision of the ADA Code of Ethics and Professional Conduct completed in 1998. In this approach, a set of ethical principles (shown in Table 1) is used as a touchstone for reflection and conduct. Obtaining informed consent from patients, for example, is appropriate based on the principle of autonomy—the patients' right to decide what is to be done with their bodies. The principle of veracity can be cited as a reason for explaining procedures and their consequences in clear, understandable language. Such principles offer general guidance, although conflicts can arise among the principles. For example, a patient may want veneers when what is needed is periodontal therapy. Autonomy and beneficence clash in this case.
Table 1. COMMON ETHICAL PRINCIPLES

Autonomy – The right of the patient, the dentist, and any other competent individual who is involved to determine what should be done by and to them
Beneficence – An obligation to help others, normally assumed in exchange for privileges granted a group such as professionals
Competence – The capacity to perform as one promises or as expected
Integrity – Consistency throughout one's actions and language; being guided by core values
Justice – Fairness in the distribution of rewards and obligations and in the processes by which distribution is made; sometimes tested by a willingness to trade places with others one deals with
Nonmaleficence – Avoiding unnecessary harm to others
Veracity – Telling the truth and creating environments where honest views are expressed
The issue addressed in this article—experimenting in dental practice—can be framed as a conflict between beneficence (helping the patient and other patients in the future) and nonmaleficence (not harming the patient).

Some dental ethicists are pushing beyond the principles approach. Their work is prompted by questions such as, "How does a person recognize when he or she is dealing with an ethical issue?", "What happens when principles are in conflict?", and "Shouldn't ethics lead to right action as well as right thought?" Muriel Bebeau has applied the work of Rest and Narvaez14 to dentistry in proposing an approach to ethical issues in terms of moral sensitivity, moral reasoning and judgment, moral motivation and commitment, and (at the highest level) moral character and competence. Ozar and Sokol12 and Rule and Veatch15 have worked through many cases in dentistry, offering some thoughts on how competing claims can be addressed and which values take precedence. Bruce Peltier has written about the difficulties of taking ethical action and has offered practical advice.13

A Discursive Approach to Ethics

The discursive approach to ethics builds on the traditional methods presented previously.4 This approach sets a context that places greater emphasis on people than on principles, and it favors ethical behavior over reflection. Attention is paid to how language is used to create ethical communities. Dentistry takes place in a social context.8 There is an understanding on the patients' part that dentists are well trained, perform only those procedures they have high confidence will be successful, value the patients' welfare and their own reputation, are part of a network of professionals available for backup, and will not take advantage of patients by performing unnecessary work or charging more than is fair.
Patients also realize that they are expected to be present and prompt for appointments, to pay their bills, to answer honestly when asked about their health, and to comply with reasonable requests for home care and postoperative recommendations. This general therapeutic alliance is understood by reasonable adults. It is the background for the jury system, and it makes health care possible and efficient. No book contains these rules, and they are normally discussed only when something unexpected happens. Patients participating in insurance fraud or dentists who perform unnecessary work generally understand that they are acting outside the normal bounds of right and wrong.

In other cases, the therapeutic alliance is ambiguous. The patient knows a damaged tooth must be fixed. But there are choices: considerations of function, appearance, and cost must be understood and weighed. Or the patient may be uncertain whether to remain with the current dentist. The hours are inconvenient, the staff may not show respect, and the dentist is abstemious with explanations. Again, an understanding must be reached. These are not cases of universal expectations that form a treatment alliance. They represent alternatives in a range of variation that contains individuality. Some dentists are known to be expensive or to focus on esthetics. Others are known to take a holistic approach. Some patients have personal traits that make them difficult to deal with; others require an inordinate amount of attention. As long as the office team and the patient can come to an understanding about what is mutually acceptable, the treatment alliance can be preserved across a wide range of individual variation. Of course, there is a limit to individual agreements that exceed public acceptability. Dentists cannot practice medicine even if the patient agrees to medical procedures, and insurance fraud is unacceptable, even with patients' collusion.

Discursive ethicists are concerned with ethical communities and agreements that promote civil good. Making and keeping promises is central to a discursive view of ethics.7 A definition that is used in this article is: Ethics is the creation, adjustment, and maintenance of communities in which participants can reach their potentials.

Several aspects of this definition go beyond the traditional concept of ethics. First, ethics is a community activity; it concerns the relationships among people. There are no private ethics. Ethics is something people do together. Second, ethical understandings are created. This is different from some traditional notions that there are abstract ethical principles that must be discovered or with which all people would agree. Discursive ethics is not ethical relativism; some actions such as lying, murder, and seeking to avoid the penalties of violating agreements are universally abhorred. The general treatment alliance mentioned previously contains such examples. Discursive ethics also recognizes that there can be ethical violations within specific communities. A husband can cheat on his wife in ways that might not bother other couples. A dentist can violate the confidence of a patient without violating the ADA Code or any generally accepted set of ethical rules. Third, discursive ethics is concerned with the obligation to create ethical communities and
to adjust them when necessary, as well as with avoiding breaches of established codes. Creating systems that put people in ethical jeopardy is as wrong as violating the rules of such a system. Some dentists have argued, for example, that the conditions of some reimbursement mechanisms are unethical. (They are probably wrong, however, in pleading that it is ethical to violate these conditions if they have voluntarily agreed to a contract that contains them.) Discursive ethics uses all the methods of traditional ethical theories to create ethical communities. Ethics is often defined as the study of right and wrong, and some ethical theories seem to accept that distinguishing right from wrong is the entirety of the ethical problem. Other theories use the determination of right and wrong as a step in the ethical process. In traditional ethical theory, judgments of right and wrong are often made by third parties. In discursive ethics, however, the number of categories is broader than the right/wrong dichotomy, judgment plays a smaller role, and the perspective is entirely from within the community.

It may be too crude to categorize people or actions as only ethical and unethical. Some people are ethically insensitive. They just do not understand ethical issues; they are surprised when others call ethical lapses to their attention. They do not pay as close attention to what is expected as others would like. Some people are ethically awkward. They try to do good, but they are unskilled. A colleague once described a situation in which the dentist prescribed narcotics for the same patient four times in a single day. He said he knew he was doing wrong but he just could not be assertive with this particular patient. A third category is ethical abuse. Ethical abuse is more than breaking the rules. Abusers want the rules to remain in place precisely so they can take advantage of others who follow the values of the community. Scam artists take advantage of the expectation that trust will be part of relationships. Insurance frauds defend the insurance system. Patients who fail to honor their financial obligations steadfastly profess a relationship with the dentist. Ethical abusers want the benefits of participation in an ethical community without the obligations of such participation. (Civil disobedience, by contrast, is a willingness to step outside a community whose ethics the conscientious objector finds offensive. It is an open disobeying of the community's norms.)

The response to ethical insensitivity or awkwardness is normally to increase group concern and to try to help the individual. In the case of abuse, the community distances the person from the group to preserve the group. Dentists with addiction problems and those with poor clinical judgment or skill receive remedial treatment or training. Those who refuse remediation or engage in purposeful deception lose the privileges of dental practice. Those who embarrass the profession are shunned.

The Ethics Test

Dentists are in partnership with three ethical communities. The first partnership is with each individual patient. Dentists operate within the
general treatment alliance, as modified by individual circumstances. The second relationship is with the profession. It is inherent in professionalism that the acts of individual members affect the reputation of all colleagues, and the reputation of the profession is an asset available for use by the individual practitioner. Regardless of participation in organized dentistry, any who call themselves dentists are part of the community, precisely because patients and the public see it this way. The third relationship is with the public at large. Customs in a community, laws, and general civil expectations apply in all cases. Being aware of the three communities and the mutual ethical expectations placed on all members of these communities is useful in creating the ethics test.

It is helpful to know when one is in an ethical situation. Academics can always create a hypothetical context that would make a particular act of a dentist an ethical issue, but dentists need a more practical way of identifying, from an internal perspective, situations in which the community is suffering from tension and abuse. If the test is to be useful, it must work from the point of view of those in the community. Here is the guideline: An ethical situation exists whenever members of the community are compromised in their potentials. If the dentist makes money by overtreating or undertreating or mistreating a patient, it is an ethical situation. If an associate receives less compensation than promised or a poorer mix of patients than promised, it is an ethical issue. If a group of patients has less access to care than contracted for in their insurance coverage or receives care that is limited, it is an ethical issue.

From the discursive perspective, it is possible to fashion an ethics test. The test is oriented to the communities involved and not toward abstract principles or personal feelings of right or wrong. The test has two parts:

• If you believe members of the community (patients, colleagues, or society generally) would be offended or outraged by an action on your part provided that they knew all the relevant details—do not do it!
• If you believe members of the community would be concerned by an action on your part provided they knew all the relevant details—discuss it with them.

Notice that both parts of the rule directly connect the ethical community to actions. The admonition, "Don't do anything that would outrage those with whom you have a relationship," is obvious. The injunction to discuss actions that might be of concern is more novel. It speaks directly of ethics being the creation and adjustment of communities. Talking about ethical concerns goes to the point of clarifying and renegotiating relationships. One of the conditions for membership in a group is giving others the right to withdraw from the relationship if one intends to change it. The principle of autonomy is important in this concept. Veracity, another ethical principle, is also important. When discussing an ethical concern, one must be honest—as one certainly expects of others in the community. Informed consent is largely a process
of establishing and adjusting mutual expectations in an ethical community limited to the dentist and patient in a specific situation. The concept can be generalized.
EXPERIMENTS IN DENTAL PRACTICE

Dental practice makes use of science in several ways. Fundamental principles are learned in dental school and updated through reading, discussions with friends, and continuing education. Manufacturers also provide information of varying degrees of accuracy and usefulness. By far the most common way dentists learn is through observing the outcomes of their work in their own practices on their patients in their own hands.9 This information is potentially of great value; whether it does in fact improve practice depends on how each practitioner responds.

A common understanding of the word experiment is a carefully designed and controlled attempt to reveal truth in a research context. In his classic The Reflective Practitioner,16 however, Donald Schon shows that there are other common uses of the term. An ethical issue is involved in the translation of research findings into practice. Ethical issues are also involved in the experiments that are conducted on a regular basis in practice.

Most dental experiments involving patients are performed in offices by dentists who are not trained as researchers and normally do not think of themselves as experimenting. Experimenting is what takes place, however, when a dentist performs his or her first bonding case or first posterior composite. It is an experiment when the dentist says, "Let's keep a watch on that tooth." The first injection in dental school or the first endodontics case falls into the same category. The dental profession even experiments on a wholesale basis in initial licensure examinations when unlicensed dentists perform independent care on patients with a national success rate approximating 80% (one in five state board experiments fails11). An experiment is any planned and purposeful action where the results can be observed and the outcomes contain risk. Table 2 shows several categories of experiments. Two of these are discussed along with the rules of ethical experimentation in practice, and the final two are then considered briefly.
Scientific Investigation

There may be a reluctance to accept the idea that practitioners perform experiments in their practices because of the dominant concept of experimentation that comes from science. The characteristics of strict experimental design, randomized control groups, precisely defined parameters, and sophisticated statistical analyses are not possible in dental practice. Dentists who are interested in this type of experimentation normally associate themselves with universities or other research programs.
Table 2. TAXONOMY OF EXPERIMENTS

Scientific investigation – Extreme uncertainty regarding outcomes, rigorous control, nonpractice context, purpose to discover general principles, results in publications
Experimental practice – High probability of success, careful observation rather than control, realistic settings, purpose to discover more effective methods, results in improved practice
Heroic measures – High probability of failure, little control, all else has failed
Doing nothing – Unknown outcomes, no control, changes in practice unrelated to outcomes
Experimental Practice

Experimenting in practice is more common than it might sound. It occurs regularly following continuing education programs, reading the literature, or talking with colleagues. A visit from a supplier or to the annual convention is another stimulus. Any new class of procedures is an experiment. There is a common misconception that the ADA seal of approval, publications in peer-reviewed journals, Food and Drug Administration (FDA) endorsement, and other scientific validation protect a practitioner from experimenting. Unproven products, materials, procedures, and equipment are only one source of risk in therapy.

Another source contributing to risk is the dentist. There is risk in a technique when it is tried for the first time, regardless of how much scientific research has been conducted or how many other dentists have used the technique successfully. The third major source of risk comes from the patient. To the extent that the patient in the chair is exactly the same as the average patient in the research studies, the risk is reduced, but it is never eliminated. Even a generally established procedure performed by an experienced practitioner can present risk if the patient has unusual conditions, systemic complications, or idiosyncratic expectations. Of course, there are also interactions among the three primary categories of risk—between therapy and dentist, therapy and patient, and patient and dentist, and the interaction of all three factors. Previous success involving any one or two of the categories of risk does not eliminate risk in the others. A dentist who fails in treatment using a product well-tested in the literature is not immune from questioning about whether he or she was properly trained and experienced in the use of the product or whether the use of the product was appropriate in the particular circumstance.
The recent concern over peer-reviewed literature is in many ways unfortunate. It creates an impression that only the product or procedure risk matters. The proliferation of journals that focus on products and procedures and the small number devoted to differences among dentists or among patients creates a misperception that therapy is the major or even the only important source of experimental risk in practice.

THE ETHICS OF PRACTICE EXPERIMENTS

The fundamental rule for experimentation in practice is: if your patients or colleagues would be shocked to learn that you had tried the treatment, do not do it; if they would be concerned, discuss it with them; if there would be no concern, proceed. Discussing treatments one uses with patients is a matter of informed consent. Discussions with colleagues are often informal, such as case discussions at component society meetings, but they could be formalized as literature searches or seeking the advice of known experts.

An experiment is not necessarily a failure because it does not go as planned; it is always a failure when it should not have been attempted in the first place. A motorcyclist who weaves between lanes of automobile traffic may sustain injury or worse because he or she is a poor rider or because an automobile driver makes an unexpected maneuver. The risk lies not so much in the cyclist's skill as in the poor judgment of riding between the cars. Discursive ethics is concerned with creating ethical circumstances as well as with acting ethically. There are four ethical standards for experimentation in practice:
1. The action is undertaken to improve patient oral health.
2. The action is within the standard of care.
3. There is a probable expectation of success based on evidence.
4. The action is performed reflectively, systematically, and with measured outcomes.
Patients' Interests First

The patient's interests must always be the primary concern, and the reasons for experimentation must always be to improve patient oral health. Placing patients at risk in hopes of finding a faster or more profitable way of delivering care is unethical. It is true that all three parties (dentist, profession, and patient) are at risk in most practice experiments, but patients cannot be co-opted into endeavors in which they bear risk for the sake of others' potential gain. It is insufficient to argue that patients tacitly agree to general experimentation by agreeing to care. (Treatment in dental schools is a possible exception to the rule.)

A special challenge to the principle of patients first involves the
difference between the interests of patients individually and collectively. Can an individual patient be expected to bear the risk for improvements that will benefit patients generally? This problem is handled in research by informing patients that they are participating in an experiment, that they may receive either a standard treatment or an experimental one, and by explaining the expected outcomes of each. In such circumstances, patients must consent to participate in a set of therapies that include uncertain alternatives.

As a general practice, informed consent is vital when attempting a novel treatment. Consent has the following advantages: (1) it forces the dentist to think through what is being done in a rigorous fashion; (2) it offers some legal protection; and (3) it clarifies exactly what is in the patient's interests. Sometimes dentists undertake heroic or innovative treatments on the assumption that patients would prefer these courses of action. (Certainly, dentists would prefer the successful outcomes if the odds were not an issue.) Sometimes, a conversation with the patient about the risks involved reveals that the risks are acceptable but the proposed outcome is not what the patient prefers. Certainly, honest, informed consent serves as a check that the innovative treatment is being done for the patient's benefit and not the dentist's. If the dentist must disclose that a novel treatment is being undertaken primarily for his or her benefit, the ethical rule "if there is a concern, discuss it with those involved" will preserve the dentist's integrity (or the dentist will lie, most often through incomplete disclosure).

Standard of Care

The second criterion for ethical experimentation is grounded in the standard of care. The standard of care is a legal concept and one that is rather fuzzy at the edges—precisely where office experimentation is involved. In an important sense, the standard of care is an operational form of the ethical rule "if one's colleagues would be shocked at what was done, do not do it." The normal form of the argument in the standard of care is that a particular example of therapy for a given patient and performed by a dentist of certain qualifications falls into a class that other practitioners would accept. LeFort resections are reserved for specialists, often those with specific training. Surgical extractions can be done by general dentists, but there will be some question about what other surgical experience the practitioner has and what protocols were followed. The standard of care does allow for experimentation, but what constitutes acceptable innovation is subject to review by the standard of what one's professional peers are doing.

Grounds for Expecting Success

Third, there must be probable reason to expect success with the new product or procedure or patient. This baseline of probable success can
be established by studying its scientific basis, by talking with people who have first-hand knowledge and experience, or by drawing on the dentist's own experience with similar situations. In a highly abstract sense, every treatment is a novel application of product and process, dentist experience, and patient characteristics. Practically, each case is an example from a class of similar factors. With extensive experience with similar products or procedures, with dental experience in similar cases, and with familiarity with given categories of patients, the risk goes down. There are no sharp categories regarding grounded experimentation. The burden of proof increases rather sharply, however, when the dentist has to answer that he or she has never used this therapy or any like it, has little or no experience in such treatment, or has never done such work on this type of patient. Before trying something new, dentists must ask themselves, "On what grounds am I willing to justify taking this risk?"
Systematic Approach

The final criterion dictates that unusual treatments require unusual care in their execution. Experimentation cannot be capricious. Dentists are expected to reflect on alternatives and their benefits and risks and to share the results of their reflections. The treatment also must be delivered in a careful fashion, and the results must be recorded. It is valuable in some cases to prepare a written protocol for innovative treatments. At an absolute minimum, the reasons for performing experimental work must be entered in the chart.

Recording the outcomes of experimental procedures is critical. There is much to be gained from recording outcomes on a routine basis for all treatment, but experimental procedures are a special case. When exposing patients, oneself, and the profession to risk, it is imperative to learn as much from the experience as possible. Recording outcomes is necessary to reduce the exposure of further patients and others to similar risk. If a treatment seems reasonable based on the patient's interests, standard of care, and available evidence but results differ from expectations, the dentist will need to have good information about the outcomes. Saying, "It just didn't turn out as planned," or, "We'll have to do more such experiments to clarify the situation," is a signal of ethical jeopardy.

The preceding discussion has focused on office experiments that realistically have a high probability of success. The experiment is ethical, provided that it meets the criteria of aiming to improve patient care within the standard of care, is based on treatment that is known to have a reasonable basis for successful outcomes, and is undertaken in a reflective fashion. When some of the criteria approach the borderline, honest communication with the patient will resolve the matter. If any criteria are not met, office experimentation is unwise. Patients cannot consent to risks others would regard as foolish.
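The article does not prescribe any particular format for these records. Purely as an illustrative sketch of the kind of structured chart note the preceding paragraphs call for, the snippet below captures the rationale, consent discussion, expected outcome, and observed outcome of a practice experiment; the field names, the CSV log, and the helper function are assumptions made for the example and are not drawn from the article.

```python
# Illustrative only: one way to structure the chart entry for a practice
# experiment, as suggested by the discussion above. Field names and the
# file format are assumptions, not part of the article.
import csv
import os
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class PracticeExperimentRecord:
    patient_id: str            # internal chart identifier, not personal data
    procedure: str             # what was attempted (e.g., first posterior composite)
    rationale: str             # why the departure from routine care serves the patient
    consent_discussed: bool    # whether risks and alternatives were discussed
    expected_outcome: str      # the grounds for expecting success
    observed_outcome: str = "" # filled in at follow-up visits
    date_performed: date = field(default_factory=date.today)


def append_record(path: str, record: PracticeExperimentRecord) -> None:
    """Append one record to a simple CSV log kept alongside the chart."""
    row = asdict(record)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if new_file:
            writer.writeheader()  # write the column names only once
        writer.writerow(row)
```

Any equivalent paper or practice-management entry that captures the same information would serve the same ethical purpose of making the outcome reviewable.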
Heroic Experiments

Heroic experiments are high risk. Although they may be undertaken in the patient's best interests, they normally fail two other tests: being within the standard of care and having evidence of probable success. Normally, heroic efforts are considered only when there is no other valid alternative. Professional groups and the public at large normally frown on such interventions because they expose to risk both the individual patient and the system for deciding what is appropriate behavior. Dentists who may be attracted to such interventions are well counseled to investigate the standard of care carefully. The fundamental justification for heroic effort is that all other conventional alternatives have been exhausted and that great risks are justified to protect the patient from grave harm. There are presumed trade-offs between the criterion for evidence of probable success and the criterion for improving the patient's well-being. For such trade-offs to be considered valid, there is a greatly heightened requirement for informed consent. The patient's true interests must be carefully explored, and there must be overwhelming evidence that the patient understands the risks associated with various outcomes (including no treatment) and that the patient has made a completely uncoerced decision. The criteria are written in capital letters when cases of experimentation in the dental office deviate from standard circumstances. There may also be cases in which the patient agrees to heroic treatment that would shock the profession or the public. A private agreement between the patient and the dentist—for example, to practice outside legal limits—is still unethical because there are communities to consider other than the patient.

The Invisible Experiment

Doing nothing is quite literally impossible. Sins of omission are still sins, as anyone who has been sued for failure to diagnose periodontal disease will verify. Doing nothing in the context of this article means adopting a hyperconservative approach and seeking to avoid experimentation in the office by doing only what has been done successfully in the past. As long as patients do not change, as long as their expectations remain unaffected by media or reimbursement plans, and as long as no other dentists innovate, this is a sound strategy. Professionals, however, have an ethical responsibility to their colleagues to practice to an evolving standard of care. Technically speaking, a dentist should reveal as part of informed consent that therapies being offered are behind the times or that a definitive diagnosis is not being made because of outdated knowledge.

READING THE LITERATURE

This article has explored the ethics of experimentation in dental practice. There is also a well-developed literature on the ethics of research.17
An area between these two raises some interesting ethical questions. What is right or wrong in moving knowledge from the scientific literature to the office practice? As much as practitioners might wish it were otherwise, responsibility for using the scientific literature in dentistry rests almost entirely with the dentist. Certainly, there is bad science, and some of it is published in peer-reviewed journals or other sources that attempt to present themselves as authoritative. The ADA and the FDA perform a valued service in establishing standards for products and materials, but many products do not seek this approval, including some effective products that fall outside the FDA's mandate. There are also some sound products whose developers choose not to list with the ADA because of the length of time required for approval or the restrictions on advertising that the ADA places on products. Further, these organizations review only products, materials, and devices that make therapeutic and some cosmetic claims; supplements, for example, fall outside their purview. When a clinically proven product fails to perform in a particular dentist's hands, manufacturers reflexively argue that the failure results from the dentist's technique.

Even peer review is not a sufficient standard. In 1998, the Journal of the American Medical Association published an entire issue on the medical literature. Included in the publication were a number of papers that examined the uses and impact of peer review. In several respected medical journals, the agreement among reviewers was low, and there were even cases in which, over the entire period studied, the consistency between peer reviewers and the decision to publish was negative—the higher the rating by reviewers, the less likely the manuscript was to be published.3 The situation in dentistry is unknown. The only dental journal that annually publishes the acceptance rate of manuscripts and the concordance between reviewers and decision to publish is the Journal of the American College of Dentists. The rate of concordance in that journal is moderately high, between 0.60 and 0.80.

The credibility of published research findings cannot be assured even by the best external reviewers. Three problems cannot be resolved through the review process: (1) internal versus external validity, (2) generalizability, and (3) the baseline problem. Because the individual dentist cannot transfer responsibility for any of these problems to the research or the journalistic communities, the practitioner must exercise ethical practices in these areas as well. In fact, the solution to this problem has already been addressed—dentists must perform reasonable experiments in their own practices using the ethical standards discussed previously.

Internal versus External Validity

Steady advances in the theory and practice of experimental design and hypothesis testing have brought both basic science and clinical
dental research to a high level of sophistication. The standards for judging the scientific rigor of research are well understood and are fairly consistently applied by reviewers. The problem is inherent in the theory of research design itself.2 The rigor that has been developed is largely in the area known as internal validity. Controls, placebos, cross-overs, statistical tests, and so forth all work to increase the likelihood of valid conclusions in the context in which the research was conducted. A well-designed study of patients in a nursing home tells about that nursing home; a clinical trial of a new material conducted at a university applies to that university. Scientific rigor is important, and reviewers are customarily sensitive to the fine points of experimental design. External validity—accuracy in general circumstances such as various dental practices—requires high internal validity in the research, but internal validity does not guarantee external validity.

Generalizability

External validity is commonly discussed under the heading of generalizability,1 that is, whether the results of a clinical trial on a certain product in specific conditions can be generalized to other settings, particularly to the office of the dentist who is reading the study and may wish to use the product. Generalizability is a gradient. The more similar the study conditions described in the literature are to the office where the results will be applied, the greater the external validity and the less likely the practitioner will be surprised. External validity, however, will always be lower in an application than in the study on which the application is based. An appropriate analogy is shipping cookies across the country: sometimes they arrive only slightly damaged and stale, but they never improve during the trip.

Responsibility for estimating generalizability of research results does not rest with the research community; it rests with individual practitioners. There is no way for the researcher to know all of the circumstances in which results might be applied. Only the individual dentist knows the difference between his or her practice and the circumstances described in the literature. In this sense, all dental research consists of two experiments—one conducted by the researcher, and another conducted by the dentist. The dentist is responsible for the second experiment, and the ethical nature of the second experiment should follow the rules already developed.

The Baseline Problem

There is much discussion today regarding evidence based dentistry. Although the term has been used to describe a variety of activities, the basic approach seems to be a concern that dental practice be based more securely on evidence from scientific studies. Certainly, the issues of
internal validity and generalizability must be considered as tempering the widespread use of this approach. Another issue is also troublesome. The concept of evidence based dentistry was borrowed from medicine, and the concept may not carry over effectively to dentistry. Physicians spend a substantial amount of their practice time diagnosing a broad range of conditions, but treatment is delegated to nurses, other physicians, therapists, and even to patients using prescribed medications. Dentists diagnose a much smaller number of conditions, and they treat those conditions themselves. Problem-solving is a smaller part of a dentist's role than treatment, and dentists develop intimate, intuitive experience of the outcomes of treatment because of their direct involvement in it. In other words, dentists have a rich baseline understanding of patient conditions.

The baseline problem is a sophisticated issue in scientific decision making.5 The most basic explanation of the baseline problem is that valid decisions are made based on what is known in a general sort of way about classes of conditions (the baseline knowledge) and on what can be found out by inquiry (the evidence). When trying to determine a value, such as pocket depth readings or the expected rate of decay observed in an incipient carious lesion, the best strategy is to combine the baseline knowledge and the evidence. Dentists do so intuitively when they shade the probing depth reading based on other probings in the area or modify their estimate of expected rate of caries advancement based on both the lesion itself and baseline factors such as the age of the patient, other evidence of caries in the mouth, and an assessment of home care.

When the decision involves a course of action rather than a value estimate, a different logic applies. The rule is: always go with either the baseline or the evaluation evidence, whichever has a higher probability of being accurate. To extract or to treat endodontically, to bleach or not to bleach, to use an implant or a crown are decisions that are mutually exclusive—one action excludes the other. Most carious lesions are best treated based on the individual practitioner's experience in the practice (baseline) rather than the literature (external evidence). The same is true, to varying degrees, for many other treatment decisions in practice. It must be remembered, however, that whether the dentist follows practice patterns or the literature in a particular case, if there is any probability for surprise, a practice experiment is being conducted, and the appropriate ethics must be observed.

References

1. Brennan RL: (Mis)Conceptions about generalizability theory. Educational Measurement: Issues and Practice 19:5–10, 2000
2. Brunette DM: Critical Thinking: Understanding and Evaluating Dental Research. Carol Stream, IL, Quintessence, 1996
3. Callaham ML, Baxt WG, Waeckerle JF, et al: Reliability of editors' subjective quality ratings of peer reviews of manuscripts. JAMA 280:229–231, 1998
4. Chambers DW: Looking for virtue in a virtuous society – discursive ethics and dental managed care. J Am Coll Dent 63:39–42, 1996
5. Chambers DW: The roles of evidence and the baseline in dental decision making. J Am Coll Dent 66:60–68, 1999
6. Chambers DW: Above all, check your references. J Am Coll Dent 67:2–3, 2000
7. Chambers DW: Promises. J Am Coll Dent 67:51–55, 2000
8. Chambers DW, Abrams RG: Dental Communication. Sonoma, CA, The Ohana Group, 1986
9. Chambers DW, Eng WRL Jr: Practice profile: The first twelve years. J Calif Dent Assoc 12:25–32, 1994
10. Ethics of managed care. J Am Coll Dent 63: entire issue, 1996
11. Licensure Results. News & Views [the newsletter of the American College of Dentists] 26:5, 1998
12. Ozar D, Sokol D: Dental Ethics at Chairside: Professional Principles and Practical Applications. Washington, DC, Georgetown University Press, 1999
13. Peltier B: Reflection, introspection, and communication: A psychologist's view of dental ethics. J Am Coll Dent 67:33–38, 2000
14. Rest JR, Narvaez D: Moral Development in the Professions: Psychology and Applied Ethics. Hillsdale, NJ, Lawrence Erlbaum, 1994
15. Rule J, Veatch R: Ethical Questions in Dentistry. Chicago, Quintessence, 1993
16. Schon DA: The Reflective Practitioner: How Professionals Think in Action. New York, Basic Books, 1983
17. Toward responsible research conduct: The role of scientific societies. J Dent Res 75:823–860, 1996

Address reprint requests to
David W. Chambers, EdM, MBA, PhD
University of the Pacific
2155 Webster Street
San Francisco, CA 94115
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY
0011–8532/02 $15.00 + .00
CONDUCTING A SEARCH OF THE LITERATURE
David A. Felton, DDS, MS
The explosion of dental literature over the past decade is absolutely unparalleled in the history of dentistry. Never before has so much literature been in print, and never before has the clinician been expected to have current knowledge of such a variety of materials and techniques to provide the best possible dental care. It is impossible for a clinician to read everything that is published on a monthly basis or to be expert in all aspects of dentistry. Frequently, however, the clinician needs to be able to draw on published reports to recommend treatment modalities to patients without relying solely on the dentist's own individual experience or other empiric methods. The purpose of this article is not to provide concise information on how to conduct an overview of the literature, but rather to show how to search the literature for articles that may be appropriate in answering a particular question related to patient care. A review of the methodology of conducting an overview has been published elsewhere.1–3

Perhaps the most time-consuming portion of any review of the published literature on a given subject is the actual literature search itself. Several sources should be included in a comprehensive search of the literature. These sources are traditional references (such as Index Medicus or the Index to Dental Literature); peer-reviewed dental journals in print, CD-ROM, and on-line electronic formats; electronic databases such as PubMed, MEDLINE, and Grateful Med; and contacts with appropriate source individuals.
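The sections that follow describe searching these sources through their printed volumes and web interfaces. As a purely illustrative aside, the same PubMed/MEDLINE database can also be queried programmatically through the National Library of Medicine's E-utilities service; the sketch below is an example only, and the query string, parameter values, and function name are assumptions chosen for demonstration rather than anything described in this article.

```python
# Illustrative sketch: a programmatic counterpart to the PubMed keyword
# search described in this article, using the NCBI E-utilities "esearch"
# service. The example query and retmax value are assumptions.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def search_pubmed(term, max_results=20):
    """Return a list of PubMed IDs (PMIDs) matching a keyword or MeSH query."""
    params = urlencode({
        "db": "pubmed",         # the MEDLINE/PubMed citation database
        "term": term,           # keyword or MeSH-style query string
        "retmode": "json",      # request JSON rather than the default XML
        "retmax": max_results,  # cap the number of citation IDs returned
    })
    with urlopen(f"{ESEARCH}?{params}") as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    # A hypothetical clinical question: evidence on dental luting cements
    print(search_pubmed("dental luting cements AND clinical trial[pt]"))
```

Running such a query returns a list of citation identifiers that can then be retrieved, read, and appraised using the evidence based criteria discussed elsewhere in this issue.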
From the Department of Prosthodontics, The University of North Carolina School of Dentistry, Chapel Hill, North Carolina
TRADITIONAL REFERENCE SOURCES

Unless the clinician has access to a health sciences library, the use of traditional source materials such as Index Medicus or the Index to Dental Literature may be limited. Determining under which traditional headings in the Index to Dental Literature a particular topic is listed can also be time-consuming. Each volume of these indices generally covers the topics published in a single year, and searching through the array of these indices is often daunting. When reviewing these indices, the reader is urged to begin with the most current year's index and to work backward in time, unless the exact publication date of an article on a particular subject is known. The exception to this technique might be a search for a treatment material or method that is antiquated or no longer practiced, such as the gold foil technique or the clinical use of a particular all-ceramic crown material that is no longer manufactured. For these historical searches, review articles might be useful initial sources for the topic of interest.

PEER-REVIEWED JOURNAL SOURCES

Most information today is acquired from the multitude of dental and medical journals to which dentists subscribe. Most often, membership in a professional organization entitles the member to receive the organization's designated journal on a quarterly, bi-monthly, or monthly basis. Not all these journals, however, are peer-reviewed. Peer review generally implies that a submitted manuscript is blindly reviewed by one or more experts on the general topic of the manuscript, that suggestions for improving the manuscript are returned to the authors, and that, following revisions, the manuscript is again reviewed and copy-edited for clarity before being accepted for publication. Provided that the reviewers are skilled in the precepts of evidence based dentistry and apply those precepts when reviewing the manuscript, one can generally expect that most peer-reviewed articles are accurate.

Subscribing to a periodical generally entitles the member to access the publisher's on-line journal source. This on-line source gives the reader access to all published manuscripts from that journal, including all back issues that have been entered into the on-line source. Generally, the subscriber logs onto the publisher's website, enters the specific journal log-in information, selects a password, and begins the search process. Key words can often be used to search the database and generate a list of related articles published by a particular journal. After reviewing the titles, the reader can select the articles of interest; often, an abstract of the article is available for review. Finally, many publishers allow the reader to select and review the entire article on-line, including all tables and figures. One advantage of this technique is that it allows free access to the journal as long as the clinician subscribes to it (individually or through membership in a sponsoring professional organization); it also allows the clinician to discard old journal issues that may be consuming valuable space in the office or home.
The disadvantage of this search technique is that it allows a search for articles in only one particular journal, rather than providing a more comprehensive listing of all articles published on a given topic. This technique may prove too limiting when treatment decisions require a more comprehensive approach. Several journals underwrite dental conferences or symposia and often provide a CD-ROM or on-line review of the conference proceedings for an appropriate fee. This review may serve as an additional source of information for the busy clinician. Finally, several journals provide CD-ROM disks of their published manuscripts for persons without internet access.

ELECTRONIC DATABASES

One of the easiest and most cost-effective methods to search the literature is through the use of the PubMed service from the federal government. This service can be accessed on the internet at http://www.ncbi.nlm.nih.gov/PubMed. The PubMed system was developed by the National Library of Medicine, located at the National Institutes of Health (NIH), through the National Center for Biotechnology Information (NCBI). PubMed serves as an excellent search tool for accessing dental, medical, and biomedical literature citations and provides links to full-text journals at the web sites of participating publishers. Publishers participating in the PubMed service electronically submit their articles just before or at the time of publication. In addition to the biomedical literature, PubMed provides access and links to various databases, such as those that contain DNA and protein sequences, population-study data sets, and assemblies of complete genomes, through its integrated system.

For the practicing clinician, PubMed provides free access to MEDLINE and Internet Grateful Med, the bibliographic databases that serve as an excellent source for obtaining current literature citations. MEDLINE is PubMed's premier bibliographic database covering medicine, dentistry, nursing, veterinary medicine, the preclinical sciences, and the health care system. MEDLINE contains more than 11,000,000 citations dating back to the mid-1960s and contains bibliographic citations and author abstracts from more than 4000 biomedical journals published in the US and 70 foreign countries. Only about 80% of current journals participate in the MEDLINE citation service, however, so some information may not be provided. When one accesses the MEDLINE system and types in the keyword dentistry, the system lists more than 243,500 citations of the dental literature. These citations are listed according to key words (or under Medical Subject Headings [MeSH]) provided by the authors and publishers.

Internet Grateful Med provides free access to MEDLINE, AIDSLINE, AIDSDRUGS, AIDSTRIALS, BIOETHICSLINE, DIRLINE, HISTLINE, OLDMEDLINE,
POPLINE, TOXLINE, SPACELINE, SDILINE, HSRPROJ, HealthSTAR, and ChemID. Internet Grateful Med also allows the use of the Loansome Doc Document Delivery service, through which the entire text of published journal articles can be ordered individually. This service is conducted through a local or regional library, and a fee (which depends on the library used but can be $8.00 or more per order) is charged for the service. If only a limited number of articles is required, however, this service may be more cost-effective than ordering a journal subscription.

The clinician must take this vast array of electronic information and limit it to the area of interest. Clearly, search strategies must be employed to reduce the volume to a usable size for the busy reader. The first step is to develop a logical question. As with any evidence based assessment of the literature, the hierarchy of literature categories comes into play; thus, terms such as randomized, controlled clinical trial (RCT), prospective clinical trial, retrospective analysis, cross-sectional trial, and case-based assessment must be chosen appropriately when selecting key words. Simple additions or deletions of letters or words can make huge differences in the number of citations that MEDLINE lists.

Recently, the author conducted two searches of the dental literature to determine (1) the success rates of single-tooth implant therapy and (2) an outcome assessment of root canal therapy. Listed in Tables 1 and 2 are the search strategies employed, along with the number of citations MEDLINE provided for each. The addition of an "s" to one word or the change of a key word from "treatment" to "therapy" had a significant effect on the number of citations listed. Similarly, the addition of RCT, prospective trial, or retrospective trial affected the outcomes of the two individual search strategies. Occasionally, it is necessary to use trial and error to limit the search to a manageable list. At last check, from the 243,500 dental citations, use of the key words dentistry and RCT yielded a total of 20 citations; dentistry and prospective trial yielded 48 citations; dentistry and retrospective trial yielded 38 citations; dentistry and cross-sectional studies yielded 735 citations; and dentistry and case-controlled studies yielded 20,507 citations.
Table 1. RESULTS OF SEARCH STRATEGIES FOR SINGLE-TOOTH IMPLANT THERAPY

Search Strategy                                            No. of MEDLINE Citations Listed
Single-tooth implant treatment                             208
Single-tooth implant treatment and RCT                     0
Single-tooth implant treatment and prospective trial       0
Single-tooth implant treatment and retrospective trial     0
Single-tooth implant therapy                               161
Single-tooth implants therapy                              36
Outcome of single-tooth implant therapy                    39
Outcomes of single-tooth implant therapy                   8
Treatment outcomes of single-tooth implant therapy         30

RCT = randomized, controlled trial.
Table 2. RESULTS OF SEARCH STRATEGIES FOR OUTCOME ASSESSMENT OF ROOT CANAL THERAPY

Search Strategy                                            No. of MEDLINE Citations Listed
Outcome of root canal therapy                              179
Outcome of root canal therapy and RCT                      0
Outcome of root canal therapy and prospective trial        1
Outcome of root canal therapy and retrospective trial      0
Outcomes of root canal therapy                             20
Outcome of endodontic therapy                              120
Outcomes of endodontic therapy                             14
Outcome of root canal treatment                            189
Outcomes of root canal treatment                           20
Outcome of endodontic treatment                            125
Outcomes of endodontic treatment                           15

RCT = randomized, controlled trial.
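The counts in Tables 1 and 2 reflect the MEDLINE database at the time the searches were run; because the database grows continually, repeating the same strategies today yields different numbers. Readers who wish to experiment with search wording can also query MEDLINE programmatically through the publicly documented NCBI E-utilities esearch interface. The following minimal sketch (the function name and search terms are only illustrative) retrieves a citation count for a given query.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def citation_count(term: str) -> int:
    """Return the number of PubMed citations matching a search term."""
    query = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmode": "json"})
    with urllib.request.urlopen(f"{ESEARCH_URL}?{query}") as response:
        result = json.load(response)
    return int(result["esearchresult"]["count"])

# Illustrative strategies from Table 1; small changes in wording change the counts.
for term in ("single-tooth implant treatment",
             "single-tooth implant therapy",
             "outcome of single-tooth implant therapy"):
    print(term, citation_count(term))
```

Comparing the counts returned for small variations of a query is a quick way to judge whether a search strategy is too broad or too narrow before reviewing individual citations.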
Thus, it can be concluded that the field of dentistry has a long way to go to provide the clinician with an adequate amount of conclusive evidence for planning treatment interventions in patient care.

Once the list of citations has been produced, each citation can be opened for an initial review of the abstract of the article (if available). If a useful article is found, one can improve the search results by selecting the related-articles link to find others that MEDLINE has assessed. The entire text of the article can then be ordered through a link to the publisher (if available) or through the Loansome Doc document delivery service described previously. When the entire article has been obtained, however, it is essential that the precepts of evidence based dentistry be applied accurately to determine whether the research methods, control groups, sample size, and statistical tests have been suitably employed so that the article provides a valid presentation of data. Otherwise, to paraphrase the ancient Romans, "caveat lector," or "Let the reader beware!"

References

1. Felton D, Lang B: The overview: An article that interrogates the literature. J Prosthet Dent 84:17–21, 2000
2. Oxman A, Guyatt G, for The Evidence Based Medicine Working Group: How to use an overview. In Sackett DL: Evidence Based Medicine: Users' Guides to the Medical Literature. Hamilton, Canada, McMaster University Health Sciences Centre, 1993
3. Oxman A, Cook D, Guyatt G, for The Evidence Based Medicine Working Group: Users' guides to the medical literature. VI. How to use an overview. JAMA 272:1367–1371, 1994

Address reprint requests to
David A. Felton, DDS, MS
Department of Prosthodontics
Room 404 Brauer Hall, CB 7450
Chapel Hill, NC 27599–7450
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY: DESIGN ARCHITECTURE
Catherine Hayes, DMD, DMSc
From the Department of Oral Health Policy and Epidemiology, Harvard School of Dental Medicine, Boston, Massachusetts

Dentists often need to make clinical decisions based on limited scientific evidence. To base a clinical dental practice on scientific evidence more effectively, clinicians must have the skills to evaluate the dental literature critically. In dentistry and dental education, clinical decision making is traditionally based on expert opinion, and these opinions usually coincide with standard practice. Recently, however, there has been a shift toward supporting expert opinion or standard practice with evidence. The shift toward evidence based dentistry provides an opportunity for the transfer of scientific information into clinical decision making (Fig. 1).

Simply defined, evidence based dentistry focuses on scientific evidence in guiding clinical decisions. The practice of evidence based dentistry requires reviewing the results of all research relating to a particular clinical issue and assessing the validity of the findings. An additional step is to determine whether the study's results will help in caring for a particular patient or group of patients, that is, to assess the external validity (generalizability) of the study. For example, if a particular study evaluates the effect of a specific treatment on a limited patient population, the findings may not be applicable to the practice of a particular clinician.

TYPES OF RESEARCH STUDIES

To evaluate research studies critically, clinicians must have a working knowledge of the principles of scientific research and an understanding of the various types of research studies.
[Figure 1. Clinical decision making.]
Briefly, there are two broad categories of research: basic science and clinical research. The principles that govern the validity of scientific research are common to both branches. It is more challenging to ensure that a study is free of bias in clinical research than in basic science or laboratory research, because in the laboratory the researcher has more control over the environment and over other variables that may influence the results of the study. This article focuses on assessing the validity of clinical research studies.

It is important to understand the hierarchy of evidence in clinical research. All clinical research studies are encompassed under the broad heading of epidemiologic studies. Epidemiology is defined as the study of the distribution and determinants of disease frequency in human populations.2 The distribution of disease refers to who is at risk for a particular disease; for example, older men have the highest risk for oral cancer. The determinants of disease are the factors that affect an individual's risk of developing a disease; for example, tobacco use increases an individual's risk for developing oral cancer and is thus considered a risk factor. A risk factor may increase an individual's likelihood of developing a disease (as smoking increases the risk of lung cancer), or it may decrease that likelihood (as fluoride decreases the risk of dental caries). In clinical research, the aim is to quantify risk relationships as well as the benefits of specific treatments to improve the health of the public. Epidemiologic studies include studies that follow the natural course of disease or treatment effects, as well as studies in which the investigators intervene by assigning a treatment for a particular condition or by using a preventive agent to decrease the likelihood of disease. These studies fall into two broad categories: descriptive and analytic studies.

Descriptive Studies

Descriptive studies describe the general characteristics of the distribution of a disease, particularly in relation to person, place, and time.
Descriptive studies commonly seen in the dental literature are case reports and case series studies, which are detailed reports of an individual patient (case report) or a group of patients (case series) with a particular disease or who have received a particular treatment. Case series studies abound in the dental literature. An example of a case series study is one in which investigators report on patients treated in their practice with a particular implant system. Such a report may be a long-term study in which the investigator reports on a variety of treatment outcomes. Although it may provide interesting information to clinicians, it cannot demonstrate the superiority of one treatment over another without the use of an appropriate comparison group: it is impossible to know what effect a particular treatment has on these outcomes without making a comparison with another treatment. This comparison is possible only with an analytic study design, described later.

Cross-sectional surveys are another type of descriptive study; they report the status of individuals with respect to the presence or absence of both exposure and disease assessed at one point in time. These studies are also limited in their ability to demonstrate definitively the benefits of a particular treatment or the significance of a particular exposure. For example, a study that examined 500 individuals, including a complete oral examination, a medical examination, and an interview regarding a variety of health, dietary, and sociodemographic factors, might report on the association between oral health and diet. If the investigators report that individuals with good oral health also had a healthy diet, they might conclude that a healthy diet contributes to adequate oral health. With a cross-sectional study, however, it is impossible to conclude anything about causality. Adequate oral health might enable a person to consume more of the fruits and vegetables that constitute a healthy diet, a conclusion that is quite different from the conclusion that an adequate diet results in good oral health. Essentially, in a cross-sectional study it is impossible to determine whether A causes B or vice versa; this situation is analogous to the "chicken-and-egg" phenomenon.

In summary, descriptive studies are often referred to as hypothesis-generating studies. They are often the first step in investigating a particular scientific question.

Analytic Studies

Analytic studies differ from descriptive studies in that they include an appropriate comparison group that permits the testing of epidemiologic hypotheses, so causality can be investigated with analytic studies. The two broad subcategories of analytic studies are intervention studies and observational studies.

Intervention studies, or clinical trials, are considered the gold standard for clinical research studies. Because the examiner assigns the exposure or treatment, it is often possible to blind both the subject and the examiner to the treatment assignment, creating a double-blinded study that minimizes bias in the study findings.
Also, the ability to assign subjects randomly into treatment groups helps ensure that the only difference between the study groups is the intervention being evaluated. In a randomized study, each subject has an equal likelihood of being assigned to any of the study groups, thus reducing the influence of bias. This process creates groups that are relatively similar with respect to all variables except the treatment, thus balancing the study groups in terms of known and unknown confounders. Randomization to create similar study groups is possible only with clinical trials and therefore significantly increases the validity of these studies in comparison with other clinical research study designs. Whenever possible, a clinical research question should be addressed with a double-blind, randomized, controlled clinical trial. Such a trial is not always feasible for ethical or logistic reasons, leading investigators to choose one of the other study designs.

In observational studies, investigators observe the natural course of events, noting which subjects are exposed or not exposed, which have had a particular treatment and which have not, and which have or have not developed the outcome. There are two subcategories of observational studies: cohort studies and case-control studies.

In a cohort study, subjects are selected on the basis of the presence or absence of a particular exposure (treatment) and then followed to determine the association between the exposure (treatment) and the outcome. All subjects must be free of the disease of interest at the time the exposure is defined. Cohort studies are efficient for the study of rare exposures, such as occupational exposures (e.g., to asbestos), provide the ability to examine multiple effects of a single exposure, and provide the ability to determine the temporal relationship between exposure and disease. Cohort studies also have disadvantages: they are inefficient for the study of rare diseases, they may be expensive and time consuming, and they have the potential for loss-to-follow-up bias that may affect the validity of the study.2 An example of a cohort study in dental research is following individual smokers and nonsmokers to determine their risk for developing periodontal disease; the study subjects must be free of periodontal disease when the study begins.

The second class of observational studies is the case-control study, in which subjects are selected on the basis of whether or not they have the disease of interest. Case-control studies are efficient for studying rare diseases and diseases with long latency periods and have the ability to examine multiple causes of a single disease. The disadvantages of case-control studies include their inefficiency for the study of rare exposures, the difficulty in establishing a temporal relationship between exposure and disease, and their susceptibility to selection and recall bias.2 An example of a case-control study is one examining the association between oral cancer and smoking: oral cancer cases are compared with a similar group of individuals who do not have oral cancer to determine the difference in smoking rates between the groups. This approach was used when it was first discovered that smoking is a significant risk factor for lung cancer.
It is important that cases and controls be selected from the same source population to ensure that the study subjects are similar except with respect to the diagnosis of the study disease.

In summary, the study design chosen to address a specific research question must take into account the nature of the exposure or treatment and the nature of the outcome, as well as ethical and logistic considerations. For example, if one were studying the effect of two treatments on a particular disease, to randomize subjects ethically to one treatment or the other there must be sufficient belief that either treatment may offer benefits to the study participant and that neither treatment poses undue risk. This assurance is often not possible, and researchers therefore choose one of the other analytic approaches. It is also important to decide whether the disease or outcome is considered rare and thus which observational design is most efficient in addressing the specific question, keeping in mind that bias and confounding are of greater concern in observational than in intervention studies.

STUDY SAMPLES

Clinical research is conducted using samples of subjects selected from the population of individuals who have the disease of interest. For example, if investigators are interested in evaluating a specific treatment for the replacement of missing teeth, a sample of subjects who meet the study criteria is selected. Each investigator determines a priori the inclusion and exclusion criteria and the size of the sample to be used in a particular study. For example, a study may include adults over age 40 years who have at least six missing teeth; these characteristics are referred to as inclusion criteria. Anyone who smokes, who has received antibiotic therapy within the past 6 months, or who has a history of diabetes is not eligible to participate; these characteristics are referred to as exclusion criteria. The inclusion and exclusion criteria are based on characteristics that the investigator believes, from previous research or clinical experience, may affect the results of the study. Samples are used to estimate population values, because it is not practical to measure all individuals in a population. Most of the application of statistics in medicine and epidemiology involves making inferences from samples to populations.

MEASURES OF ASSOCIATION

In epidemiologic studies, it is important to quantify the relationship between exposure and outcome. This quantification is accomplished by calculating a relative risk or an odds ratio, values that are referred to as measures of association. Table 1 demonstrates the method of calculation. First, the relative risk is defined as the incidence of disease in the exposed group divided by the incidence of disease in the nonexposed group.
Table 1. DISEASE STATUS BY EXPOSURE STATUS

                        Disease Status
Exposure Status         Positive        Negative        Total
Positive                a               b               a + b
Negative                c               d               c + d
Total                   a + c           b + d

Relative risk = [a/(a + b)] / [c/(c + d)]
              = incidence of disease in exposed subjects / incidence of disease in nonexposed subjects
If there is no association between exposure and disease, the relative risk is equal to 1. If the exposure increases the incidence of disease, the relative risk is greater than 1. If the exposure is protective, the relative risk is less than 1.

Example. A randomized, controlled clinical trial was conducted to evaluate the effect of two treatments (scaling and root planing versus systemic antibiotic therapy) on periodontal disease outcomes. Successful treatment was considered to be that which resulted in a probing pocket depth of less than 4 mm at the end of 12 months of follow-up. Of the 500 subjects in the scaling and root planing group, 350 were classified as successful cases, compared with 250 in the antibiotic treatment group. The results are shown in Table 2.

Table 2. EFFECT OF ANTIBIOTIC AND SCALING AND ROOT PLANING THERAPY ON PERIODONTAL DISEASE

Treatment                       No. of Successful Outcomes      No. of Treatment Failures
Antibiotic                      250                             250
Scaling and root planing        350                             150
Total                           600                             400

Relative risk = (250/500) / (350/500) = 0.71

The relative risk of 0.71 indicates that the standard therapy, scaling and root planing, is more beneficial in treating periodontal disease. The classification of treatment success may be considered arbitrary, and the investigator may wish to evaluate several outcomes, such as actual attachment loss in millimeters.
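For readers who prefer to verify such figures by calculation, the relative risk in Table 2 can be reproduced directly from the 2 x 2 counts. The short sketch below (the function name and the layout of the arguments are only illustrative) simply encodes the formula given with Table 1.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Relative risk from a 2 x 2 table.

    a = exposed with the outcome,     b = exposed without the outcome
    c = nonexposed with the outcome,  d = nonexposed without the outcome
    """
    risk_exposed = a / (a + b)
    risk_nonexposed = c / (c + d)
    return risk_exposed / risk_nonexposed

# Table 2, with antibiotic therapy treated as the "exposure" and treatment
# success as the outcome of interest.
print(round(relative_risk(250, 250, 350, 150), 2))  # 0.71
```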
In case-control studies, a relative risk cannot be used, because by definition the cases in a case-control study already have the disease. Instead, an odds ratio is calculated using the same 2 x 2 table format. Essentially, the odds ratio compares the odds of being exposed among cases and controls.

Example. A case-control study was conducted to determine the association between cigarette smoking and periodontal disease. Subjects with periodontal disease were compared with a similar group of subjects free of any periodontal disease. The participants' smoking status was then ascertained by self-report and validated by cotinine levels. Of the 1000 subjects with periodontal disease, 400 were smokers, compared with 200 of the control subjects. The results are shown in Table 3.
Table 3. INCIDENCE OF PERIODONTAL DISEASE IN SMOKERS AND NONSMOKERS

                    Disease Status
Tobacco Use         Positive        Negative
Smokers             400             200
Nonsmokers          600             800
Total               1000            1000

Odds ratio = ad/bc = [(400)(800)] / [(200)(600)] = 2.67

The interpretation of the odds ratio is the same as that of the relative risk. Therefore, in this example, the conclusion is that smokers are 2.7 times more likely to have periodontal disease than nonsmokers.

CONFIDENCE INTERVALS

The measures of association are calculated with data from the sample of individuals being studied; however, it is the population estimate of risk that is of interest. To estimate the population value of the measure of association, a confidence interval is calculated. A confidence interval is one method of statistical inference that allows statements to be made about the population using data from the sample. The most commonly used method is to calculate a 95% confidence interval. The methods of calculation are beyond the scope of this discussion; interested readers are referred to a statistical text.1, 5 Briefly, the data can be used to calculate an interval with lower and upper limits. For example, in a study conducted to examine the association between diabetes and tooth loss, the relative risk was calculated to be 1.9, and the 95% confidence interval was calculated to be 1.2 to 2.7. That is, the data indicate that there is approximately a twofold increase in the risk of tooth loss among diabetics as compared with nondiabetics, and it can be concluded with 95% confidence that the true risk lies between a 20% increase and a 2.7-fold increase. Because the null value of 1.0 is not included in this interval, this result is statistically significant.
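For illustration, an approximate 95% confidence interval for the Table 3 odds ratio can be obtained with the commonly used log-transformation of the odds ratio; this particular method and the function name below are assumptions made for the sketch and are not drawn from the study described in the example.

```python
import math

def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% confidence interval (log method)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Table 3: a = smokers with disease, b = smokers without disease,
#          c = nonsmokers with disease, d = nonsmokers without disease.
estimate, lower, upper = odds_ratio_with_ci(400, 200, 600, 800)
print(round(estimate, 2), round(lower, 2), round(upper, 2))  # about 2.67 (2.18 to 3.26)
```

Because this interval excludes the null value of 1.0, the association between smoking and periodontal disease in the example would likewise be judged statistically significant.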
ASSESSING VALIDITY

In interpreting the results of any research study, one must consider three possible alternative explanations for the findings: chance, bias, and confounding.2, 4

Chance refers to the probability that the results observed may be a chance occurrence and not necessarily the result of the treatment under study. Chance is assessed by statistical analysis of the research data and by calculating a P-value. The P-value is defined as the probability that what was observed, or something more extreme, occurred by chance alone. In scientific research, the cutoff for statistical significance has traditionally been set at 0.05; that is, if a P-value is 0.05 or less, the observation is considered statistically significant. Numerous statistical tests are used to calculate the P-value, and the type of test used depends on the type of data being analyzed. Many statistical texts provide details of these tests.1, 5
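As one illustration of such a test, a P-value for count data such as the 2 x 2 table of treatment outcomes in Table 2 could be obtained with a chi-square test; the choice of test and the use of the SciPy library here are assumptions for the sketch, and other tests may be more appropriate for other kinds of data.

```python
from scipy.stats import chi2_contingency

# Table 2 counts: rows are treatments, columns are successes and failures.
table = [[250, 250],   # antibiotic therapy
         [350, 150]]   # scaling and root planing
chi2, p_value, dof, expected = chi2_contingency(table)
print(p_value < 0.05)  # True: the observed difference is unlikely to be due to chance alone
```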
Bias refers to divergence from the truth. In epidemiologic studies, investigators aim to determine the true relationship between a specific exposure and a specific outcome, and anything that obscures this true association may result in bias. For example, if investigators know the treatment status of a subject, they may pay closer attention to their evaluation of the outcome, thus introducing observation bias into the study and interfering with the results. Standardization and calibration of examiners, as well as blinding of the examiner or investigator and the subjects, are important steps that can be taken to decrease bias in clinical research studies. A more detailed discussion of bias is provided in the paper by Jacob and Carr.5

Confounding refers to the influence of a second variable or factor on the relationship between an exposure and an outcome. This factor must be associated with both the exposure and the outcome. For example, in a study examining the relationship between smoking and oral cancer, alcohol intake could be considered a potential confounder, because it is an independent risk factor for the disease and is also associated with smoking. Adjustment should be made for both known and suspected confounders in multivariate analysis of the data.

CRITERIA FOR CAUSALITY

If the findings of a study do not seem to be the result of chance, bias, or confounding, one must attempt to determine if a causal relationship exists. Several criteria are used in epidemiologic research to determine if an association is causal.3 These criteria include (1) consistency, (2) biologic plausibility, (3) strength of association, (4) temporal relationship, and (5) dose-response relationship.

Consistency refers to the body of evidence from multiple studies. The results of the present study must be compared with previous similar studies to determine if the results are consistent; if several studies demonstrate similar results, this consistency lends credence to the association's being a causal one. The relationship between the exposure and the outcome must also make sense biologically. In addition, the association should be strong; for example, a relative risk of 5.0 is more indicative of a true relationship than a relative risk of 1.2. If it can be demonstrated that an exposure during a specific window of time is related to the outcome, this temporal relationship provides evidence for a causal relationship. For example, in the prevention of neural tube defects, it has been demonstrated that women who consume folic acid during the time before neural tube closure have a lower likelihood of giving birth to a child with a neural tube defect than women who take folic acid outside this critical period, thus providing evidence of a temporal relationship between the exposure and the outcome. Similarly, demonstrating that the relationship becomes stronger with increasing amounts of the exposure also lends credence to a causal relationship.

It is often not possible to satisfy all the criteria for causality. It is the overall body of evidence regarding the association between a particular exposure and outcome that allows the inference of causality to be made. A single study cannot demonstrate causality, because in a single study only one sample is taken from the entire population of individuals with a particular condition, and it is possible that the sample is not representative of the entire population. Thus, several studies must consistently demonstrate similar findings before a conclusion of causality can be made.

SUMMARY

It is important for clinicians to understand the types of clinical studies that appear in the literature and the inherent strengths and limitations of each. The three possible alternative explanations (chance, bias, and confounding) must be considered for any research study. Thus, it is important to evaluate research studies critically in light of this discussion and not simply to summarize the findings. Finally, conclusions about causality can be made only on the body of evidence, not on any single study.

References

1. Campbell M, Machin D: Medical Statistics: A Common Sense Approach, ed 2. New York, John Wiley & Sons, 1993
2. Hennekens CH, Buring JE: Epidemiology in Medicine. Boston, Little, Brown, 1983
3. Hill AB: Principles of Medical Statistics. New York, Oxford University Press, Chapter XXIV, 1966
4. Hulley SB, Hulley SR: Designing Clinical Research. Baltimore, Williams & Wilkins, 1988
5. Jacob RF, Carr AB: Hierarchy of research design used to categorize the "strength of evidence" in answering clinical dental questions. J Prosthet Dent 83:137–152, 2000
6. Weintraub J, Douglass C, Gillings D: Biostats: Data Analysis for Dental Health Care Professionals. Chicago, Joshi International, 1985

Address reprint requests to
Catherine Hayes, DMD, DMSc
Department of Oral Health Policy and Epidemiology
Harvard School of Dental Medicine
188 Longwood Avenue
Boston, MA 02115
e-mail:
[email protected]
BIAS IN DENTAL RESEARCH CAN LEAD TO INAPPROPRIATE TREATMENT SELECTION
Rhonda F. Jacob, DDS, MS
In research, as in life, bias is the enemy of truth. R. F. JACOB
Bias is a systematic error that distorts the true relationship between an event and its outcome, and it negatively affects the truth of the conclusions. In research, bias includes any systematic error in the design, conduct, or analysis of a study. Bias can occur at all stages of research, from the selection of the population, to how treatment is provided, to how and when outcome measurements are made. One report reviewed more than 50 possible sources of bias in analytic research.33 The various research designs differ in the features within the design that control bias. Specific maneuvers attempt to control bias by reducing opportunities for systematic errors and by encouraging impartial judgment by persons involved in the study.

In health care research, bias can result in a mistaken estimate of a treatment's effect or an exposure's effect on the course of disease.12 These mistaken estimates probably account for some of the conflicting conclusions observed in apparently similar studies, and they can lead to practitioners' offering ineffective or even harmful treatments. It is the clinician's obligation to continue professional education by reviewing current literature. To optimize continued learning and patient care, clinicians should understand and scrutinize the various biases that can exist in research reports.
From the Department of Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas
CLINICAL RESEARCH

How the human population as a whole behaves under natural conditions and how the entire population of humans responds to a particular treatment are the ultimate health care questions for researchers and clinicians. Because the entire human population cannot be entered into or managed in a study, researchers and clinicians rely on the laws of probability and inferential statistics, which allow smaller sample populations to be studied as representatives of the population as a whole. These studies of sample populations use a multitude of research methods to determine the relationship between event and outcome. If stringent research and design criteria are not maintained, the assurance is lost that the sample population and its event-to-outcome relationship accurately represent that relationship in the total population; the study lacks validity.

Health care research designs are broadly described as observational or experimental.19, 38 In observational studies, a passive investigator usually observes subjects for exposures and outcomes. In experimental studies, an involved investigator usually prescribes an intervention to achieve a particular outcome. It is generally accepted that, because of the active participation of the investigator, experimental studies offer the best opportunity to control bias and that a correctly implemented experimental study offers the best available evidence to answer a specific research question. Whether an observational or experimental design is chosen to answer a given health care question depends on the type of research question being asked. For many health care questions, an experimental research design may not be appropriate because of the constraints of population availability, population management, cost, time, and ethics. Various design strategies have evolved to overcome these constraints, but some of the strategies increase the possibility of bias.

A hierarchy of research design exists, based on study validity and the ability to control bias within certain study designs.15, 32 Clinicians and researchers must understand that less confidence can be placed in the research conclusions derived from some study designs, and extreme caution must be exercised when using these study reports to influence decisions concerning patient care. In addition to employing the appropriate study design, certain elementary research methods must be implemented in all studies to control bias. These include methods regarding patient selection, examiner training, intervention, data collection, and analysis. When bias is not controlled in these areas of clinical research, conclusions are highly suspect, no matter what the study design.

HIERARCHY OF RESEARCH DESIGN AND BIAS CONTROL

The hierarchy of research design is based on satisfying three main criteria: (1) randomized or nonbiased selection of target and control subjects;
(2) intervention or putative exposure under the control of the investigator; and (3) prospective gathering of outcomes after entry into the study.7, 9, 11, 13, 14, 15, 34, 38, 39a The control of bias in a given research project depends largely on whether these criteria are met.

One of the greatest biases of health care research arises from the methods of selecting the sample populations targeted for the research. If research subjects are inappropriately selected, no amount of stringent research methodology can counter the bias of sample population selection. It has been suggested that the scope of population selection bias in the health care literature poses "potential catastrophic damage to a study's inferential basis" and should be taken as a serious threat.7 Some research designs have more inherent patient-sampling safeguards than others, and designs in which these safeguards are appropriately executed should be the ones used to make health care decisions.

Randomized, Controlled Trials: "Best at Bias Control"

Randomized, control group trials (RCTs) offer the greatest opportunity for the investigator to identify subjects and then randomly assign them to the intervention group or the control group by a predetermined randomization protocol. Treatment is not rendered until the subjects are randomly assigned to the study groups. Patient data are collected in a prospective fashion to evaluate the intervention's effect on the outcome of interest. This study design is ideal for evaluating therapy.

A sample group of subjects with the malady of interest is further narrowed in number by the use of specific inclusion and exclusion criteria appropriately based on the research question. Subjects are usually excluded from the study because their inherent characteristics are not relevant to the research question. For instance, adults would probably be excluded from an orthodontic treatment trial evaluating mandibular growth. Some characteristics or cointerventions may confound the research conclusion. Confounding characteristics have, or are suspected to have, nearly as profound an effect on the outcome as the intervention, and including subjects with confounding characteristics makes it difficult to discern the true effect of the intervention; subjects with these characteristics are therefore excluded from the study. For instance, when osseointegration of dental implants was first evaluated, diabetic patients were often excluded, because diabetes was thought to confound the ability to measure healing at the implant site. When evaluating a question related to in-office bleaching of teeth, researchers would probably exclude persons performing at-home bleaching (a cointervention), because this additional therapy would probably confound the true effect of the in-office study intervention.

After the subjects are selected, they are queried about their willingness to undergo the study. Ideally, a study would report data on the subjects who were eligible for the study but refused to enter it.7
The investigators should then evaluate the characteristics of those persons who entered and those who refused. This evaluation can establish whether the persons entering and not entering the study are alike in measurable respects and, therefore, whether the subjects entering the study are representative of the total population of such subjects. For instance, in a dental trial in which subjects are required to pay for therapy, persons of lower socioeconomic status may be eligible for the study but consistently refuse to enter the trial because of financial concerns. This financial issue becomes a selection bias before the trial subjects are even enrolled, and the subjects in the study should be recognized as representing the population within a socioeconomic stratum rather than the population as a whole.

The manner in which subjects are recruited before screening can also produce a selection bias. If an implant study is advertised, only persons interested in implants report to the recruiting site. If all dental school denture patients are queried about their desire to enter an implant study, a number will probably refuse. Something is inherently different in subjects who volunteer for studies versus those who do not. Patients who actively seek implants and those who are offered and accept implants as an option to new dentures are likely to be from different subsets of the population. This selection bias of recruitment at the outset of a study could greatly affect how patients report their satisfaction outcomes and could account for implant studies' reporting contradictory results.2, 20

After subjects are screened and found to be eligible for the study, they are randomly assigned to the treatment groups. Randomization allows the patients an equal opportunity to be assigned to either intervention group, thereby reducing selection bias and allowing the study to be representative of, or generalizable to, the total population of other patients with similar maladies and characteristics. Randomization should be generated by computer programs,9 and the entry schedule should be kept blind to investigators and study accrual personnel. Assigning subjects to study groups by birthdate, entry date, hospital number, or an alternation schedule is haphazard, but it is not randomization. These "haphazard or quasi-randomized" methods allow study personnel or referring clinicians to have prior knowledge of which group the patient will enter. Well-meaning assistants have been known not to enter a subject in a trial when they believe the subject would receive little benefit from the assigned therapy. A system of alternating assignment allows one to guide the order of accrual of subjects and to place a subject in a specific study group based on the desires of the subject. Assistants responsible for accruing subjects might guide some subjects to a particular group because the morbidity rate is lower and the subject might be more likely to finish the study. If accrual personnel know the new therapy is next to be assigned, they might give positively slanted information to a prospective subject, thereby ensuring the subject's entry into the study. (Clinicians, too, can be influenced by their perception of what offers the best treatment opportunity for their patients.) These systematic biases can distort treatment outcomes.
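In contrast to these haphazard schemes, a minimal sketch of computer-generated block randomization is shown below; the block size, the seed, and the function name are illustrative assumptions, and in practice the schedule would be generated once by the study statistician, concealed from accrual personnel, and revealed only after a subject has been enrolled.

```python
import random

def block_randomization(n_subjects: int, block_size: int = 4, seed: int = 2002):
    """Generate a concealed allocation schedule with balanced treatment groups."""
    assert block_size % 2 == 0 and n_subjects % block_size == 0
    rng = random.Random(seed)          # a recorded seed lets the statistician reproduce the schedule
    schedule = []
    for _ in range(n_subjects // block_size):
        block = ["test"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)             # the order of assignments within each block is unpredictable
        schedule.extend(block)
    return schedule

# The full schedule is prepared before accrual begins and kept blind to study personnel.
print(block_randomization(8))
```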
Blind randomization allows equal distribution of variables known to affect treatment outcomes; perhaps more importantly, it allows equal distribution of the unknown characteristics that might affect treatment outcomes. Having interventions under the control of the examiner at the outset of the study allows treatment methods to be standardized by trained practitioners. This control also allows standardization of follow-up regimens, record keeping, and measurements of outcomes. This standardization of methods, along with adequate training for practitioners and examiners, helps minimize bias, thereby making the RCT the definitive clinical trial.

Observational Comparison Studies: "Ranges of Bias Control"

Observational studies, such as cohort and case-control studies, have been used in epidemiologic surveys to determine the natural history of a disease and the exposures associated with it. Observational studies, in which the investigators do not actually manipulate the subjects' exposure to a treatment or event but only observe outcomes and often retrospectively determine exposures, are often used to discern the prevalence of a disease. Such studies are also used to determine population characteristics that might be risk factors for disease. Observational studies rank lower in the hierarchy of evidence because they can meet few, if any, of the three criteria for bias control. Observational comparison studies, however, are usually the studies of choice when risk factors or harmful exposures are being evaluated. The major weakness of these studies is that the patients are not randomly selected; rather, they are selected because they were exposed to a particular event, made a specific lifestyle choice, or were noted to have a particular outcome.

A number of study designs fall under the description of observational. The strongest of the observational design strategies is the inception cohort study, in which the investigator is present immediately after the exposure or event occurs (at the inception) and follows the subjects for outcomes using prospective and standardized methods. A control group, whose subjects were not exposed to the event, must be followed with the same prospective methods to determine comparatively how many subjects develop the outcome of interest. Great care must be taken in selecting subjects for the concurrent control group. The control group must be as nearly equivalent as possible to the exposed population in every measurable characteristic that might affect the outcome. The characteristics commonly considered are age, sex, socioeconomic status, and educational background. Depending on the outcome of interest, other characteristics, such as geographic area, concurrent medical conditions, cointerventions, and occupational exposures, among others, must also be examined in selecting the control population. Unfortunately, in all circumstances there are unknown characteristics that may influence the outcome of interest, and, unlike the random assignment of subjects to the test or control group, there can be no safeguard to assure that these
unknown characteristics are equally distributed in the control population. Therefore, it is understood that considerable bias can occur in selecting the control population.

An inception cohort trial might be used to determine whether persons with and without amalgam restorations have an equal risk of developing multiple sclerosis. Another current health question that might be considered using the inception cohort design is whether the risk of autoimmune diseases is equal in women who undergo silicone breast augmentation and those who do not. Both these issues have been hotly debated in health care and media arenas. There is probably an unknown multifactorial cause for multiple sclerosis and autoimmune diseases; therefore, selecting a control population that is similar for these unknown characteristics is nearly impossible and fraught with bias.

The inception cohort study is the premier observational study because of its prospective nature. Unfortunately, waiting for outcomes to occur may take many years, leading to loss of subjects, loss of trained study personnel, and prohibitive costs, and the difficulty of maintaining the validity of a protracted study adds additional biases. Other observational designs, using retrospective, one-point-in-time evaluation of comparative populations with and without the outcome of interest, offer a more immediate answer to the research question. The price for immediacy, however, is increased bias and an increased risk of distorting the true relationship between event and outcome.

Three types of retrospective studies are cross-sectional, ex post facto, and case-control studies.15, 39a In these research designs, the outcome has already occurred in the test population. The selection of the control population is critical to reducing bias. Control subjects should be as equivalent as possible in all characteristics to the test population, with the exception of the exposure of interest. The comparison is between the incidence of the outcome in the test population and its incidence in the control population. The control subjects may come from the same population pool as the test subjects or from a different population pool. For instance, when investigating whether a particular dental assistant chair may increase the risk for lower back pain, same-pool subjects might be drawn from all dental assistants at one dental school and allocated into the control or test group based on whether they used a specific design of chair. Control subjects drawn from a different-pool population could be assistants working at a different dental school where a different chair design is used. Same-pool populations are more likely to have similar demographic and workplace characteristics, both known and unknown. Regardless of whether same- or different-pool subjects are selected for the control group, the processes for identifying possible subjects and the final selection of each subject must be consistent. In either design, the subjects would be queried about their present or past history of back pain.

Selection bias is quite difficult to control in observational studies. Because investigators are often gathering data on exposures that have already occurred, the existence of an exposure or outcome must often be confirmed by patient report or past medical records.
Medical records are often incomplete, because practitioners may not document the specific findings required for the study. Alternatively, an investigator may infer exposure or outcome from other clinical findings (not the outcome of interest) or from tangential records such as insurance claims. These methods may lead to a biased selection of subjects who do not represent the totality of exposed patients. Investigators often evaluate characteristics in the two populations to show that they are similar in all respects except the exposure of interest. Even though the two groups being evaluated may seem comparable, there is always the possibility that one or more unidentified characteristics are responsible for, or at least influence, the outcome of interest, and these other characteristics are unlikely to be distributed equally between the two groups.

Subjects may also be selected based on their recall of an exposure or event, thereby creating a recall bias. Subjects who have the outcome of interest, or fear they will develop the outcome of interest, are more likely to recall that the exposure occurred. During subject interviews, investigators should blind the subjects to the outcome of interest and the exposure of interest. This blinding can be accomplished by asking the subjects many questions regarding various outcomes and exposures to decrease their awareness of the possible interactions.

Besides the difficulties inherent in population selection, one-point-in-time studies have other biases. The assessment of outcomes represents a snapshot of the subjects' daily lives. Outcomes that are identified by waxing and waning signs and symptoms may not be present during the study evaluation, or, at evaluation, the outcome may be at an early, barely detectable level. Subjects aware of the possible relationship between outcome and exposure may have a biased response when asked to recall their symptoms, their exposure data, and their cointerventions. Cointerventions or confounding signs and symptoms are also likely to wax and wane, thereby affecting the outcomes during the evaluation and affecting recall by the subjects.

Observational studies have been used extensively to evaluate harmful exposures. Smoking risks for cardiovascular disease and lung cancer have been universally accepted only in the past decade. Many investigators from many countries have reported increased health risks in persons who smoke. Because of the inherent weaknesses in observational studies and the political and monetary implications of these findings, many years and hundreds of confirmatory studies were required before the risks of smoking were accepted. The few studies that have been conducted on the health issues of amalgam restorations or breast implants reveal a wide range of risks, including no increased risk, for persons undergoing these treatments. Currently, the literature regarding these controversies includes more letters to the editor than clinical trials. A MEDLINE search of reports associating amalgam restorations with multiple sclerosis reveals only three small case-control trials in the past 20 years, with inconclusive suggestions of an increased risk of multiple sclerosis or alterations in immune parameters.
One study reported that the multiple sclerosis population did not have an increased number of amalgam restorations but did have an increased number of carious lesions compared with the control population.25 The other study reported that the multiple sclerosis group had an increased number of amalgam restorations.1 This report shows another problem with one-point-in-time evaluations: the inability of such studies to establish a cause-and-effect relationship between exposure and outcome. Patients with multiple sclerosis may have poorer oral hygiene because of the possible physical constraints of their disease; therefore, they may have more caries, and if their caries are treated, they are likely to have more amalgam restorations. This chicken-or-the-egg problem is common in case-control trials that identify possible associations between two processes. Associations can be revealed, but not causation; a cause must always precede the outcome.

Case Reports and Case Series: "Bias Out of Control"

Having a comparison control group is an absolute criterion for research, and studies that do not have a comparison group are relegated to an inferior position in the research hierarchy. These reports are most commonly referred to as case reports or case series. It has been stated that in these studies the only basis for comparison is "implicit, intuitive, and impressionistic."14 Sackett states that inductive reasoning gives way to seductive reasoning.32 Rather than controlling bias, case reports and case series are more likely to represent bias out of control. Reports of a single patient outcome or a series of patient outcomes are subject to extreme bias in patient selection and in treatment delivery decisions and methods. Subjects in case series do not represent a random sample of the total population, patients within the treatment group often have many pretreatment characteristics besides the malady of interest, and subjects are rarely treated with a standardized protocol of therapy. Often, data are gathered in a retrospective review of charts with nonstandardized measurement and outcomes assessment criteria. Despite their best intentions, reporting clinicians are biased by the very fact that they rendered the care and analyzed the outcome. Clinicians should never predict treatment outcomes based on reports that do not have a comparison group.

Despite their unreliability as predictors of treatment outcomes, unusual case reports and case series have value. They call attention to little-known maladies, reveal complications of proposed therapies, and document outcomes that may have occurred because of exposures and proposed therapies. Precisely documented characteristics and descriptive data from case series and case reports are often used to plan subsequent research with control groups.

Historical Control Groups

Control groups are required to assess the value of a therapy. Dental and medical reports have commonly used data from patients who were treated earlier at the same institution with a different modality of treatment.
treated earlier at the same institution with a different modality of treatment. As a new therapy is introduced to the profession, some practitioners will begin using it. To assess the value of the new therapy, the practitioners will compare the group receiving the new therapy with the group receiving the older therapy. When comparing the outcomes of the old and new therapy, the patients who received the older therapy would become the historical control group. Rarely is this historical control group an arm of an RCT with specific population criteria and prospective data protocols. Instead, the historical control group usually consists of patients who were given the older therapy based on a number of decisions made by the patient and the practitioner, and specific treatment and outcome analysis methods were not standardized. Often, these data are gathered from chart reviews. Even if the two groups are treated during the same time-frame, a multitude of biases exist in this type of patient assignment and in the non-standardized methods. When patients in a historical control group were treated many months or even years previously, unknown variables and unknown cointerventions can create additional bias that is likely to affect outcome. An analysis of the literature was performed to compare findings in therapies when data reports are based on RCTs versus historical control trials. A total of six therapies had reports of both study designs evaluating similar outcome endpoints, for a total of 50 RCTs and 56 historical control trials. The historical control trials found that the new therapy was better in 78% of the trials, whereas the RCTs found the new therapy was better in only 20% of the trials. When comparing the control group in the RCT with the control group in the historical control trial, the control group in the historical control trial not only fared worse than the experimental group in that trial, but often fared worse than the control group in the RCT. This finding supports the lack of equivalence in the two populations.34 The two groups are rarely equivalent, except for the primary diagnosis. When a new therapy is developed, there are often conscious or unconscious efforts to narrow the criteria in the treatment group to include only those who are considered most likely to benefit or most likely to comply with the new methods. The others receive the traditional or historical treatment. Also, when historical controls are used, not all participants are included in the evaluation. The finding that control groups in the historical control trials fared worse than control groups in the RCTs suggests that bias in patient selection may "irretrievably weight the outcome of HCT [historical control trials] in favor of new therapies."34 In a retrospective chart review of patients receiving palatal obturator prostheses to restore palatal defects following maxillectomy, it was hypothesized that patients had shorter hospital stays when they were given this prosthesis at the time of surgery rather than several days after surgery. A review of 120 patients from 1960 to 1971 revealed that nearly 58% of patients did not receive surgical prostheses, and an evaluation of 151 patients from 1980 to 1984 revealed that 45% did not receive a surgical
prosthesis. In the earlier trial, there was a significant difference in the duration of hospitalization of the two groups studied (22.7 and 14.2 days, respectively), but no significant difference was observed in the later trial (10.6 and 8.0 days, respectively). The practice of dentistry and medicine changed remarkably from 1960 to 1984, but the cause of the difference in hospital stay between the two groups in the earlier trial and the cause of the magnitude of difference in hospital stay between the two trials remain undetermined. Thus, using historical controls, even within the same institution, presents difficulty in distinguishing treatment effects from changes in ancillary care, manpower, referral patterns, patient support methods, health care reimbursement, and so forth. Historical controls derived from published reports present the same difficulties.
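The selection problem described above can be made concrete with a small simulation. The sketch below is a hypothetical illustration (the health scores, assignment rule, and success probabilities are invented, not taken from any trial cited here): a therapy with no true benefit appears superior simply because healthier patients are steered toward it, as commonly happens when the comparison group is a historical control.

```python
import random

random.seed(1)

def simulate(n=5000, preferential=True):
    """Simulate apparent success rates when the new therapy has NO true effect.

    Each patient has a 'health' score; the chance of a good outcome depends
    only on health, never on which therapy was given.  With preferential
    assignment, healthier patients are steered toward the new therapy, the
    typical situation when the comparison group is a historical control.
    """
    new_success = new_total = old_success = old_total = 0
    for _ in range(n):
        health = random.random()                        # 0 = frail, 1 = robust
        if preferential:
            gets_new = health > 0.5                     # healthier -> new therapy
        else:
            gets_new = random.random() < 0.5            # randomized assignment
        success = random.random() < 0.3 + 0.5 * health  # outcome driven by health only
        if gets_new:
            new_total += 1
            new_success += success
        else:
            old_total += 1
            old_success += success
    return new_success / new_total, old_success / old_total

for label, pref in [("historical-style assignment", True), ("randomized assignment", False)]:
    new_rate, old_rate = simulate(preferential=pref)
    print(f"{label}: new therapy {new_rate:.2f} vs control {old_rate:.2f}")
```

Under preferential assignment the "new" therapy shows a sizable apparent advantage even though it has no effect, whereas randomized assignment yields essentially identical rates in the two arms.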
BIAS IN RESEARCH METHODS

Bias control continues beyond design selection and population selection. Specific methods of bias control should be implemented in the conduct and analysis of the investigation. These methods are applicable to all research designs.
Blind Participants

Blinding the investigators, examiners, and subjects to the intervention and the outcome is a significant controller of bias. Double-blind methods are the ideal: subjects and study personnel are blind to the treatment assignments and to any study events or information that might influence outcome assessments. Single-blind methods blind either the examiner or the patient. When procedures are performed, the persons who examine subjects for outcomes or collect data from subjects should not be the same individuals who perform the procedures. Dentists have been trained to perform various treatment alternatives. For example, fixed partial dentures, removable partial dentures, and implants have all been used to replace the same missing tooth. Most dentists prefer one restorative method over another, and no dentist can state that the preference is based solely on scientific evidence. If the preference is not based solely on scientific evidence, there is an element of bias, and this bias can affect the outcome assessment if outcome data are collected by the practitioner. Those who collect data should be blind to the hypotheses of the study. This blinding is likely to be easier than blinding the clinician who performed the dentistry. Data on oral conditions, restorative conditions, and function could be collected; however, only some of the data would be relevant to a given study. Some institutions have established data
collection facilities, where routine data are collected under strict protocol for all subjects sent to the data collection facility, irrespective of the study in which the subjects are involved. Subjects can be queried about a number of oral conditions without knowing specifically which condition or exposure is relevant to the hypothesis. Blinding subjects to their treatment, especially in dentistry, requires ingenuity. Sham treatments are often unconvincing, and the informed consents required today are so explicit that study subjects may be biased by the description of the procedures and the list of possible complications. Preconceived notions that subjects form during the informed consent process may influence their outcome responses. This influence may be a problem when a model consent form, with its blanks to be completed, has been approved by an institutional review board and is expected to serve as the consent form for all studies. Investigators should campaign for wording in their specific consent form that avoids biasing study participants. When informing subjects of the comparative treatments in the study, clinicians and research assistants should strive to control their own biases. When screening persons for study entry, applicants should be reminded that the study is being conducted because the dental community is not convinced which treatment functions better, is faster, is more esthetic, has greater longevity, and so forth. When reviewing the literature, clinicians should evaluate whether blind methods were used in data collection and whether the methods for assuring blinding were explained. With this information, the clinician can determine if blinding truly occurred. If blind data collection was not employed, clinicians should search for other studies that address the research question.

Treat All Subjects the Same

Specific methods for delivery of interventions, data collection, and analyses should be determined before initiating an investigation. These protocols should assure that study participants in both treatment and control groups are treated and assessed equally. Doing so requires that the same follow-up regimen, follow-up data, and tests be performed on all subjects. Questionnaires and quality-of-life analyses should be administered in the same fashion to all participants. Follow-up examinations should be scheduled as often as needed to gather the data necessary to answer the research question and as often as needed to anticipate complications, complaints, and compliance issues. Bias can result if patients with complications must alter follow-up regimens because the follow-up examinations were not scheduled frequently enough. Subjects with less tolerance or with more complaints may receive more frequent follow-up and may be evaluated differently. It is likely that more data will be gathered on these subjects. Pertinent data may be missed on subjects who return sporadically; their complications and improvements may need to be assessed by history taking
rather than by examiner observation. The inequities in such data gathering should be recognized as potential biases. Although prospective interventions are not employed in observational studies, specific protocols for data review of records, patient interviews, and tests to evaluate outcomes should be designed in a prospective fashion. Before the investigation is begun, and even before the populations are selected, methods must be established so all subjects are tested and queried similarly. In some observational designs, the outcome of interest is often present before the study is initiated. The investigator queries subjects about exposure history. There is potential for investigators to interview subjects more vigorously to uncover the exposure when the subjects exhibit the outcome. This difference in the level of interrogation potentially biases towards a positive correlation between the exposure and outcome. This problem underscores the need for established methods for data gathering, as well as the need to blind the examiners to the outcomes. Often, subjects are not treated similarly because of missing data. In dentistry, outcomes or baselines may be retrospectively assessed using existing radiographs, photographs, or study casts. Records that were not made for the purpose for which they are currently being used often fall short of meeting various criteria. Frequently, subjects who are otherwise eligible for the study cannot be enrolled because these previously collected records are not available or are nondiagnostic. Records made during a routine clinical examination may serve the purpose for a patient’s treatment or evaluation on that occasion but are often not detailed enough for a later research project. For instance, casts made for custom trays may not be of adequate quality to serve as baseline for studies that require anatomic detail of all tooth surfaces. Less than ideal radiographs may not be remade if patients complain of discomfort, and appropriate angulations of film and beam may be sacrificed. Photographs may be missing; in a busy practice, clinicians may not retain serial photographs of specific patient outcomes that were unsuccessful or unesthetic. Investigators must decide either to extrapolate data from these less-than-ideal sources of documentation or to exclude these potential research subjects. Although it might seem that the better solution is to exclude subjects with missing documentation, doing so may create a serious selection bias. One study sought to evaluate the esthetic outcomes of a specific surgical method of closing cleft lip and palate. Subjects came from one surgeon’s practice, were treated by one of two surgical methods, and were included only if they had had a clinical photograph made after age 15 years. The esthetics of the lip closure were evaluated by a panel of lay judges blind to the surgical method. Subjects with missing photographs or poor-quality photographs were excluded from the investigation. Twenty subjects were included in each group for analysis. No data were supplied as to the number of subjects who never returned before age 15 years, how many subjects failed to have quality photographs, or the percentage of the entire population these 40 patients represented. In this investigation, a population selection
bias occurred based on whether photographic documentation was available on the subjects.31

Calibration and Training of Examiners

Innumerable studies are available in the health care literature that specifically test the level of agreement among multiple examiners who are evaluating a clinical test, making a diagnosis, reading radiographs, or measuring treatment outcomes. More than 300 clinical reports evaluating observer variability in health care published from 1985 to 1989 were compiled in a pre-MEDLINE bibliography.8 A MEDLINE search found 57 clinical trials that evaluated observer variation between 1990 and 2001. Various indices of agreement have been formulated based on percentages, probabilities, correlation coefficients, the kappa statistic (κ), and others.6 The κ statistic is preferred because it provides for an adjustment of agreement beyond chance and is appropriate for category scales and continuous data. (Kappa is affected by prevalence, and it cannot be calculated when one of the investigators constantly uses the same score. Variations on the original formulations by Cohen are frequently employed. Kappa is widely used and widely debated. Continued variations and other models for measuring agreement are being evaluated in statistical arenas.) It has been estimated that for many medical decisions, clinical agreement is at a suboptimal level, with κ below 0.35.23 It has been proposed that κ less than 0.4 is poor agreement, κ of 0.40 to 0.75 is fair to good agreement, and κ above 0.75 to 1.00 is excellent agreement.10 Even calibrated examiners in dental investigations have not consistently reached good agreement in clinical measurement. Observer agreement was reported among seven calibrated observers of various dental specialties, who evaluated quality of bone trabeculation from 100 panoramic radiographs using a five-point scale. This scale was similar to that used in various implant studies and ranged from lack of trabeculation to bone as dense as cortical bone. The mean intraobserver agreement was κ = 0.61. The observers were paired in 21 pairs, with interobserver agreement ranging from κ = 0.23 to 0.56. Comparison of all seven examiners measuring all 100 sites and grades revealed κ = 0.38. Grade 1, representing no trabeculation, had the highest agreement, with κ = 0.76. Grades 2, 4, and 5 had κ values of 0.38 to 0.39. The worst agreement was for normal trabeculation, with κ = 0.23. A grade of 5, representing dense trabeculation, was given 230 times, but 25 subjects were regraded to level 2 on a repeat examination by the same examiners.39 These measurement methods have been used to qualify bony trabeculation and subsequently enroll or exclude patients from implant studies. These same bone qualification methods have been used retrospectively to explain implant failures. Another investigation considered 11 parameters of fixed restorations evaluated on a five-point scale by two calibrated examiners from each of six participating centers. The two examiners from each institution
were evaluated for agreement on each of the 11 parameters. The agreement of each pair ranged from κ = 0.16 to 0.95. The mean of the κ values from all six institutions for each parameter ranged from 0.56 to 0.91. Marginal integrity had the lowest level of agreement.27 An evaluation of four calibrated examiners investigating the efficacy of dental radiography found intraexaminer agreement was κ = 0.75 or higher at baseline and remained at approximately the same level (κ = 0.80) throughout the 24-month period of the study. The interexaminer agreement among the six pairings of the four examiners ranged from 0.68 to 0.80 for caries and 0.72 to 0.83 for periodontal disease.40 As in other health care clinical measurements, various dental measurements result in ranges in practitioners' level of agreement. This lack of agreement indicates how critical it is to decrease bias created by systematic errors in measurement by training multiple examiners in the appropriate use of measurement instrumentation and in the implementation of clinical criteria. The more explicitly each measurement technique and category is defined, the less ambiguous are the demarcations between categories, and the higher is the observer agreement. The level of agreement of multiple examiners should be tested before an investigation to assure that the examiners have reached an understanding of measurement criteria and an acceptable level of agreement. During the investigation, continued calibration is often necessary, and the final level of agreement achieved during the investigation should be reported.

Accounting for All Subjects

It is disconcerting to an investigator to have subjects not complete a study. Statistical tests (power analysis) are often performed before the investigation to determine how many subjects are necessary to detect a difference in outcome between the groups. When subjects do not finish the trial, a result may be inconclusive because the lower number of study subjects causes a lack of statistical power. In prospective trials that require a long follow-up to determine the outcome of interest, there is an increased chance of losing subjects for a myriad of reasons: noncompliance, moving away from the area, loss of contact, inability to travel to the test site, and unrelated death, among others. It is important to determine the characteristics of the subjects who left the study and to perform another analysis of the remaining subjects to determine if the two groups are still equivalent in the variables that might influence the treatment effect. In addition, one should determine if the dropouts are more common in one group than the other. Uneven loss of subjects was found in a study evaluating the effectiveness of vitamin C in decreasing cold signs and symptoms.20a The caplets often broke, allowing subjects to taste the medication, and subjects discussed this occurrence among themselves while waiting for study evaluations. Persons in the placebo group realized they were not tasting ascorbic acid and began to drop out of the study, anticipating no benefit, whereas the subjects "tasting
the benefit of treatment" continued the study. More dropouts in one group than another can signal a loss of blindness to therapy or may indicate untoward side effects. Uneven loss makes the study groups unequal in numbers and in known and unknown study variables. Too often, reports simply change the number of subjects (N) at the end of the study, with minimal or no reference to the subjects lost to follow-up. The implicit assumption is that these lost subjects experienced the outcomes at the same rate as those subjects remaining in the study. For example, a systematic literature review of the English-language reports published since 1960 evaluated the survival rate of fixed partial dentures (FPD). Difficulty arose in performing a meta-analysis of the reports because many of the reports did not have any follow-up data on a large portion of the subjects after insertion of the prostheses.36 As follow-up continued, even more subjects were lost to follow-up. One report quantified 255 FPD inserted over 10 years but had only 121 available for evaluation at year 11.30 Another considered a one-point-in-time evaluation of 77% of an original 184 FPD. No data were reported on the 33% of lost subjects.5 Eighteen years after insertion of 122 FPD, 66 persons were available for a follow-up analysis. No data were reported on the 54% of lost subjects.28 A large database of 642 FPD inserted in 1974 was randomly selected from a national dental insurance registry. A 10-year evaluation was made, but only 164 persons presented for examination.21 The subjects were evaluated again at year 14, with only 97 of the original 642 subjects reporting.22 It is inappropriate to assume that 30% to 50% of subjects lost to follow-up would have the same outcomes as those subjects remaining in the study. Investigations are often undertaken to determine differences in treatment outcomes that are usually quite small. Often, the difference in outcomes between the therapies is only 10%. Loss of subjects will reduce the statistical ability to detect these small differences in outcome. Losing only 10% to 15% of subjects can render a study inconclusive. Altering the final N of the study risks drawing the wrong conclusion about the value of the therapy.

Data Used Appropriately: Chart Reviews and Errors of Omission

In health care research, review of patient treatment records is a common method of describing disease prognosis and determining therapeutic outcomes. Often, historical control data are collected from treatment records to compare previous therapies with current therapies. Some studies have used insurance records or national health care registries to gather data on the prevalence of a disease. When patients are treated as subjects in a research protocol, the data recorded are driven by the research question. In a well-designed trial, measurements or tests required for the protocol are documented and read with strict attention to minimizing bias, using many of the methods previously described.
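As a brief aside on the preceding section's point that losing even 10% to 15% of subjects can render a study inconclusive: the sketch below uses the standard normal-approximation formula for comparing two proportions to show how power erodes as subjects are lost. The 65% versus 75% success rates and the 350-per-arm starting sample are assumptions chosen for illustration, not figures from the FPD reports cited above.

```python
from math import sqrt, erf

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_proportions(p1, p2, n_per_group):
    """Approximate power of a two-sided, 5%-level test comparing two proportions,
    with n_per_group subjects in each arm (normal approximation)."""
    z_alpha = 1.96                                    # critical value, two-sided 5% level
    p_bar = (p1 + p2) / 2.0
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar))       # standard error term under the null
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))      # standard error term under the alternative
    z_beta = (abs(p1 - p2) * sqrt(n_per_group) - z_alpha * se_null) / se_alt
    return phi(z_beta)

# Hypothetical trial: 65% vs 75% success, planned with 350 subjects per arm.
planned = 350
for lost in (0.00, 0.10, 0.15, 0.30, 0.50):
    n = int(planned * (1.0 - lost))
    print(f"lost {lost:4.0%}: n per arm = {n:3d}, power = {power_two_proportions(0.65, 0.75, n):.2f}")
```

With these assumptions, power falls from roughly 0.82 with complete follow-up to about 0.76 after 15% loss and to near 0.53 when half the subjects are lost, before even considering the bias introduced if the losses are uneven between groups.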
Records kept for routine treatment in a clinical setting, however, are often incomplete. Tests may be read, but not recorded. Not all subjects will receive the same tests, and techniques may be modified based on factors unrelated to the disease process. Patient compliance is often not considered. Cointerventions are rarely recorded. Follow-up examinations are often scheduled at patients' requests; therefore, unless patients have a specific complaint, their follow-up schedule will be abbreviated compared with other patients receiving the same treatment. Often, the notes are influenced by a patient's complaint; unless the patient complains, the follow-up note is an array of summary statements such as "patient satisfied, within normal limits, normal diet, good esthetics, good occlusion, watch tooth #3," and so forth. Treatment records are maintained by the treating clinician, and often patients are reluctant to complain to their practitioners, lest that complaint negatively affect the practitioner–patient relationship. For the same reason, patients may tend to overemphasize the positive outcomes of their treatment. Clinicians are also likely to overestimate the positive outcomes of therapies they deliver, waiting for patients to bring forward complaints, rather than asking whether patients experience particular difficulties. Without standard treatment protocols and documentation, omission of data or ambiguous interpretation of data to fit a research question is problematic. A concurrent investigation was undertaken to evaluate temporomandibular disorder in a group of patients receiving orthognathic surgery. An RCT evaluating the cost, risks, and efficacy of two jaw fixation techniques was performed, and pertinent data were documented by the treating clinicians in the patients' records. The second study involved specific evaluations for temporomandibular disorder, performed on the same patients by blinded examiners following specific examination protocols. The authors then examined the disagreement between data taken from the treatment records and data taken from the temporomandibular disorder examination. Four parameters were evaluated: (1) a vertical opening of greater than or less than 40 mm, (2) the presence or absence of clicking, popping, or locking of a joint, (3) the presence or absence of pain, and (4) the presence or absence of crepitus. Although both studies were prospective, it became apparent that the surgeons focused more on efficacy of treatment than on secondary outcomes. Often, no data in the treatment records addressed the criteria for temporomandibular disorder. In other instances, it was necessary to create operational definitions of the four criteria that would allow interpretation of the surgeons' notes to categorize the outcomes. At 2- and 24-month surgical follow-ups, surgeons stated that 23% and 0% of subjects, respectively, had a vertical opening below 40 mm, whereas the temporomandibular disorder examiners reported 90% and 21%, respectively. The surgeons reported pain in 8.6% and 1.7% of the subjects, respectively, whereas the temporomandibular disorder examiners reported 47% and 29%, respectively. These differences show the level of disagreement that can occur when data from routine treatment records are used for research purposes as compared with data gathered by blind, calibrated examiners.35
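The κ statistic introduced in the calibration section is the natural way to quantify this kind of disagreement between a routine record and a protocol-driven examination. A minimal sketch of Cohen's κ for two raters follows; the ratings are invented for illustration and are not data from the orthognathic-surgery comparison.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement),
    where chance agreement is computed from each rater's marginal frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    if chance == 1.0:
        raise ValueError("kappa is undefined when both raters always give the same single score")
    return (observed - chance) / (1.0 - chance)

# Hypothetical ratings: 'pain present' (1) or 'absent' (0) for 20 subjects,
# scored once from routine chart notes and once by a blinded examiner.
chart_notes = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
examiner    = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

raw = sum(a == b for a, b in zip(chart_notes, examiner)) / len(examiner)
print(f"raw agreement = {raw:.2f}")
print(f"Cohen's kappa = {cohens_kappa(chart_notes, examiner):.2f}")
```

In this toy data set the raw agreement is 0.65, but κ is only 0.30 once chance agreement is removed, which is why reports should state κ rather than simple percentage agreement.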
SUMMARY

The first RCT was instituted in the late 1940s, evaluating streptomycin and bed rest compared with bed rest alone for tuberculosis.26 This research design has become the reference standard for comparative evaluations of therapies because of its prospective nature and the ability to control bias. Because it is easier to conduct observational studies, they have often been inappropriately substituted for the better experimental study designs. Since the 1950s, however, readers of the medical literature have slowly come to demand quality clinical research to assist them in caring for their patients. Dentists are somewhat behind their medical colleagues in using the strongest research designs to answer clinical questions. In dentistry, observational studies with convenience samples of patients have been commonly used. It is often argued that few dental ailments affect a person's life as negatively as most medical maladies; therefore, experimental rigors are not required of dental research. Although most dental care does not involve life-and-death issues, dentists are as eager as physicians to offer their patients optimal care. Optimal care is best defined through nonbiased research strategies.

References

1. Bangsi D, Ghadirian P, Ducic S, et al: Dental amalgam and multiple sclerosis: A case-control study in Montreal, Canada. Int J Epidemiol 27:667–671, 1998
2. Boerrigter EM, Geertman ME, van Oort RP, et al: Patient satisfaction with implant-retained mandibular overdentures: A comparison with new complete dentures not retained by implants—a multicentre randomized clinical trial. Br J Oral Maxillofac Surg 33:282–288, 1995
3. Carr AB, McGivney GP: Users' guides to the dental literature: How to get started. J Prosthet Dent 83:13–15, 2000
4. Chalmers TC, Celano P, Sacks HS, et al: Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358–1361, 1983
5. Cheung GS, Dimmer A, Mellor R, Gale M: A clinical evaluation of conventional bridgework. J Oral Rehabil 17:131–136, 1990
6. Cohen J: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37–46, 1960
7. Ellenberg JH: Selection bias in observational and experimental studies. Stat Med 13:557–567, 1994
8. Elmore JG, Feinstein AR: A bibliography of publications on observer variability (final installment). J Clin Epidemiol 45:567–580, 1992
9. Feinstein AR: Clinical Epidemiology: The Architecture of Clinical Research. Philadelphia, WB Saunders, 1985
10. Fleiss JL: The measurement of interrater agreement. In Fleiss JL: Statistical Methods for Rates and Proportions, ed 2. New York, John Wiley & Sons, 1981
11. Friedman GD: Primer of Epidemiology, ed 4. New York, McGraw-Hill, 1994
12. Gordis L: Epidemiology. Philadelphia, WB Saunders, 1996
13. Hulley SB, Cummings SR: Designing Clinical Research in Epidemiologic Research, ed 2. Baltimore, Lippincott Williams & Wilkins, 2001
14. Isaac S, Michael WB: Handbook in Research and Evaluation: A Collection of Principles, Methods, and Strategies Useful in the Planning, Design and Evaluation of Studies in Education and the Behavioral Sciences, ed 2. San Diego, CA, Edits Publishers, 1981
15. Jacob RF, Carr AB: Hierarchy of research design used to categorize the "strength of evidence" in answering clinical dental questions. J Prosthet Dent 83:137–152, 2000
16. Jacob RF: [abstracts/commentary]. Journal of Prosthodontics 6:325–327, 1997
17. Jacob RF: [abstracts/commentary]. Journal of Prosthodontics 7:210–213, 1998
18. Jacob RF: [abstracts/commentary]. Journal of Prosthodontics 7:68–69, 1998
19. Jaeschke R, Sackett DL: Research methods for obtaining primary evidence. Int J Technol Assess Health Care 5:503–519, 1989
20. Kapur KK, Garrett NR, Hamada MO, et al: Randomized clinical trial comparing the efficacy of mandibular implant-supported overdentures and conventional dentures in diabetic patients. Part III: Comparisons of patient satisfaction. J Prosthet Dent 82:416–427, 1999
20a. Karlowski TR, Chalmers TC, Frenkel LD, et al: Ascorbic acid for the common cold. A prophylactic and therapeutic trial. J Am Med Assoc 231:1038–1042, 1975
21. Karlsson S: A clinical evaluation of fixed bridges, 10 years following insertion. J Oral Rehabil 13:423–432, 1986
22. Karlsson S: Failures and length of service in fixed prosthodontics after long-term function. A longitudinal clinical study. Swed Dent J 13:185–192, 1989
23. Koran LM: The reliability of clinical methods, data and judgments. N Engl J Med 293:695, 1975
24. Kramer MS: Clinical Epidemiology and Biostatistics: A Primer for Clinical Investigators and Decision-makers. Berlin, Springer-Verlag, 1988
25. McGrother CW, Dugmore C, Phillips MJ, et al: Multiple sclerosis, dental caries and fillings: A case-control study. Br Dent J 187:261–264, 1999
26. Medical Research Council: Streptomycin treatment of pulmonary tuberculosis. BMJ 2:769–782, 1948
27. Morris HF: Department of Veterans Affairs cooperative studies project number 147: Level of examiner reliability over seven years. Implant Dentistry 2:245–249, 1993
28. Palmqvist S, Swartz B: Artificial crowns and fixed partial dentures 18 to 23 years after placement. International Journal of Prosthodontics 6:279–285, 1993
29. Phillips C, Tulloch JF: The randomized clinical trial as a powerful means for understanding treatment efficacy. Seminars in Orthodontics 1:128–138, 1995
30. Reuter JE, Brose MO: Failures in full crown retained dental bridges. Br Dent J 157:61–63, 1984
31. Ross RB, MacNamera MC: Effect of presurgical infant orthopedics on facial esthetics in complete bilateral cleft lip and palate. Cleft Palate Craniofac J 31:68–73, 1994
32. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical Epidemiology. A Basic Science for Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991
33. Sackett DL: Bias in analytic research. Journal of Chronic Diseases 32:51–63, 1979
34. Sacks H, Chalmers TC, Smith H Jr: Randomized versus historical controls for clinical trials. Am J Med 72:233–240, 1982
35. Scott BA, Clark GM, Hatch JP, et al: Comparing prospective and retrospective evaluations of temporomandibular disorders after orthognathic surgery. J Am Dent Assoc 128:999–1033, 1997
36. Scurria MS, Bader JD, Shugars DA: Meta-analysis of fixed partial denture survival: Prostheses and abutments. J Prosthet Dent 79:459–464, 1998
37. Shugars DA, Bader JD, White BA, et al: Survival rates of teeth adjacent to treated and untreated posterior bounded edentulous spaces. J Am Dent Assoc 129:1089–1102, 1998
38. Stamm JW: Types of clinical caries studies: Epidemiological surveys, randomized clinical trials, and demonstration programs. J Dent Res 63:701–707, 1984
39. Taguchi A, Tanimoto K, Suei Y, et al: Observer agreement in the assessment of mandibular trabecular bone pattern from panoramic radiographs. Dentomaxillofacial Radiology 26:90–94, 1997
40. Valachovic RW, Douglass CW, Berkey CS, et al: Examiner reliability in dental radiography. J Dent Res 65:432–436, 1986

Address reprint requests to
Rhonda F. Jacob, DDS, MS
MD Anderson Cancer Center
1515 Holcombe Boulevard, Box 0441
Houston, TX 77030
e-mail: [email protected]
SYSTEMATIC REVIEWS OF THE LITERATURE
The Overview and Meta-analysis
Alan B. Carr, DMD, MS
INTRODUCTION TO THE INFORMATION PROBLEM

The process of being a continual learner in this information age is a significant challenge. This challenge is especially significant for the health care provider who realizes that patient care is not a stagnant undertaking but an evolving process in which the responsibility to act in the patient's best interest requires continual infusion of new knowledge and skills. For others, who are possibly less motivated to stay up-to-date, state licensure organizations impose expectations of continuing education that strongly suggest it is in the public's best interest for professionals to improve their knowledge continually to provide adequate patient care. For all dental practitioners, staying up-to-date is a challenge because of the vast amount of clinical research available. At the heart of the problem is the difficulty in finding a focused answer that has the best chance of truthfully informing clinicians to act in the patient's best interest regarding a specific clinical dilemma. The dilemma associated with the sheer volume of the literature available is illustrated by a recent publication15 that focused on a specific area of dental care, the dental implant literature. In this study, the authors wanted to estimate the quantity of dental implant literature available from the MEDLINE database between 1989 and 1999 that could be used to guide evidence based decisions. The search
From the Department of Dental Specialties—Prosthodontics, Mayo Graduate School of Medicine, Mayo Clinic, Rochester, Minnesota
strategy was designed to identify the best evidence related to the categories of etiology, diagnosis, therapy, and prognosis in implant care. The results for this single area of dentistry reinforced the notion of an information explosion. The search provided an amount of clinically relevant information regarding implants that would require a clinician to read between one and two articles a week for 52 weeks out of the year just to stay current with the progress in dental implants. For the practitioner also interested in staying current in other areas, such as prosthetic, surgical, periodontal, endodontic, and direct restorative procedures, staying current could indeed be difficult. To determine whether this volume of literature is characteristic of all aspects of dentistry or only of special dental subjects such as dental implants, another study16 investigated trends in dental and medical research publications and the proportion of high-quality clinical studies (randomized, controlled trials [RCTs]) of relevance to general dentistry. In this study, the authors conducted a MEDLINE search of the literature published between 1969 and 1999 and found that clinical trials in dental research had increased to 7% and RCTs had increased to 5% of all dental research during this period. Although the overall number of research publications decreased during this period, the proportion specifically related to outcomes of patient care had increased. Thus, more of the literature currently published focuses directly on patient care and might be important for clinicians to read. Between 1979 and 1999, the authors found that one of every 200 research publications was an RCT, studies which by nature of their design have the best chance to provide valid and reliable information. These trials were relevant to 60% of the dental care activities for adults and 80% of those for children. Together these findings suggest that more high-quality information is available to clinicians than ever before. In a professional life that leaves little time for reviewing the increasing numbers of potentially useful research reports, how does the conscientious clinician of today find the highest quality and most relevant reports among the hundreds of others?

A SOLUTION FOR THE BUSY PRACTITIONER

One solution is for the clinician to seek reports that synthesize numerous sources of clinical information into summary statements or recommendations regarding specific clinical questions or controversies. These articles can save the clinician time and effort spent sorting through numerous primary research reports. Such research syntheses go by various names and can take a variety of forms. The familiar literature review is a narrative summary of some clinical topic or group of topics, often provided by an expert in the field and usually characterized as an unsystematic compilation of opinion and evidence. Although it intuitively seems correct that experts should be able to inform clinicians about a topic they have studied intensively, it has been shown that they are less able to produce objective reviews of the literature in their subject
than are nonexperts.11 More reliable are the reviews that take a systematic approach in providing an overview of the relevant and important primary research regarding a specific clinical question. (In this context, primary research refers to the research reports that contain the original information on which the review is based.) Such a systematic review is an overview of the primary research that has an explicit statement of the objectives, materials, and methods and has been conducted following a previously established rigorous and reproducible methodology.5 When the systematic review includes a statistical synthesis of the numerical results of several trials that examined the same question it is termed a meta-analysis. Systematic reviews are now considered the most reliable method for summarizing large volumes of research evidence. These reviews are less prone to subconscious and subjective forms of bias often seen in reports by experts because they follow principles of research design similar to those found in primary research. The fundamental difference between the primary research study and the systematic review is the unit of study. The scientific principles of a systematic review—documentation of methods before beginning, a comprehensive search identifying all relevant studies, and the use of rigorous methods for appraisal, collection, and synthesis of data—limit the bias in identifying and rejecting studies and provide more reliable and accurate conclusions. The usefulness of overviews and meta-analyses is reflected in the increasing numbers of review publications and in the efforts of groups, most notably the Cochrane Collaboration, to prepare, maintain, and disseminate results of systematic reviews of health care. The Cochrane Collaboration is an international initiative for systematic review management and currently has an Oral Health Group that encourages participation by interested individuals.
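At its statistical core, a meta-analysis is a weighted average of the individual trial results. The sketch below illustrates the common inverse-variance, fixed-effect approach applied to log risk ratios; the three trials are invented for illustration, and real reviews (including Cochrane reviews) involve further steps such as heterogeneity assessment and, often, random-effects models.

```python
from math import exp, log, sqrt

# Hypothetical trials: (events_treated, n_treated, events_control, n_control)
trials = [
    (12, 100, 20, 100),
    (30, 250, 45, 250),
    ( 8,  80, 11,  80),
]

weights, weighted_log_rr = [], []
for et, nt, ec, nc in trials:
    log_rr = log((et / nt) / (ec / nc))        # log risk ratio for this trial
    var = 1/et - 1/nt + 1/ec - 1/nc            # approximate variance of the log risk ratio
    weights.append(1.0 / var)                  # inverse-variance weight
    weighted_log_rr.append(log_rr / var)

pooled = sum(weighted_log_rr) / sum(weights)
se = sqrt(1.0 / sum(weights))
print(f"pooled risk ratio = {exp(pooled):.2f} "
      f"(95% CI {exp(pooled - 1.96 * se):.2f} to {exp(pooled + 1.96 * se):.2f})")
```

Trials with more events and larger samples receive proportionally more weight, which is what gives the pooled estimate a narrower confidence interval than any single trial.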
ANATOMY OF A SYSTEMATIC REVIEW

The specific features that illustrate the systematic approach and improve the chance of providing the best synthesized evidence are:
• Preparation of a detailed research protocol that outlines the clinical question of interest
• Selection of criteria for inclusion of articles in the review
• Systematic search of relevant published and unpublished research
• Determination (by two reviewers) of articles that meet predefined inclusion criteria
• Critical appraisal of the quality of selected articles
• Extraction of outcome data from the selected articles
• Data combination (where appropriate) to synthesize and summarize the best evidence
• Report of findings relative to the knowledge base and new questions raised by the findings
A systematic review has distinct advantages over an unsystematic approach.6 The authors must describe where the data (the published trials) come from and how they were processed to arrive at the conclusions. Being explicit about the methods taken to identify and select the appropriate trials for processing is important in limiting bias and provides more accurate and reliable synthesized information from the volumes of related literature. Also, with a systematic review, large amounts of information can be assimilated in a timely manner, resulting in shorter delays between research discoveries and implementation of demonstrably effective patient-management methods. Results of different studies can be more formally compared, inconsistencies among studies can be identified, and the causes for the inconsistencies can be evaluated. When possible, quantitative systematic reviews, or meta-analyses, can provide more precise answers by combining overall results of many similar trials, increasing confidence in the clinical application of the results. A summary of some recent dental systematic reviews illustrates the important steps in this process. The importance of the search for uncovering all potential sources (articles) that can contribute to the results is highlighted by recognizing that a simple MEDLINE search alone is inadequate for this phase of the systematic review process. One recent report9 describes a process that included a review of 25 electronic databases, the World Wide Web, relevant journals that were also hand-searched, and authors in the field who were personally contacted for additional information. Personally contacting authors is an important attempt to address potential publication-bias problems. This form of bias, which results from the selective publication of studies based on the direction and magnitude of their results, is harmful if important negative studies are not published.10 Because systematic reviews pool results, conclusions derived in the absence of truthful negative studies could lead to overestimation of treatment effectiveness. Another review provides a good illustration of several key features of the systematic process. In this recent report of the effectiveness and cost-effectiveness of prophylactic removal of wisdom teeth, the authors wanted to provide a summary of existing evidence on prophylactic removal of wisdom teeth in terms of the incidence of surgical complications and the morbidity associated with wisdom tooth retention.18 The inclusion criteria were in three main categories: design (RCT, literature review, or decision analysis), patient characteristics (unerupted or impacted wisdom teeth, or those having wisdom tooth extraction prophylactically or because of disease), and reported outcomes (either pathologic changes associated with retention of wisdom teeth or postoperative complications following extraction). The data sources included an existing review that formed the basis for the report, six electronic databases, paper sources (including Clinical Evidence), web-based resources, and relevant organizations and professional bodies that were contacted for further information. For non-English papers, translators were recruited to assist with study selection and data extraction. Decisions
regarding study selection, data extraction, and validity assessments were made by two independent reviewers; when the reviewers disagreed, discussion took place to gain consensus. The process of assessing validity followed a previously established checklist that was used to evaluate data organized into structured tables. This process resulted in 40 studies being included in the review: two RCTs, 34 literature reviews, and four decision analysis studies. The authors' method of dealing with such a mix of data sources is instructive. Specifically, it was stated that the methodologic quality of the literature reviews (no systematic reviews were included) was generally poor. Although most of the reviews suggested that prophylactic removal was not warranted, the three reviews that did suggest such removal was justified were of poorer methodologic quality than most other reviews. When reviews include primary research with less-than-optimal designs, Slavin17 emphasizes the need to report more details about the studies. Another study faced a similar situation involving questions of study design related to research synthesis. This study used an alternative method for selecting the articles to be included when the primary criteria were not met.14 The aim of this review was to assess the clinical evidence for the ability of glass-ionomer restoratives to inhibit secondary caries. A total of 52 articles that met previously established inclusion criteria were evaluated. Primary and secondary lists of systematic criteria for methodologic quality were drawn up. After applying the primary list of 14 criteria to each article, none was found to be acceptable. The secondary list, which included design features of a prospective trial with an appropriate control group, was then applied to the 52 articles and yielded 28 suitable for data extraction and evaluation. The methodology used in creating a systematic review and the syntheses such reviews provide make them useful for clinicians who do not have the time to review all the primary studies related to a clinical question of interest. Because systematic reviews offer the best chance for busy clinicians to act in their patients' best interest, it is important to know how to evaluate them.
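Part of what makes such a review reproducible is that inclusion decisions are reduced to explicit, checkable criteria rather than reviewer impressions. A minimal sketch of that idea follows; the study records, the two criteria lists, and the fallback rule are invented to mirror the primary-then-secondary strategy described above, not the actual 14-item checklist used in the glass-ionomer review.

```python
# Each candidate report is reduced to the attributes named in the predefined criteria.
candidates = [
    {"id": "study-01", "design": "controlled clinical trial", "control_group": True,  "prospective": True},
    {"id": "study-02", "design": "case series",               "control_group": False, "prospective": False},
    {"id": "study-03", "design": "cohort",                    "control_group": True,  "prospective": True},
    {"id": "study-04", "design": "case-control",              "control_group": True,  "prospective": False},
]

def meets_primary(study):
    """Strict list: randomized controlled trials with a control group."""
    return study["design"] == "RCT" and study["control_group"]

def meets_secondary(study):
    """Relaxed list: any prospective design with an appropriate control group."""
    return study["prospective"] and study["control_group"]

included = [s["id"] for s in candidates if meets_primary(s)]
if not included:                     # none met the strict list, so fall back, as in the review above
    included = [s["id"] for s in candidates if meets_secondary(s)]

print("articles retained for data extraction:", included)
```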
WHAT TO LOOK FOR IN A USEFUL SYSTEMATIC REVIEW

A number of helpful descriptions for evaluating the validity of systematic reviews have been presented in the literature.3, 7, 13 The important questions to consider when assessing a systematic review12 are:
• Was a clinical question clearly stated and addressed?
• Were the search methods comprehensive enough to find all relevant articles?
• Were explicit methods used to evaluate which articles to include in the review?
• Was the validity of the articles assessed, and was this assessment reliable and free from bias?
• Were inconsistencies in the findings of the included studies analyzed?
• Were the findings of the primary studies combined appropriately?
• Were the reviewers' conclusions supported by the data?

Without a clear statement of the clinical question it addresses, clinicians have no idea if the review can help with their patients' needs. For clarity, questions must include specification of the patient population involved, the intervention or exposure studied (often with a comparison or standard treatment group), and the outcomes evaluated. Even a good question cannot be adequately answered if all pertinent articles are not found and evaluated. The reader therefore must be reasonably assured that all relevant and important literature has been included in the review. It is likely that comprehensive searches will include (1) use of one or more bibliographic databases, (2) a search for reports that cite the important papers, found through a database such as Science Citation Index, (3) perusal of the references of all relevant papers found (and often the references of the references), and (4) personal communication with authors and organizations active in the area being reviewed. A comprehensive search will probably yield many articles not useful for review. An article may be unsuitable because it does not directly relate to the question of interest or because a certain study design is methodologically too weak to provide valid information. The authors should clearly describe how the articles were chosen; the method used may apply methodologic criteria. Such criteria will not always produce studies that are valid, so a validity assessment is also necessary so that the review will be based on data that are as free from bias as possible. Guidelines for such assessment have recently been published in dentistry for clinical questions that address diagnosis,2 prognosis,1 and treatment.4, 8 Such guidelines should be applied and reported in sufficient detail to allow readers to assess the validity of the primary articles. Even with the use of methodologic guidelines, assessments can be both unreliable and biased. Such assessments can affect both the inclusion and validity assessment of the primary studies. As a safeguard, the primary studies should be assessed by at least two reviewers, each blind to the other's decision. The level of disagreement should be known, and the rules to reach consensus should be reported. To protect from the bias associated with a lack of blindness, the information regarding the institution and authors associated with the primary research can be removed before assessment for inclusion and validity. Variation in the findings from the assessed studies is inevitable. Reasons for this variation can include chance, study design, and differences in the three basic study components mentioned previously (population, exposure or intervention, and outcome). Authors of reviews who discuss the potential impact of all possible sources of variation have met
their responsibility to the reader. Whether the review uses statistical methods of data synthesis or not, the author should clearly state the basis for any conclusions and explain any conflicting results. The primary studies included in the review should have been reported in sufficient detail to allow the reader to assess critically the basis for any conclusions.

SUMMARY

Systematic reviews in the form of overviews or meta-analyses offer a solution for busy practitioners who have difficulty keeping abreast of current literature. Because systematic reviews can condense numerous studies into reliable and valid summaries of the best available evidence for a specific clinical problem, they offer significant benefit to busy clinicians. This article has summarized the major features and advantages of systematic reviews. It has distinguished those features that attempt to increase the usefulness of reviews by limiting bias, and it has provided a summary of important questions clinicians can use to appraise such reviews critically. With this knowledge, clinicians should be able to use the literature more appropriately and in a timely fashion.

References

1. Anderson JD, Zarb GA: Evidence based dentistry: Prognosis. J Prosthet Dent 83:495–500, 2000
2. Eckert SE, Goldstein GR, Koka S: How to evaluate a diagnostic test. J Prosthet Dent 83:386–391, 2000
3. Felton DA, Lang BR: The overview: An article that interrogates the literature. J Prosthet Dent 84:17–21, 2000
4. Goldstein GR, Preston JD: How to evaluate an article about therapy. J Prosthet Dent 83:599–603, 2000
5. Greenhalgh T: Papers that summarize other papers (systematic reviews and meta-analyses). In: How to Read a Paper: The Basics of Evidence Based Medicine. London, BMJ, 1997, p 111
6. Greenhalgh T: Papers that summarize other papers (systematic reviews and meta-analyses). In: How to Read a Paper: The Basics of Evidence Based Medicine. London, BMJ, 1997, p 113
7. Greenhalgh T: How to Read a Paper: The Basics of Evidence Based Medicine. London, BMJ, 1997
8. Jacob RF, Lloyd PM: How to evaluate a dental article about harm. J Prosthet Dent 84:8–16, 2000
9. McDonagh MS, Whiting PF, Wilson PM, et al: Systematic review of water fluoridation. BMJ 321:855–859, 2000
10. Montori VM, Smieja M, Guyatt GH: Publication bias: A brief review for clinicians. Mayo Clin Proc 75:1284–1288, 2000
11. Oxman AD, Guyatt GH: The science of reviewing research. Ann N Y Acad Sci 703:125–131, 1993
12. Oxman AD, Guyatt GH: Guidelines for reading literature reviews. Can Med Assoc J 138:697–703, 1988
13. Oxman AD, Cook DJ, Guyatt GH for the Evidence-Based Medicine Working Group: Users' guides to the medical literature. VI. How to use an overview. JAMA 272:1367–1371, 1994
14. Randall RC, Wilson NH: Glass-ionomer restoratives: A systematic review of a secondary caries treatment effect. J Dent Res 78:628–637, 1999
15. Russo SP, Fiorellini JP, Weber HP, et al: Benchmarking the dental implant evidence on MEDLINE. Int J Oral Maxillofac Implants 15:792–800, 2000
16. Sjogren P, Halling A: Trends in dental and medical research and relevance of randomized controlled trials to common activities in general dentistry. Acta Odontol Scand 58:260–264, 2000
17. Slavin RE: Best evidence synthesis: An intelligent alternative to meta-analysis. J Clin Epidemiol 48:9–18, 1995
18. Song F, O'Meara S, Wilson P, et al: The effectiveness and cost-effectiveness of prophylactic removal of wisdom teeth. Health Technol Assess (Winch Eng) 4:1–55, 2000

Address reprint requests to
Alan B. Carr, DMD, MS
Department of Dental Specialties—Prosthodontics
Mayo Graduate School of Medicine
Mayo Clinic
200 First Street SW
Rochester, MN 55905
e-mail: [email protected]
THE USE OF DIAGNOSTIC DATA IN CLINICAL DENTAL PRACTICE
Carol Oakley, DDS, MSc, PhD, and Donald Maxwell Brunette, MSc, PhD
"If it looks like a duck, quacks and waddles like a duck . . . then it probably is a duck!" and "if you hear hoof beats, think of horses, not zebras" (unless, of course, you are on the plains of the Serengeti). At first glance, these adages may seem irrelevant to the diagnostic process in clinical dental practice. These adages, however, respectively illustrate the principle of pattern recognition and the effect of prevalence, both of which are important aspects of the diagnostic process. This article presents the dentist in clinical practice with an evidence-based approach to diagnostic data and tests so that the reader can become a more discriminating user of tests offered by the medical profession and, increasingly, by the pharmaceutic industry for promotional purposes. This article reviews a few basic principles of biostatistics, discusses test design and test characteristics, and demonstrates how to identify a good test and the circumstances in which a test will be useful in the clinical setting. For ease of discussion, this article focuses on dichotomous data that are divided into mutually exclusive categories: positive or negative. Data are presented from the dental literature, and clinical dental examples are used. Texts providing more detailed, comprehensive information regarding biostatistics, clinical epidemiology, and related topics are listed with the references.4, 22, 29 Much of the following discussion has been summarized from these sources.
From the Department of Oral Biological and Medical Sciences, Faculty of Dentistry, University of British Columbia, Vancouver, British Columbia, Canada
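The "horses, not zebras" adage is, in quantitative terms, a statement about prevalence. As a preview of the test characteristics the article goes on to discuss, the short sketch below assumes a test with 90% sensitivity and 95% specificity (illustrative figures, not values from any cited study) and shows how the probability that a positive result truly indicates disease falls as the condition becomes rarer.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that disease is truly present given a positive test (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# Assumed test characteristics: 90% sensitivity, 95% specificity.
for prevalence in (0.30, 0.10, 0.01, 0.001):
    ppv = positive_predictive_value(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:6.1%}: P(disease | positive test) = {ppv:.2f}")
```

With these assumptions, the positive predictive value is close to 0.89 when 30% of those tested have the disease, drops below 0.16 at 1% prevalence, and falls to about 0.02 at 1 in 1000.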
THE DIAGNOSTIC PROCESS

Most dentists have had their height, weight, and blood pressure measured in a physician's office. They may have had blood drawn for a complete blood cell count and differential blood series or testing for cholesterol levels, prostate-specific antigen, blood glucose, or thyroid hormone levels. They may have undergone tuberculin skin tests, mammography, electrocardiograms, and cardiac stress tests or had suspicious moles removed for histologic examination. They may even have sought the convenience of home pregnancy tests. As dentists, clinicians have probably prescribed dental radiographs and used explorers and periodontal probes to detect caries, defective restorative margins, and periodontal attachment loss. They may have applied electric pulp testers or ice to teeth to determine their vitality. They may have used toluidine blue dye to aid in selecting sites for biopsy of suspicious oral lesions. They may have recorded mandibular excursions, palpated muscles of mastication, and listened for temporomandibular (TM) joint sounds. As consumers and providers of health care, dentists reasonably expect that the information obtained from diagnostic investigations is reliable and truthful. Moreover, it is generally assumed that the information obtained from these investigations will provide a diagnosis as to the presence or absence of an abnormality or disease and that the diagnosis will direct a subsequent course of management or treatment. The question remains, however: how can patients and clinicians know if the data and subsequent diagnosis are correct? Beck2 maintains that dentistry, in contrast to medicine, has deemphasized diagnostic activities and merged them with treatment-planning activities. Nevertheless, the aim of a medical or dental clinician is to arrive at a diagnosis that may direct a subsequent course of management. The diagnostic process is initiated by the patient history and symptoms and is followed by the clinical examination, during which the clinician perceives signs that are manifestations of the disorders. The clinician may also use assays or measurements that are traditionally referred to as diagnostic tests or tools. In reality, symptoms, signs, and assays may all be considered diagnostic tools, because all are sources of information used to generate a diagnosis.2 Sackett et al29 explain that patients, clinicians, and researchers generally agree that the presence of disease indicates a derangement in anatomy, biochemistry, physiology, or psychology. They less often agree, however, on the exact criteria that define the condition that is the target of the diagnostic process. Wulff40 distinguished two major principles of disease: (1) the nominalistic or patient-oriented principle, and (2) the essentialistic principle that emphasizes disease as an independent entity. In the nominalistic approach, disease does not exist as an independent entity, and disease classification is really a classification of sick people or patients. Thus, a particular disease is defined by a group of characteristics that occur more often in persons with the disease than in other people. Patients
will have a pattern of similar symptoms and signs, and their prognosis and treatment will have some common features. The nominalistic principle does not require a definition of normality and recognizes that definitions of disease may vary among different societies.40 The essentialistic view40 is closely related to a modern principle of disease termed biochemical fundamentalism.6 This view is based on the idea that disease can be described in terms of biochemistry and molecular biology. Diseases are assumed to follow regular patterns, and once the underlying biochemical events are understood, the course of the disease can theoretically be predicted. Hence, disease classification becomes a matter of biotechnology, and the need for defining a normal state is avoided by relying upon statistical terms to define the disease state. That is, disease is defined by the distribution of certain features in a particular population and the extent to which that distribution differs from a similar assessment of a group the investigators consider not diseased.6, 40 This statistical approach forms the basis for using biomarkers as diagnostic or screening tests. Contemporary clinical medical and dental practice is still an art and a science. Overall, the nominalistic approach may offer a more realistic strategy for coping successfully with the varying manifestations of conditions such as coronary heart disease and temporomandibular disorders (TMD) that can be defined in both essentialistic and nominalistic terms.24, 29
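The statistical approach described above can be made concrete with a small illustration. The following Python sketch is hypothetical in every detail (the glucose values and the two-standard-deviation convention are assumptions for illustration, not data or criteria from this article): it labels a measurement "abnormal" when it falls outside a reference interval derived from a group considered not diseased.

import statistics

def reference_interval(healthy_values, k=2.0):
    # A conventional "normal range": mean +/- k standard deviations of values
    # measured in a reference group considered not diseased.
    centre = statistics.mean(healthy_values)
    spread = statistics.stdev(healthy_values)
    return centre - k * spread, centre + k * spread

# Hypothetical fasting blood glucose values (mmol/L) from a reference group.
healthy = [4.6, 4.9, 5.1, 4.8, 5.3, 5.0, 4.7, 5.2, 4.9, 5.1]
low, high = reference_interval(healthy)
patient_value = 6.4
print(round(low, 2), round(high, 2), patient_value > high)   # flags the patient as "abnormal"

Defining disease in this way is exactly the move the essentialistic view makes: the "normal state" is never defined directly, only the statistical behavior of a group regarded as not diseased.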
DIAGNOSTIC DECISION ANALYSIS
The use of diagnostic data and tests can be considered at three levels: screening, confirmatory, and exclusionary.13, 29 The objective of screening procedures is the early detection of disease, before symptoms associated with the disease are apparent. Thus, screening tests are conducted on individuals who do not have symptoms associated with the condition for which screening is being conducted. Screening tests classify individuals with respect to their likelihood of having a particular disease, but they do not diagnose disease. Individuals whose screening tests are positive require further evaluation by subsequent tests to rule in or to rule out the presence of the disease.13, 29
The use and interpretation of diagnostic data, including signs, symptoms, and diagnostic tests, are based on the four principles of decision analysis29, 31:
1. Clinicians should not consider patients as absolutely having a disease but rather as having only the probability of disease. The probability of disease is based on the prevalence of the disease, the patient's history (including risk factors, symptoms, signs, and previous tests), and the clinician's previous experience with similar situations.
2. Clinicians use diagnostic tests to improve their estimates of the
probability of disease, and the estimate of probability following the test may be lower or higher than the estimate of probability before the test. Tests should be selected by their ability or power to revise the initial probability of disease.
3. The probability that disease is actually present, following a positive or negative test result, should be calculated before the test is performed. Application of this principle results in fewer useless tests being performed.
4. A diagnostic test should revise the initial probability of disease. If the revision in the probability of disease does not alter the planned course of management or treatment, then the use of the test should be reconsidered. Unless the test provides information desired for an unrelated problem, tests that will not alter the planned course of management or treatment should not be performed.
Principle 1 states that in the diagnostic context, patients do not have a disease; rather, patients have a probability or likelihood of disease. At the outset, the clinician may assign to the patient a probability of disease that reflects the clinician's level of confidence that the target disease is actually present. This initial probability may be based on the prevalence (see box) of the disease in the population and may be revised, upwards or downwards, based upon the patient's history, symptoms, signs obtained from the clinical examination, previous tests, and the clinician's previous clinical experience with similar situations. If the patient is known to have one or more risk factors for a certain disease, the probability of disease may be increased. Thus, a pretest probability, risk, or likelihood of disease is assigned.
Diagnostic tests may then be considered to revise the pretest probability, as per principle 2. That is, by themselves, the measurements, assays, or test results do not reflect 100% certainty as to the presence or absence of the disease. Instead, the test results, either positive or negative, are used to revise, upwards or downwards, the initial pretest probability of disease. Moreover, once a test has been carried out, the clinician and patient must accept and deal with the results. That is, the decision that a test provides useful information is independent of the actual result. If the clinician picks and chooses which test results to accept or discard, the clinician opens the door to personal bias and preconceived notions, undermining the principle of objective testing.
On completion of the clinical examination, and before further investigations are considered, the clinician may be confident that a particular disease or condition really is present. In that instance, there is no need for further investigations or tests, and management appropriate for the condition should commence without delay. Likewise, if the clinician is confident that a particular disease is not present, further investigation or treatment of that disease is not warranted. These decisions are based on the threshold approach in decision analysis, shown in Figure 1. For each condition or disease, the clinician sets a threshold for testing known
Figure 1. Threshold approach to decision analysis: examples of the threshold approach for the disease (pulpal pathology) and the test (periapical radiograph).
ZONE A, A patient complains of sensitivity to cold and sweet stimuli. These symptoms are localized to an unrestored tooth with no known history of trauma but with visible cervical abrasion and root recession. Pulpal pathology is most likely absent because root sensitivity caused by exposed dentin is the most probable diagnosis. A radiograph would not be warranted because information obtained from the radiograph would not alter the diagnosis or further management.
ZONE B, A patient with a poorly maintained dentition describes intermittent and increasing sensitivity to cold and sweet stimuli and occasional spontaneous discomfort lasting over an hour and requiring analgesics for relief. These symptoms are associated with a heavily restored tooth with subgingival restorative margins. Recurrent caries or pulpitis may be present. A radiograph is warranted because it may provide useful information for diagnosis and further management.
ZONE C, A patient describes severe pain with biting pressure and denies sensation to cold stimuli. These symptoms are localized to a molar with visible gross caries. Radiographs are not required for the clinician to arrive at the diagnoses of caries and a nonvital pulp; however, a periapical radiograph is indicated to guide prognosis and further treatment, such as endodontic therapy or extraction.
as the test threshold and a second threshold for treatment known as the test-treatment threshold.29 In general, these cutoff threshold probabilities for ruling in or ruling out a disease depend on the particular disease and the subsequent courses of action or follow-up that relate to either ruling in or ruling out the disease. That is, the consequences of false-positive and false-negative results must be weighed in each case. If a test is not powerful enough to shift the pretest probability across either threshold, so that neither a positive nor a negative result would alter the planned course of action, the test should not be performed.29, 31 The strategies for defining specific test and test-treatment threshold cutoffs are discussed in greater detail by Sackett et al.29
Three clinical decisions are depicted in Figure 1. In the first instance, the pretest probability of a disease is below the test threshold (zone A in Fig. 1). The patient is unlikely to have the disease, and even a positive test result would not alter the posttest probability to a level that would justify treatment. Therefore, neither treatment of the disorder nor further testing for the disorder should proceed. For example, multiple yellowish
spots and plaques are observed bilaterally on the posterior buccal mucosa of an elderly male patient. The spots and plaques cannot be removed with gentle wiping of a gauze across the mucosal surface. The clinician is confident that Fordyce granules are present and that no pathologic condition is present. Therefore, further investigations such as biopsy or further management or treatment are not indicated.
In similar fashion, if the pretest likelihood of disease exceeds the test-treatment threshold (zone C in Fig. 1), treatment should proceed without further diagnostic testing. For example, soft white plaques resembling milk curds are observed on the palate and buccal mucosa of an elderly male patient. The plaques may be stripped from the tissue, leaving an intensely erythematous surface with localized bleeding. Oral thrush (candidiasis) is most likely present, and further investigation such as biopsy will not alter the diagnosis or the probable management with antifungal medications.
When the pretest probability falls in between the test and test-treatment thresholds, however (zone B in Fig. 1), testing is indicated, and treatment should proceed on the basis of the test results. In general, a diagnostic test is most useful when the pretest probabilities fall between roughly 30% and 70%.5, 20, 21 For example, an adherent white plaque is observed on the anterior floor of the mouth and ventral left lateral tongue of an elderly adult male. A pathologic condition may or may not be present. Further investigation such as biopsy is indicated to establish a diagnosis and to direct further management.
MEASUREMENT RELIABILITY
Measurement reliability refers to the ability to obtain the same measurement consistently over sequential measures. The reliability of a measurement may be affected by three sources of variability: (1) the system or phenomenon being examined, (2) the examination itself, such as the instruments or equipment used and the examination environment, and (3) the examiners.4, 29
Variation in the System or Phenomenon Being Measured
Normal biologic variability may be inherent in the phenomenon being measured. For example, blood pressure and pulse fluctuate throughout the day and under different circumstances such as stress, exercise, and body position; hormonal levels fluctuate with the diurnal and menstrual cycles. Moreover, the very act of measurement may influence or alter the phenomenon being measured so that repeated measurements (test-retest) are not reproducible (not reliable). For example, if persons are asked to bend over and touch their fingers to their toes, they may not be able to do so on the first attempt. After several
attempts, however, the distance between fingers and toes may decrease. In similar fashion, clinical variables for the assessment of TMD such as muscle palpation and assessment of joint sounds may not be stable in the short or long term, and they may be altered by repeated palpation or repeated mandibular movements.39 Some phenomena such as blood pressure will demonstrate regression towards the mean by returning to usual levels over time.4 Therefore, evaluation of some phenomena may require several examinations over time before a diagnosis is finalized.
Variability from Examination Equipment and Environment
In laboratory-based measurements, instruments are typically calibrated against established standards such as those of the American National Bureau of Standards, and the measurements are performed under controlled and specified conditions. The results and variability in these measurements are usually expressed as a standard deviation of the individual values or as confidence intervals around the calculated means.4, 22
It is important to distinguish the reliability of a measurement from the precision of the measurement. The precision of a measurement refers to the exactness or degree of refinement with which a measurement is stated. For example, clinicians may measure the anatomic root length on a radiograph to the nearest half-millimeter with a Boley gauge or measure the depth of a periodontal pocket to the nearest millimeter with a periodontal probe. Alternatively, these measurements could be made electronically using tools with more precision, perhaps facilitating measurements to the nearest hundredth of a millimeter. Such a level of precision, however, may not be clinically relevant and would not necessarily translate to higher reliability scores. That is, just because a measurement is precise does not mean that it is reliable. In fact, the inherent variability of the physical attributes associated with many dental conditions is responsible for the inability to attain higher reliability scores. Variability may also originate from the incorrect function or use of measuring devices or instruments. For example, reliable periodontal probing requires the use of a calibrated probe, correct positioning of the probe, and application of appropriate probing pressure.
Variability of Examiners
Examiners may be inexperienced or incompetent. Examiners also differ because of biologic variation in the acuity of their senses (e.g., sight, touch, hearing), which may be further affected by their mood and sleep status. Examiners may also replace evidence by inference, potentially increasing the diagnostic error because a hasty inference may close a clinician's mind to other diagnoses.29 For example, a middle-aged
female patient describes symptoms of constant aching, throbbing pain that began shortly after a recent lengthy dental appointment. The patient localizes the symptoms to the right submandibular region, right mandibular angle, and the right mandibular molar teeth. The dentist recalls the recent restoration of extensive caries on the mandibular right first molar. No radiographic abnormalities are detected, but irreversible pulpitis is diagnosed, and lengthy endodontic therapy is completed. Unfortunately, the patient returns the following day with increased bilateral pain of the mandibular molar teeth, restricted interincisal opening, and pain radiating from the mandibular molars bilaterally along the sides of the face to the preauricular and anterior temporalis regions. Temporomandibular disorders, including referred pain from the masseter muscles to the mandibular molars, are subsequently diagnosed. In this example, the clinician jumped to the conclusion that the initial symptoms were of odontogenic origin and failed to consider the common alternative of referred pain from the masticatory muscles to the teeth.23
A clinician's diagnosis may also be affected by mind-set; that is, clinicians tend to diagnose what they expect or hope to find.29 For example, when pathologists reach a diagnosis, they may be influenced by factors other than the histomorphology of the tissue on the slide. Schwartz et al33 suggest that the pathologist's knowledge of the patient's clinical presentation may be considered and incorrectly weighted in reaching a diagnosis, so that the clinical data are double counted. If the pathologist knows that a biopsy specimen has been obtained from an area of erythroleukoplakia on the floor of the mouth of a heavy smoker and alcohol drinker, the suspicion of malignancy is raised even before the slide is placed on the microscope stage.9, 19 In such instances, the dysplasia or carcinoma may be unconsciously graded as more severe than if the clinical information were not available to the pathologist.33
Specific biologic assays do not exist for all diseases, and investigators may need to make judgments using criteria that are not very specific or make judgments about subject characteristics that are difficult to evaluate. Because there are no absolute standards, the best that can be done is to determine if the investigators are consistent in their judgments. That is, performance review of the clinician investigators focuses on the likelihood that repeated examinations of the same, unchanged patient by either the same clinician or other clinicians yield identical results. Comparisons may be made in which the same investigator examines the same subjects two or more times (intraexaminer reliability) or in which different investigators examine the same subjects (interexaminer reliability). Interobserver variability is minimized when the endpoints are well defined and quantifiable, such as measuring the anatomic root length on a periapical radiograph or measuring overbite or overjet on study models. Interobserver variability is greater when criteria are vague and subjective, as in the clinical diagnosis of TMD24, 39 or the histologic diagnosis of dysplasia.17, 18, 26
Reliability measures provide information only about how well the examiners agree, not about whether the conclusions are correct. Inter- and intraexaminer reliability have been quantified by such measures as the Pearson correlation coefficient, the intraclass correlation coefficient (ICC), and the kappa statistic (κ) (Table 1). For more details, readers are directed to the text by Norman and Streiner.22
Correlation Coefficients
The correlation coefficient, more properly called the Pearson product moment correlation coefficient, is used with continuous data. It is based on the extent to which the relationship between two variables can be described by a straight line called the regression line. The Pearson correlation coefficient, r, is a measure of the strength of the relationship between two sets of data. The strongest positive correlation has a value of 1.0, no relationship is indicated by 0, and perfect negative correlation has a value of −1.0. Thus, correlation coefficients with values closest to 1.0 demonstrate the greatest relationship between sets of data, but perfect agreement occurs only when the regression line has a slope of 1; that is, the points fall along the line of equality. In regression analysis, the square of the correlation coefficient, r2, is known as the coefficient of determination, which is, in effect, the fraction or proportion of the total variance in the dependent variable that can be explained by the relationship between the variables. The value of r tends to overestimate the true reliability; in general, r values will be higher than, and overestimate, the reliability calculated by the intraclass correlation coefficient (ICC). Bland and Altman3 have discussed the problems with the use of correlation coefficients and have developed an alternative method for assessing agreement between two methods of clinical measurement based on graphic techniques.
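The difference between correlation and agreement is easy to demonstrate. In the hypothetical Python sketch below, a second examiner records every probing depth exactly 1 mm deeper than the first; the Pearson r is a perfect 1.0 even though the two examiners never report the same value, which is the kind of systematic bias Bland and Altman's graphic method is designed to expose.

def pearson_r(x, y):
    # Pearson product moment correlation coefficient for two paired samples.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (var_x * var_y) ** 0.5

# Hypothetical probing depths (mm) at the same six sites read by two examiners;
# examiner 2 systematically reads every site 1 mm deeper.
examiner_1 = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
examiner_2 = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

print(pearson_r(examiner_1, examiner_2))                        # 1.0
print(sum(b - a for a, b in zip(examiner_1, examiner_2)) / 6)   # mean bias of 1.0 mm

The systematic 1-mm bias is invisible to r but obvious from the mean difference between the two sets of readings.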
Intraclass Correlation Coefficient
The ICC is generally derived from analysis of variance calculations. Intraclass correlation coefficient values can range from 0 to 1.0. Unlike r, the ICC value indicates what proportion of the total observed variability is caused by variability among the subjects as compared with variability among the examiners. If most of the variability results from discrepancies among examiners, the ICC values are low. Alternatively, if the examiners are reliable (consistent) among themselves, ICC values are high (e.g., between 0.75 and 1.00), and in effect one examiner could be replaced with another.8 The ICC values may be interpreted in a manner similar to κ scores, which are more commonly used.
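The article does not state which form of the ICC underlies the values in Table 1, so the following Python sketch should be read as illustrative only. It computes the one-way random-effects ICC, often written ICC(1,1), from a small, hypothetical set of maximal pain-free opening measurements made by two examiners.

import numpy as np

def icc_one_way(ratings):
    # One-way random-effects intraclass correlation, ICC(1,1).
    # ratings: n_subjects x k_raters array of scores.
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical maximal pain-free openings (mm) for five patients, two examiners.
print(round(icc_one_way([[38, 40], [45, 44], [52, 50], [35, 37], [48, 49]]), 2))   # 0.97

Because most of the variability in this made-up sample lies between patients rather than between examiners, the ICC is high; if the examiners disagreed widely, the same calculation would drive the ICC toward zero.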
Table 1. RELIABILITY OF SOME MEASUREMENTS/TESTS USED IN DENTISTRY
(Values are the reliability coefficients reported in the cited references: Pearson correlation coefficients [r], intraclass correlation coefficients, or kappa [κ] scores, inter- and intraobserver.)

Periodontics
  Probing depth, general (refs 5b, 10a): interobserver r = 0.63; intraobserver r = 0.72; interobserver κ = 0.26
  Plaque (refs 6b, 11a): interobserver r = 0.81; intraobserver r = 0.32; interobserver κ = 0.22
Temporomandibular disorders
  Temporomandibular joint sounds, manual palpation (ref 8): trained examiner 0.68 and 0.62; untrained examiner 0.35 and 0.30
  Temporomandibular joint sounds, stethoscope (ref 8): trained examiner 0.26 and 0.61; untrained examiner 0.32 and 0.35
  Mandibular kinesiology, maximal pain-free vertical opening: trained examiner (refs 8, 10b) 0.89 and 0.90; untrained examiner (ref 8) 0.72
Dental radiology
  Caries, calibrated examiner (ref 36a): 0.73 and 0.80
  Periodontal disease, calibrated examiner (ref 36a): 0.80 and 0.79
  Degenerative temporomandibular joint changes on tomography (ref 5a): 0.47–0.80 and 0.58–0.79
  Disk displacement on MR imaging (ref 24): 0.70
Oral pathology
  Diagnosis of dysplasia (ref 1): intraobserver r = 0.30–0.63; interobserver κ = 0.29–0.48; intraobserver κ = 0.05–0.49
  Grading of oral leukoplakia from no dysplasia to carcinoma in situ (ref 16): interobserver κ = 0.27–0.45
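Several of the values in Table 1, and the discussion that follows, are kappa scores. The Python sketch below, using entirely hypothetical bleeding-on-probing calls from two examiners, shows how κ corrects raw percentage agreement for the agreement expected by chance.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    # Chance-corrected agreement between two raters on categorical calls.
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Two examiners each score 20 sites as bleeding (1) or not bleeding (0).
examiner_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
examiner_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(round(cohens_kappa(examiner_1, examiner_2), 2))   # raw agreement 0.80, kappa 0.56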
Kappa Scores
The best approach for evaluating reliability with noninterval data is the kappa (κ) statistic, which adjusts for the degree of agreement expected by chance. For a perfect association, κ = 1.0, and for no association, κ = 0. Qualitative interpretations of κ values vary,16, 29 but Brunette4 suggests that values below 0.4 indicate poor agreement, values of 0.4 to 0.75 are fair, and values of 0.75 to 1.0 are excellent. A rule of thumb is that clinical studies should not proceed before investigators have been trained and calibrated with demonstrated high κ scores (e.g., κ > 0.6).
Table 1 lists the reliabilities of some measurements and tests used in dentistry and illustrates the differences between correlation coefficients and κ scores. For example, the interexaminer correlation coefficients for probing depths and plaque assessment are 0.63 and 0.81, respectively; in contrast, the interexaminer κ scores are only 0.26 and 0.22! In similar fashion, Abbey et al1 calculated correlation coefficients and κ scores for six pathologists, comparing their original sign-out diagnoses of dysplasia with subsequent reexaminations of the same slides. Correlation coefficients averaged 0.50; intraexaminer κ scores ranged from 0.05 to 0.49. In the same study, interexaminer κ scores for the presence or absence of dysplasia ranged from 0.29 to 0.48.
MEASUREMENT VALIDITY AND THE REFERENCE STANDARD
Measurement validity refers to the truthfulness of the measurement or technique; in other words, it asks whether the measurement measures what it claims to measure. The determination of measurement validity requires a comparison of the measurement or technique with a reference measure or technique that has been accepted as true and is the acknowledged standard, at the time, for definitive diagnosis of the disease or condition. The principle of measurement validity is crucial to clinical measurements because even if a measurement is highly reliable, the measurement has no diagnostic value if it does not accurately reflect the characteristic of interest. For example, a clinician may reliably measure the anatomic root length of an incisor on a periapical radiograph. If, however, the bisecting-angle technique rather than the paralleling technique was used for exposure of the radiograph, the measured root length may not be a true or valid representation of the anatomic root length.
The classification of disease is traditionally based on pathologic anatomy,40 and therefore the histopathologist's diagnosis is typically regarded as the reference standard. Performing the reference test of autopsy or histopathologic examination is not always feasible, however, because obtaining a specimen is generally an invasive procedure that may also be risky, expensive, and often impossible to perform in a timely
manner. Not all body sites are as readily accessible for biopsy and histologic examination as the oral soft tissues. Therefore, surrogate parameters such as biologic assays or measurements are used as the standard for comparison. For example, in the case of bovine spongiform encephalopathy and its human variant, Creutzfeldt-Jakob disease, autopsy is both the reference standard and the only reliable and valid diagnostic tool at this time. If valid and less invasive laboratory techniques were available, earlier diagnosis of the disease would be possible.
The assumed benefit of earlier disease detection, such as through screening tests, must be tempered with the possibility that for some diseases earlier detection is unlikely to improve the prognosis. Early detection of disease is assumed to be beneficial because treatment initiated before the onset of symptoms is assumed to be more effective than later treatment, so that the development of disease may be reduced or eliminated. For some conditions, such as Creutzfeldt-Jakob disease, there is no effective treatment at this time; hence, the earlier diagnosis of some conditions must be weighed against the overall risks and benefits for the individual and society.13
Widmer39 reviewed the measurement validity of TM joint imaging techniques against anatomy. Arthrography demonstrated an 84% true correlation to anatomy,37 MR imaging had a 73% to 85% true correlation,7 and tomography had a 63% to 85% true correlation to anatomy.12 Widmer39 also reviewed the measurement validity of TM joint sounds assessed by palpation and stethoscope in an arthrographic examination of asymptomatic subjects. Assessment of TM joint sounds by manual palpation revealed that 15% of silent joints had disk displacement.37 Joint sound assessment by stethoscope revealed 14% of silent joints with disk displacement.32 These results demonstrate that disk displacements may be present in the absence of joint sounds and that the presence of joint sounds may not offer a valid assessment of disk displacements.
DIAGNOSTIC VALIDITY AND THE REFERENCE STANDARD
Biologic assays do not exist for all disorders, and for some diseases and conditions a real or practical reference standard does not exist. For example, biologic assays for TMD and fibromyalgia do not exist, and there is no reference standard for the measurement of active periodontal disease. Instead, clinicians use measurements such as probing and attachment levels, which are cumulative indices reflecting the history of disease (in this case, attachment loss) rather than the presence of active disease.4 In similar fashion, the diagnosis of fibromyalgia relies on the key clinical feature of decreased pain threshold as manifested by tenderness at 18 specified anatomic locations.
Widmer39 distinguishes measurement validity from diagnostic validity, which is the extent to which diagnostic criteria can be used to classify persons as to the absence or presence of a disorder with regard to the
current reference standard classification system. That is, in the absence of a reference standard based on histopathologic or biologic assays, a general nominalistic impression of the diagnostic usefulness of each measure is gained through diagnosis of the presence or absence of the disorder among individuals already known either to have or not to have the disorder of interest. For example, for fibromyalgia, the diagnostic validity of tenderness to muscle palpation is evaluated by the ability of this measurement technique to distinguish between individuals known to have and known not to have fibromyalgia. In the future, if laboratory findings are linked to fibromyalgia, this new measurement or test approach must also be assessed for its ability to distinguish between individuals known to have or not to have fibromyalgia. Thus, the relative diagnostic abilities of the existing method of muscle palpation and the new laboratory finding can be compared; the more successful method would be regarded as the reference standard until another new test proves superior.
TEST CHARACTERISTICS
Traditionally, a new test, measurement, or technique is evaluated in a sample of patients identified by the existing reference standard either to have or not to have the disease of interest. A general impression of the diagnostic strengths of a measure, test, or technique may then be obtained from characteristics or parameters of the test. Test characteristics are mathematical probabilities that are calculated by direct comparison between a test, measurement, or technique and the reference standard in a 2 × 2 contingency table (Figs. 1 and 2; see box). Summary statistics such as sensitivity, specificity, and predictive values aid in the comparison and analysis of different tests. Test accuracy is a measure of the agreement between the test and the reference standard but, as discussed in the section on likelihood ratios, accuracy is not the sole measure or guarantee of a test's clinical usefulness.
Figure 2. Contingency comparison between gold standard and new test. For example, for the disease of caries, the gold standard is histologic examination, and a new test for diagnosis of caries may be direct digital radiography.
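As a concrete illustration of how the cells a through d in Figure 2 arise, the Python sketch below tallies them from paired test calls and gold-standard diagnoses. The readings are hypothetical and the function is illustrative only.

def contingency_counts(test_positive, disease_present):
    # Tally the a, b, c, d cells of the 2 x 2 table in Figure 2 from paired
    # test results and gold-standard (reference) diagnoses.
    a = b = c = d = 0
    for test, disease in zip(test_positive, disease_present):
        if test and disease:
            a += 1      # true positive
        elif test and not disease:
            b += 1      # false positive
        elif disease:
            c += 1      # false negative
        else:
            d += 1      # true negative
    return a, b, c, d

# Hypothetical paired readings: digital radiography call versus histologic diagnosis.
radiograph = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
histology  = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
print(contingency_counts(radiograph, histology))   # (3, 1, 2, 4)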
Definitions of and Calculations for Test Characteristics

Accuracy is the overall agreement between the test and the reference (gold) standard. Accuracy may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    Accuracy = (a + d) / (a + b + c + d)

Sensitivity is the proportion of diseased individuals correctly identified by the test. Sensitivity is also known as the true-positive rate and may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    Sensitivity = a / (a + c)

Specificity is the proportion of nondiseased individuals correctly identified by the test and is also known as the true-negative rate. Specificity may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    Specificity = d / (b + d)

Prevalence (P) is the overall probability or risk that the disease is present before the test and is also known as the pretest likelihood. Prevalence is the proportion of individuals in a population who have the disease at a specific point in time. Prevalence in a specified population may change over time, and prevalence may change if the definition of the disease changes. Prevalence may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    P = (a + c) / (a + b + c + d)

Post-test Likelihood of a Positive Test (PTL[+]) is also known as the positive predictive value. For an individual with a positive test result, PTL(+) is the probability that the disease is actually present. The PTL(+) may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    PTL(+) = a / (a + b)
When the sensitivity, specificity, and prevalence or pretest likelihood are known, PTL(+) may be calculated by the formula
    PTL(+) = [P × LR(+)] / [(1.0 − P) + P × LR(+)]
where
    LR(+) = sensitivity / (1.0 − specificity) = true-positive rate / false-positive rate

Post-test Likelihood of a Negative Test (PTL[−]). For an individual with a negative test result, PTL(−) is the probability that the disease is actually present. The PTL(−) may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    PTL(−) = c / (c + d)
When the sensitivity, specificity, and prevalence or pretest likelihood are known, PTL(−) may be calculated by the formula
    PTL(−) = [P × LR(−)] / [(1.0 − P) + P × LR(−)]
where
    LR(−) = (1.0 − sensitivity) / specificity = false-negative rate / true-negative rate

Negative Predictive Value (NPV). For an individual with a negative test result, the NPV is the probability that disease is really absent. The NPV may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula
    NPV = d / (c + d)
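The formulas in the box translate directly into code. The following Python sketch, with hypothetical counts invented for illustration, computes every quantity defined above from the a through d cells of Figure 2.

def test_characteristics(a, b, c, d):
    # Summary statistics from the 2 x 2 contingency table of Figure 2:
    # a = true positives, b = false positives, c = false negatives, d = true negatives.
    n = a + b + c + d
    sensitivity = a / (a + c)        # true-positive rate
    specificity = d / (b + d)        # true-negative rate
    return {
        "accuracy": (a + d) / n,
        "prevalence": (a + c) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PTL(+)": a / (a + b),       # positive predictive value
        "NPV": d / (c + d),
        "PTL(-)": c / (c + d),       # disease present despite a negative test
        "LR(+)": sensitivity / (1.0 - specificity),
        "LR(-)": (1.0 - sensitivity) / specificity,
    }

# Hypothetical screening of 200 individuals: 40 diseased, 160 healthy.
results = test_characteristics(a=36, b=24, c=4, d=136)
print({name: round(value, 2) for name, value in results.items()})

With these counts the table gives a PTL(+) of 0.60, which is exactly what the odds-based formula returns when the pretest likelihood is taken to be the table's prevalence of 0.20 and LR(+) = 6.0.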
Sensitivity is the proportion of individuals who are correctly identified as having the disease. Specificity is the proportion of individuals who are correctly identified as nondiseased. Table 2 illustrates the sensitivities and specificities of some diagnostic tests used in dentistry.
Sensitivity and specificity are typically calculated in defined populations in which the disease status of the individuals is already known and confirmed by the reference standard and in which only extremes of disease (the very sick) and health (the very healthy) are represented. As discussed later, these circumstances do not represent the true clinical situation. If the clinician already knew the disease status of a patient, there would be no need for further investigation.
Table 2. SENSITIVITIES, SPECIFICITIES, AND LIKELIHOOD RATIOS OF SOME DIAGNOSTIC TESTS USED IN DENTISTRY

Test                                                 Reference   Sensitivity   Specificity   LR(+)*   LR(−)†
Caries
  Clinical examination                               36b         0.13          0.94          2.2      0.93
  Bite-wing radiographs                              21a         0.73          0.97          24.3     0.28
Periodontics
  Gingival redness                                   11a         0.27          0.67          0.82     1.09
  Plaque                                             11a         0.47          0.65          1.3      0.82
  Bleeding on probing (2 mm, 5/6 threshold)          18a         0.29          0.88          2.4      0.81
Temporomandibular joint disorders
  Temporomandibular sounds, manual palpation,
    single click                                     7a          0.43          0.75          1.7      0.76
  Disk displacement on MR imaging                    36a         0.86          0.63          2.3      0.22
  Degenerative changes on sagittal tomography        36a         0.47          0.94          7.8      0.56

*LR(+) is calculated as sensitivity / (1.0 − specificity).
†LR(−) is calculated as (1.0 − sensitivity) / specificity.
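The footnote formulas of Table 2 are easy to check. The short Python sketch below recomputes the likelihood ratios for bite-wing radiographs from the tabulated sensitivity and specificity.

def likelihood_ratios(sensitivity, specificity):
    # LR(+) and LR(-) from the footnote formulas of Table 2.
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity

# Bite-wing radiographs for caries (Table 2): sensitivity 0.73, specificity 0.97.
lr_positive, lr_negative = likelihood_ratios(0.73, 0.97)
print(round(lr_positive, 1), round(lr_negative, 2))   # 24.3 and 0.28, as tabulated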
Instead, the clinician is typically confronted with equivocal cases among a population of healthy and diseased individuals.
The distinction between the presence and absence of disease or abnormality depends on the selection of cutoff points. Changes in the activity or level of any physiologic, biochemical, or molecular marker are typically reflected by continuous measures. In contrast, the presence or absence of an abnormality or disease is typically a dichotomous diagnosis, such as normal versus abnormal or health versus disease; on occasion, gradations of abnormality are used, such as mild, moderate, or severe dysplasia, or hypertension classified as stage 1 to stage 4. Continuous measures may be collapsed to dichotomous data by the selection of cutoff points. For example, individuals exhibit a wide range of pain-free unassisted vertical and horizontal mandibular movements, and these mandibular kinesiology measurements are used as diagnostic criteria for TMD.39 If the cutoff point between non-TMD (health) and TMD (disease) is arbitrarily set at an interincisal opening of 40 mm, then, theoretically, the patient with a 39-mm opening is eligible for a diagnosis of TMD, but another patient with a 41-mm opening is diagnosed as non-TMD. Alternatively, if the cutoff point between non-TMD and TMD is set instead at 35 mm, the same patient with a 39-mm opening would be excluded from a diagnosis of TMD. In similar fashion, the number of specified muscle sites that are tender to
palpation and the number and type of TM joint sounds will affect the proportions of individuals diagnosed with TMD.39
Ideally, the selection of a cutoff point should be based on what is best for the patients concerned, and the consequences of over- and underdiagnosis must be considered. If the condition is innocuous and neither shame nor anguish is associated with the diagnosis (for example, the diagnoses of linea alba or the common cold), then the cutoff for classification as diseased may be relaxed. Conversely, if there is no advantage in early diagnosis, a positive diagnosis has the potential to produce anxiety in the patient, and there is no effective treatment, the cutoff for disease should be set high (≥ 99%) to exclude the nondiseased.29 The selection of the cutoff point will determine the proportions of true-positive, false-positive, true-negative, and false-negative results, which, in turn, will produce different estimates of the sensitivity and specificity of the diagnostic test (see box). A perfect test will yield only true-positive and true-negative results without any overlap or false-positive or false-negative results (Fig. 3).
Criteria for Selection of Test Thresholds
Low Threshold
• selected if it is important that all individuals with the disease or its progression are detected
• provides high sensitivity, so that a negative result makes disease unlikely (a high negative predictive value)
• results in an increased number of false-positive results because of the low specificity
• is useful for screening for serious or life-threatening disease, but confirmation testing is required (e.g., dentists perform screening examinations for high blood pressure or for oral cancer in patients who are asymptomatic for these diseases)
High Threshold
• limits the number of false-positive results
• is required for confirmation testing
• results in high specificity but lower sensitivity. High specificity is important for diseases that are not life-threatening, such as TMD, because it excludes individuals without the disease from pursuing unnecessary, irrelevant, and possibly invasive, irreversible, and expensive treatment.
In general, if a low threshold is selected, the sensitivity is increased and the specificity is decreased; a high threshold results in high specificity but lower sensitivity. High sensitivity is desirable for screening tests. High specificity is required for exclusionary tests to minimize the number of false-positive results. The highest possible sensitivity and specificity are desirable for confirmatory tests to minimize both false-positive and false-negative results. Unfortunately, high sensitivity and high specificity are rarely found in a single test.
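The effect of moving a cutoff can be shown with a few lines of code. The Python sketch below uses hypothetical interincisal openings and reference TMD diagnoses (not data from this article) and recomputes sensitivity and specificity when the cutoff is moved from 35 mm to 40 mm, as in the example discussed earlier.

def sens_spec_at_cutoff(openings_mm, has_tmd, cutoff_mm):
    # Treat "opening below the cutoff" as a positive test for TMD and compare
    # the calls against the reference diagnoses.
    tp = sum(o < cutoff_mm and d for o, d in zip(openings_mm, has_tmd))
    fn = sum(o >= cutoff_mm and d for o, d in zip(openings_mm, has_tmd))
    tn = sum(o >= cutoff_mm and not d for o, d in zip(openings_mm, has_tmd))
    fp = sum(o < cutoff_mm and not d for o, d in zip(openings_mm, has_tmd))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical maximal pain-free openings (mm) and reference TMD diagnoses (1 = TMD).
openings = [30, 33, 34, 36, 38, 39, 41, 42, 44, 45, 47, 50, 52, 55]
tmd      = [1,  1,  1,  1,  0,  1,  0,  0,  1,  0,  0,  0,  0,  0]

for cutoff in (35, 40):
    sensitivity, specificity = sens_spec_at_cutoff(openings, tmd, cutoff)
    print(cutoff, round(sensitivity, 2), round(specificity, 2))

Raising the cutoff from 35 mm to 40 mm lifts sensitivity from 0.50 to 0.83 in this made-up sample but lowers specificity from 1.00 to 0.88, which is the trade-off described above.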
Figure 3. Hypothetical distributions of diseased (true-positive fraction [TPF]) and healthy (true-negative fraction [TNF]) populations. Test results yield different estimates of sensitivity and specificity. A, Hypothetical perfect test with 100% sensitivity and 100% specificity. The diseased (TPF, dashed line) and healthy (TNF, solid line) individuals are identified without false-negative (FNF) or false-positive (FPF) fractions. B, Hypothetical useless test. The diseased and healthy populations are not identified by the test.
Receiver Operating Characteristic Analysis
One of the best methods to evaluate the effect of different cutoff points is receiver operating characteristic (ROC) analysis (Fig. 4). An ROC analysis plots the true-positive fraction (sensitivity) as a function of the false-positive fraction (1.0 − specificity), and points along the curve can be used to determine the effect of different thresholds for the test. Selection of points towards the left of the curve yields higher specificity, and points to the right yield higher sensitivity. An ROC analysis also permits the comparison of different tests without any selection of upper or lower reference limits or any particular sensitivity or specificity. It is widely agreed that ROC curves are independent of the disease prevalence and therefore reflect the true performance of the diagnostic tests.11, 29
In clinical practice, the selection of cutoff points is determined by several factors, including the mortality and morbidity of the disease, the consequences of over- and undertreatment, and the cost and time required to perform the diagnostic test.
Figure 3 (Continued). C and D, Hypothetical typical test with overlap of healthy and diseased populations. The selection of the cut-off point to distinguish between healthy and diseased individuals affects the proportion of the FNF and FPF. Sensitivity and specificity are affected by the selection of the cut-off point. In C, the cut-off point is located further to the right than the cut-off point in D. Therefore, the FPF in C is smaller than the FPF in D. Conversely, the FNF in C is larger than the FNF in D.
Once test thresholds are established, sensitivity and specificity are considered to be stable properties of the test because they are apparently not affected by the prevalence of the target disease. Some evidence, however, indicates that sensitivity and specificity do change from one clinical population to another,14, 15 especially if the stage of disease varies in different groups of patients.11, 24
The Effects of Prevalence
Sensitivity (true-positive rate) and specificity (true-negative rate) are measures of how well the test correctly identifies diseased and healthy individuals, respectively. Sensitivity and specificity do not provide the clinician with any information about whether the test will provide meaningful diagnostic information for individuals whose disease status is not known. Hence, the predictive values (see the box of definitions) of a test are required to provide information about how often a test will provide a correct diagnosis in a mixed population.
Figure 4. Receiver operating characteristic (ROC) curves plot the TPF (sensitivity) against the FPF (1.0 − specificity). ROC curves permit selection of the threshold or cut-off point that provides the best combination of sensitivity and specificity scores. The most discriminating tests cluster in the upper left-hand corner, and the most discriminating test has the greatest area under its ROC curve. ROC curves also permit the comparison of tests without selection of reference limits or sensitivity and specificity. For example, this figure compares the ROC curve for conventional radiographic film evaluation of artificial cortical bone lesions, produced with a size 6 bur in dried mandibles (bulleted line), with the ROC curve for conventional radiographic film evaluation of in vivo periodontal crestal alveolar bone loss (dashed line). In this example, the area under the ROC curve for the detection of in vitro cortical lesions is larger than the area under the ROC curve for the in vivo detection of periodontal crestal bone loss. As expected, conventional radiographic evaluation of in vitro artificial cortical lesions is more discriminating, or a more powerful test, than conventional radiographic evaluation of in vivo crestal bone loss. Solid line, ROC curve of noise or a hypothetical useless test. (Dashed line, Data from Nummikoski PV, Steffensen B, Hamilton K, et al: Clinical validation of a new subtraction radiography technique for periodontal bone loss detection. J Periodontol 71:598–605, 2000; Bulleted line, Data from Paurazas SB, Geist JR, Pink FE, et al: Comparison of diagnostic accuracy of digital imaging by using CCD and CMOS-APS sensors with E-speed film in the detection of periapical bony lesions. Oral Surg Oral Med Oral Pathol Oral Radiol Endod 89:356–363, 2000.)
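An ROC curve of the kind shown in Figure 4 can be generated from raw data with a short script. The Python sketch below uses hypothetical five-point confidence scores and reference diagnoses (not the data plotted in Figure 4) and computes the curve and its area by the trapezoidal rule.

def roc_points(scores, diseased):
    # True-positive fraction versus false-positive fraction for every cutoff,
    # treating "score >= cutoff" as a positive test.
    positives = sum(diseased)
    negatives = len(diseased) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(s >= cutoff and d for s, d in zip(scores, diseased))
        fp = sum(s >= cutoff and not d for s, d in zip(scores, diseased))
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    # Trapezoidal rule over the ROC points.
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical radiographic confidence-of-caries scores (1 to 5) with reference diagnoses.
scores   = [5, 4, 4, 3, 3, 2, 5, 1, 2, 3, 1, 2, 1, 1]
diseased = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]
curve = roc_points(scores, diseased)
print(curve)
print(round(area_under_curve(curve), 2))   # about 0.86 for this made-up data set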
Three predictive values may be calculated: (1) the positive predictive value, (2) the negative predictive value, and (3) the posttest likelihood of a negative test.
The positive predictive value is also known as the post-test likelihood of a positive test (PTL[+]). For a patient who has undergone a diagnostic test and obtained a positive test result, PTL(+) is the probability that disease is actually present. When a negative test result is obtained, the probability that disease is truly absent is known as the
negative predictive value. For a patient with a negative test result, the clinician may need to know the probability that disease is actually present; this probability is known as the post-test likelihood of a negative test (PTL[−]). Although a negative result will reduce the probability of disease being present, typically it will not absolutely eliminate this possibility.
The predictive values of a test vary widely as the prevalence of the disease changes.11, 29 Prevalence is also known as the pretest likelihood, and it is the overall probability or risk that disease is present before the test is administered. For example, toluidine blue has been advocated for the detection of oral squamous cell carcinoma (SCC). The sensitivity of toluidine blue ranges from 93.5% to 97.8%, and its specificity ranges from 73.3% to 92.9%.28 The predictive values of toluidine blue and the conclusions provided by this test will vary, however, depending on the individual patient to whom or the population in which the test is applied. The prevalence of SCC in the general population has been estimated at 3%,25 and therefore the posttest likelihood of a positive toluidine blue test in the general population is only 6%.10 In contrast, the prevalence of SCC, either as primary or recurrent disease, is greater in a tertiary care center for oral SCC, where prevalence estimates range from 26%25, 34 to 33%.10 Consequently, the posttest likelihood of a positive test in a tertiary care center is also greater (51%).10 In the high-prevalence setting, the posttest likelihoods of the tests are considerably higher than the pretest probabilities, meaning that there is a considerably increased probability that the disease is actually present. In contrast, the posttest likelihoods of the same test in the general population (low-prevalence setting) are similar to the pretest probabilities, meaning that there is only a slight increase in the probability that the disease is actually present. Nevertheless, the significance of each positive and negative test must be evaluated on an individual basis by the clinician, who must then decide the subsequent course of action.
The example of the toluidine blue test demonstrates that even a test with high sensitivity (93.5%–97.8%) and specificity (73.3%–92.9%)27 can yield low predictive values when the prevalence (or pretest likelihood) is low. Sackett et al29 further illustrate this point using a theoretical test with 95% sensitivity and 95% specificity under conditions of variable prevalence. For example, as the prevalence changes from 99% to 1%, the PTL(+) changes from 99.99% to 16%. Thus, even a test that has excellent specificity and sensitivity will produce a low likelihood of disease being present if it is applied to an individual in a population in which the initial pretest prevalence is low.
The choice of a particular test for a specific disease is determined by the power or ability of the test to revise the pretest probabilities, either upwards to rule in the disease or downwards to rule out the disease. In general, for a test with a sufficiently high sensitivity, a negative result rules out the disease. In contrast, for a test with a sufficiently high specificity, a positive result rules in the disease.29 In other words, the clinician relies on pattern recognition: "if it looks like a duck, quacks like a duck, and waddles like a duck, it probably is a duck."
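The dependence of predictive values on prevalence is easy to reproduce. The Python sketch below applies the theoretical 95%-sensitive, 95%-specific test discussed by Sackett et al to three pretest likelihoods; the function is a direct restatement of the PTL(+) formula in the box of definitions.

def positive_predictive_value(sensitivity, specificity, prevalence):
    # Probability that disease is truly present given a positive result,
    # in a population with the stated prevalence (pretest likelihood).
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

# A theoretical test with 95% sensitivity and 95% specificity.
for prevalence in (0.01, 0.33, 0.99):
    print(prevalence, round(positive_predictive_value(0.95, 0.95, prevalence), 3))

At a prevalence of 1% the positive predictive value is only about 16%, as quoted above, whereas in a high-prevalence setting the same positive result makes disease a near certainty.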
Different tests for the same disease can be used in combination, either in series, such as screening testing followed by confirmation testing, or in parallel.
In series testing
• tests are used in succession
• if tests A and B are used in series, then either test A or test B can be used first
• a positive result on the first test requires testing with the second test
• series testing is less sensitive in detecting disease than parallel testing, but it has greater specificity and is more efficient at confirming the presence of disease
• series testing is used in confirmation testing
In parallel testing
• tests are performed concurrently
• if tests A and B are used in parallel, a positive result requires a positive result on either test A or test B; a negative result requires that both test A and test B be negative
• parallel testing is more sensitive than series testing for detecting disease but less efficient at confirming the presence of disease
At health fairs, clinician dentists may perform screening tests for oral cancer through a careful visual inspection of the oral soft tissues or a screening test for TMD by evaluating the patient's range of pain-free mandibular movements. If suspicious oral lesions or a restricted, painful range of jaw movement is found, the patients would be referred to their own dentists or to specialists for possible oral biopsy or a more detailed TMD evaluation, including assessment of joint sounds and of TM joint, neck, and masticatory muscle tenderness.
When several tests are used in sequence, the posttest likelihood of disease after the first test is used as the pretest likelihood for the subsequent test. A possible problem with this approach is the propagation of errors, because each test can be considered as having some associated error. Therefore, as more tests are performed, the precision of the probability estimate will decline. The posttest probability of disease may also be distorted by the end of the test sequence if the clinician assumes that the tests are independent when the test results are actually dependent. That is, the result on one test or measure may affect the characteristics of the second test, a phenomenon termed concordance or convergence.29 Concordance occurs when patients who are positive on one of the paired tests are likely to be positive on the other one as well, or when patients who are negative on one test are likely to be negative on the other one. For example, the electric pulp-stimulation test is much
more likely to be positive when the thermal (cold) test is positive (i.e., the patient reports sensation upon cold stimulation of the tooth) than when the thermal test is negative (the patient denies sensation to cold stimuli). Conversely, teeth with negative results on one test (either the cold or the electric pulp test) are also likely to be negative on the other. Concordance results in an overestimate of disease likelihood. Sackett et al29 suggest that for short courses of two or three diagnostic tests, convergence is not a serious problem but should be considered.
For example, concordance was observed between the use of toluidine blue and visual clinical examination of patients in an oral cancer tertiary care center by a trained and experienced clinician.10 That is, oral lesions that were classified as suspicious or positive by one of these methods were likely to be positive on the other method as well. When the results of both the visual clinical examination and toluidine blue were positive, the pretest likelihood of 33% was raised to a posttest likelihood of 54%, which is greater than the PTL(+) obtained by either toluidine blue application alone (51%) or the visual clinical examination alone (44%).10 A PTL(+) of 54% calculated with consideration of concordance is a lower but more realistic value than the PTL(+) of 62% that is calculated if the tests are used sequentially and assumed to be independent.
LIKELIHOOD RATIOS AND NOMOGRAMS
Principles 3 and 4 of diagnostic decision analysis require that the interpretation of possible test outcomes precede the ordering of the test and that testing should proceed only if the subsequent management of the patient will be altered as a result of the test result. How can this interpretation be accomplished? If the sensitivity and specificity of a particular test and the prevalence of the disease of interest are known, the post-test likelihoods of a positive and a negative test can be calculated from the formulas for PTL(+) and PTL(−) shown in the box of definitions. These calculations use likelihood ratios, which "express the odds that a given level of a diagnostic test result would be expected in a patient with (as opposed to one without) the target disorder."29
Sensitivity and specificity are probability statements, and they may be converted to odds, which are the ratio of two probabilities (the probability that disease is present to the probability that it is absent). Probabilities and odds contain the same information but convey it differently; thus, a probability of 50% means even odds of 1:1. Likelihood ratios provide a measure of a test's ability to revise the pretest probabilities, and they are simple to calculate from the sensitivity and specificity of the particular test. Although sensitivity and specificity are used to calculate the likelihood ratios of a test, it is the likelihood ratios, not sensitivity and specificity, that provide information as to the potential power of the test. As a rule of thumb, if the sum of a test's sensitivity and specificity is unity (1.0), the test is useless: the likelihood ratios of the test are also unity (1.0), and therefore the test has no power to revise the pretest probability. In general, powerful
tests for revising pretest probabilities of disease have positive likelihood ratios with values greater than 10 and negative likelihood ratios less than 0.1. Likelihood ratios offer diagnostic advantages in that they are less susceptible than sensitivity or specificity to changes in the prevalence or pretest probability of the disease.29 Likelihood ratios may also be calculated for dichotomous levels of disease and for several levels of the test result. The product of the likelihood ratio for the diagnostic test result and the pretest odds for the target disorder yields the posttest odds for the target disorder.29
A convenient method for rapidly calculating the posttest probability of disease is offered by the use of likelihood ratios for the test and nomograms. Nomograms (Fig. 5)30 offer a convenient and fast alternative to the calculation of posttest likelihoods using the formulas shown in the box of definitions. Table 2 illustrates the sensitivities, specificities, and likelihood ratios of some diagnostic tests used in dentistry.
Figure 6 demonstrates use of the nomogram in the diagnostic decisions for three examples of potential interproximal caries (the disease) and the use of bite-wing radiographs (the test). In each case, the clinician detects a small area of discoloration on the distal aspect of the maxillary second bicuspid but is not able to engage the explorer interproximally. For the disease of caries, the clinician has assigned a test threshold of 30% and a test-treatment threshold of 65% (Figs. 1 and 6).
Patient A is an adolescent female with an unrestored permanent dentition who aspires to a career in modeling. Patient A practices excellent oral hygiene and is compliant with twice-yearly prophylaxis appointments. Bite-wing radiographs taken 2 years ago at the completion of orthodontic treatment do not reveal any abnormalities. The clinician assigns a pretest probability for caries of 1%. The clinician's pretest probability is located well below the test threshold of 30%, and therefore radiographs would not be indicated. In the unlikely event that radiographs (the test) were taken and yielded a positive result, the probability of caries, or PTL(+), can be calculated to be 20%. Despite this positive test result, no further tests or restoration would be indicated, because this probability is still less than the test threshold of 30%. If the test result were negative, PTL(−) can be calculated to be 0.4%, effectively ruling out the presence of caries.
Patient B is a young adult male with a moderately restored posterior dentition. Patient B is a pastry chef apprentice who demonstrates poor oral hygiene and poor compliance with recommended dental recall and prophylaxis appointments. The patient was last seen 3 years ago, when bite-wing radiographs revealed no sites of interproximal caries in the posterior mandibular dentition. The clinician assigns a pretest probability of 50% to the presence of caries. This pretest probability is located between the test and test-treatment thresholds; therefore, bite-wing radiographs are indicated. With a positive test result, treatment is indicated, but a negative test result rules out the disease and treatment.
Patient C is an elderly patient with a heavily restored dentition and a recent history of recurrent and new caries.
Figure 5. Nomograms convert pre- and post-test odds to their corresponding probabilities. To use the nomogram, a straightedge is used to align the pretest probability (left column) with the likelihood ratio (center column) of the test being used. The post-test probability is revealed by reading across the straightedge to the right-hand column of the nomogram. (Data from Fagan TJ: Nomogram for Bayes' theorem [letter]. N Engl J Med 293:257, 1975; Sackett DL, Richardson WS, Rosenberg W, et al: Evidence-Based Medicine: How to Practice and Teach EBM. New York, Churchill Livingstone, 1997, p 127.)
Figure 6. Diagnostic decisions regarding bite-wing radiographs for three patients with possible caries (the disease). Patient A, By aligning the straightedge at 1% in the pretest probability column with 24 in the likelihood ratio column, the post-test probability of caries being present is raised to about 20%, a value well below the test-treatment threshold of 65% and below the test threshold of 30%. Despite a positive test result, no further tests or restoration are indicated, and the clinician may feel confident about merely observing the tooth. When the pretest probability of 1% is aligned with the likelihood ratio (LR) of a negative test result (0.28), the post-test probability of disease is further reduced to about 0.4%, effectively ruling out the presence of caries. Patient B, The pretest probability of 50% is located between the test and test-treatment thresholds. Radiographs are indicated. The post-test likelihood of disease (PTL[+]) is raised to 92%, and treatment is indicated. PTL(−) is reduced to 18%, and treatment is not indicated. Patient C, The clinician recognizes that the 95% pretest probability exceeds the established test-treatment threshold; bite-wing radiographs are not required for diagnosis, and test results would not alter the proposed management (restoration of the tooth). Even a negative test result (no radiographic evidence of caries) would still result in an 80% post-test probability of caries being present. Although 80% is a lesser probability of disease than 95%, it still exceeds the test-treatment threshold and is probably not low enough to change the planned management. LR(+) = 24; LR(−) = 0.28 (see Table 2); test threshold = 30%; test-treatment threshold = 65% (see Figure 1); see Figure 5 for the nomogram.
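The nomogram in Figure 5 is a graphic shortcut for a two-line calculation. The Python sketch below performs the same conversion arithmetically for patient A (pretest probability 1%, bite-wing radiography with LR(+) = 24.3 and LR(−) = 0.28 from Table 2); small differences from the values read off the figure reflect the use of a graphic nomogram rather than any difference in method.

def posttest_probability(pretest_probability, likelihood_ratio):
    # Convert the pretest probability to odds, apply the likelihood ratio,
    # and convert the posttest odds back to a probability.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Patient A: pretest probability of caries 1%, bite-wing radiograph as the test.
print(round(posttest_probability(0.01, 24.3), 3))   # positive film: about 0.20
print(round(posttest_probability(0.01, 0.28), 4))   # negative film: well under 1%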
Patient C is an elderly patient with a heavily restored dentition and a recent history of recurrent and new caries. Patient C is disabled by rheumatoid arthritis and is xerostomic, with poor oral hygiene, although she is a compliant patient. The clinician assigns a pretest probability for caries of 95%, and treatment is indicated without further diagnostic testing. That is, radiographs are not required to establish the diagnosis of caries in this case, although radiographs may provide useful information to guide treatment of the caries or the diagnosis or treatment of other pathologic conditions. For Patient C, even a negative test result would still leave an 80% posttest probability of caries being present and requiring treatment. This case illustrates that clinicians must be careful not to overestimate the meaning of negative test results when, in fact, the probability of disease is high.

SUMMARY

This article has briefly introduced the dental clinician to the principles and practical application of diagnostic decision analysis. There are trade-offs and uncertainties in the process of arriving at a diagnosis, but they can be understood and controlled. First, the clinician must understand the significance of disease prevalence and assign to the patient an initial probability of disease being present. The clinician must then determine whether further diagnostic measurements or tests are warranted. If so, the appropriate test must be selected, based on the ability of the test to revise the initial pretest probability. When a diagnostic test is positive, the clinician must know the probability that disease is actually present; the clinician must also know the probability that disease is actually present if the test result is negative. The astute clinician will calculate the posttest probabilities before proceeding with a test and will base treatment decisions on test results in accordance with predetermined test and test-treatment thresholds.

ACKNOWLEDGEMENTS

The authors are grateful to David Perizzolo for formatting the digital figures, to Lesley Weston for her careful editing, and to Dr. Babak Chehroudi for his critical review.
References
1. Abbey LM, Kaugars GE, Gunsolley JC, et al: Intraexaminer and interexaminer reliability in the diagnosis of oral epithelial dysplasia. Oral Surg Oral Med Oral Pathol Oral Radiol Endod 80:188–191, 1995
2. Beck JD: Issues in assessment of diagnostic tests and risk for periodontal diseases. Periodontology 2000 7:100–198, 1995
3. Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 8:307–310, 1986
4. Brunette DM: Critical Thinking. Understanding and Evaluating Dental Research. Chicago, Quintessence Publishing Co, 1996
5. Choi BCK, Jokovic A: Diagnostic tests. J Can Dent Assoc 62:6–7, 1996
5a. Cholitgul W, Petersson A, Rohlin M, et al: Diagnostic outcome and observer performance in sagittal tomography of the temporomandibular joint. Dentomaxillofacial Radiology 19:1–6, 1990
5b. Clemmer BA, Barbano JP: Reproducibility of periodontal scores in clinical trials. J Periodont Res 9 (suppl 14):118–128, 1974
6. Dabelsteen E, Mackenzie IC: The scientific basis for oral diagnosis. In Mackenzie IC, Squier CA, Dabelstein E (eds): Oral Mucosal Diseases: Biology, Etiology and Therapy. Copenhagen, Laegeforeningens Follag, 1987, pp 99–102
7. Drace JE, Young SW, Enzmann DR: TMJ meniscus and bilaminar zone: MR imaging of the substructure–diagnostic landmarks and pitfalls of interpretation. Radiology 177:73–76, 1990
7a. Dworkin SF, LeResche L, DeRouen T, et al: Assessing clinical signs of temporomandibular disorders: Reliability of clinical examiners. J Prosthet Dent 63:574–579, 1990
8. Dworkin SF, LeResche L, DeRouen T: Reliability of clinical measurement in temporomandibular disorders. Clinical J Pain 4:89–99, 1988
9. Ephros H, Samit A: Leukoplakia and malignant transformation. Oral Surg Oral Med Oral Pathol Oral Radiol Endod 83:187, 1997
10. Epstein JB, Oakley C, Millner A, et al: The utility of toluidine blue application as a diagnostic aid in patients previously treated for upper aerodigestive tract cancers. Oral Surg Oral Med Oral Path Oral Radiol Endod 83:537–547, 1997
10a. Fleiss JS, Chilton NW: The measurement of interexaminer agreement in periodontal disease. J Periodont Res 18:601, 1983
10b. Goulet J, Clark GT: Clinical TMJ examination methods. Journal of the California Dental Association 18:25–33, 1990
11. Greenstein G, Lamster I: Understanding diagnostic testing for periodontal diseases. J Periodontol 66:659–666, 1995
11a. Haffajee AD, Socransky SS, Goodson JM: Clinical parameters as predictors of destructive periodontal disease activity. J Clin Periodontol 10:257–265, 1983
12. Hansson LG, Westesson PL, Katzberg RW, et al: MR imaging of the temporomandibular joint: Comparison of joints of autopsy specimens made at 0.3 T and 1.5 T with anatomic cryosections. AJR Am J Roentgenol 152:1241–1244, 1989
13. Hennekens CH, Buring JE: Screening. In Mayrent SL (ed): Epidemiology in Medicine. Boston, Little, Brown and Co, 1987, pp 327–347
14. Hlatky MA, Mark DB, Harrell FE, et al: Factors affecting sensitivity and specificity of exercise electrocardiography. Am J Med 77:64–71, 1984
15. Hlatky MA, Mark DB, Harrell FE, et al: Rethinking sensitivity and specificity. Am J Cardiol 59:1195–1198, 1987
16. Karabulut A, Reibel J, Therkildsen MH, et al: Observer variability in the histologic assessment of oral premalignant lesions. J Oral Pathol Med 24:198–200, 1995
17. Kramer IRH: Basic histopathological features of oral premalignant lesions. In Mackenzie IC, Dabelstein E, Squier CA (eds): Oral Premalignancy. Iowa City, University of Iowa Press, 1980, pp 23–34
18. Kramer IRH: Prognosis from features observable by conventional histopathological examination. In Mackenzie IC, Dabelstein E, Squier CA (eds): Oral Premalignancy. Iowa City, University of Iowa Press, 1980, pp 304–311
18a. Lange JP: Clinical markers of periodontal disease. In Johnson NW (ed): Risk Markers for Oral Disease, vol 3. Periodontal Disease, Markers of Disease Susceptibility and Activity. Cambridge, Cambridge University Press, 1991, pp 179
19. Mashberg A: Clinical features of oral malignancy in relation to prognosis. In Mackenzie IC, Dabelstein E, Squier CA (eds): Oral Premalignancy. Iowa City, University of Iowa Press, 1980, pp 292–334
20. Matthews DC, Banting DW: Authors’ response. J Can Dent Assoc 62:7, 1996
21. Matthews DC, Banting DW, Bohay RN: The use of diagnostic tests to aid clinical diagnosis. J Can Dent Assoc 61:785–791, 1996
21a. Mileman PA, Vissus T, Pundell-Lewis DJ: The application of decision making analysis to the diagnosis of approximal caries. Community Dental Health 3:65–81, 1985
22. Norman GR, Streiner DL: PDQ Statistics. Toronto, Canada, Decker Inc, 1986
23. Okeson JP: Management of Temporomandibular Disorders and Occlusion. St. Louis, C.V. Mosby Co, 1989, pp 147–300
24. Orsini MG, Kuboki T, Terada S, et al: Clinical predictability of temporomandibular joint disc displacement. J Dent Res 78:650–660, 1999
25. Parker SL, Tong T, Bolden S, et al: Cancer statistics. CA Cancer J Clin 46:5–27, 1996
26. Pindborg JJ, Reibel J, Holmstrup P: Subjectivity in evaluation of oral epithelial dysplasia, carcinoma in situ and initial carcinoma. Journal of Oral Pathology 14:698–708, 1985
27. Rohlin M, Akerman S, Kopp S: Tomography as an aid to detect microscopic changes of the temporomandibular joint. Acta Odontol Scand 44:131–140, 1986
28. Rosenberg D, Cretin S: Use of meta-analysis to evaluate tolonium chloride in oral cancer screening. Journal of Oral Surgery 67:621–627, 1989
29. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical Epidemiology. A Basic Science for Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991, pp 3–170
30. Sackett DL, Richardson WS, Rosenberg W, et al: Evidence-Based Medicine: How to Practice and Teach EBM. New York, Churchill Livingstone, 1997, p 127
31. Schechter MT, Sheps SB: Diagnostic testing revisited: Pathways through uncertainty. J Can Med Assoc 132:755–759, 1985
32. Schiffman E, Anderson GC, Fricton J, et al: Diagnostic criteria for intraarticular TM disorders. Community Dent Oral Epidemiol 17:252–257, 1989
33. Schwartz WB, Wolfe HJ, Pauker SG: Pathology and probabilities. A new approach to interpreting and reporting biopsies. N Engl J Med 305:917–923, 1981
34. Silverman S Jr: Oral Cancer, ed 3. Atlanta, GA, American Cancer Society, 1990
35. Streiner DL, Norman GR: Health Measurement Scales. Oxford, Oxford University Press, 1989, pp 79–95
36. Tanimoto K, Peterson A, Rohlin M, et al: Comparison of computed with conventional tomography in the evaluation of temporomandibular joint disease: A study of autopsy specimens. Dentomaxillofacial Radiology 19:21–27, 1990
36a. Valachovic RW, Douglass CW, Berkey CS, et al: Examiner reliability in dental radiography. J Dent Res 65:432–436, 1986
36b. Verdonschot EH, Bronkhorst EM, Burgersdijk RCS, et al: Performance of some diagnostic systems in examinations for small occlusal caries. Caries Res 26:59–64, 1992
37. Westesson PL, Bronstein SL, Liedberg J: Temporomandibular joint: Correlation between single-contrast videoarthrography and postmortem morphology. Radiology 160:767–771, 1986
38. Westesson PL, Eriksson L, Kurita K: Reliability of a negative clinical temporomandibular joint examination: Prevalence of disk displacement in asymptomatic temporomandibular joints. Oral Surgery, Oral Medicine and Oral Pathology 68:551–554, 1989
39. Widmer CG: Physical characteristics associated with temporomandibular disorders. In Sessle BJ, Bryant PS, Dionne RA (eds): Temporomandibular Disorders and Related Pain Conditions, Progress in Pain Research and Management, vol 4. Seattle, IASP Press, 1995, pp 161–174
40. Wulff HR: Rational Diagnosis and Treatment. Oxford, Blackwell Scientific Publications, 1976

Address reprint requests to
Donald Maxwell Brunette, MSc, PhD
Department of Oral Biological and Medical Sciences
University of British Columbia
2199 Wesbrook Mall
Vancouver, British Columbia
Canada, V6T 1Z3
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY
ASSESSMENT OF KEY ELEMENTS TO DETERMINE CAUSATION AND RISK FACTORS IN DENTISTRY Rhonda F. Jacob, DDS, MS
The hypothesis that an exposure or characteristic is associated with a particular disease outcome can be tested statistically in large population studies. Causation studies usually involve identifying diseases that are caused by, or whose natural history is modified by, lifestyle choices and environmental exposures. A causal association is one in which a change in the frequency or quality of an exposure or characteristic results in a corresponding change in the frequency or quality of the disease outcome. The causal characteristics associated with an increase in disease are often called risk factors.

In 1890, Robert Koch clarified the cause-and-effect relationship of infectious disease when he postulated that a bacterium was the cause of a single disease entity. He stated that the specific organism should be present in all hosts suffering from a specific disease, the microorganism should be isolated from the diseased host and grown in pure culture in the laboratory, inoculation of the cultured organism into a healthy host should cause the disease, and the microorganism should be reisolated from the inoculated host. Although not all of Koch’s postulates have proved true for all bacteria, viruses, and prions, they marked a milestone for cause-and-effect thinking in health care science.

Epidemiologists frequently perform causation studies. When a true cause-and-effect association is determined, this information assists in
From the Department of Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas
DENTAL CLINICS OF NORTH AMERICA VOLUME 46 • NUMBER 1 • JANUARY 2002
formulating global strategies for controlling disease based on population issues such as living conditions, nutrition, personal behavior, lack of health care education, absence of immunity, and so forth. Clinicians are interested in the cause of disease so that they may test therapeutic strategies to prevent or cure the disease. Clinicians are interested in prevention and therapy for individual patients and smaller patient populations; their strategies are tailored to individual patient characteristics, allowing therapy to be modified for the patient afflicted with the target disease.

Although most dentists may view dentistry as a clinical and therapeutic science, many global population issues of causation are related to dentistry. Some examples are public water fluoridation, a possible association of amalgam restorations with multiple sclerosis, and smoking as a risk factor for periodontal disease and dental implant loss. Dentistry has recently begun to examine the association of periodontal disease and cardiovascular disease.1, 4, 8 In these studies, it has been shown statistically that some persons with a diagnosis of heart attack or coronary heart disease are more likely than the general population to have a diagnosis of periodontal disease. The question arises whether this association is a valid cause-and-effect relationship: does periodontal disease cause cardiovascular disease?

To some, this cause-and-effect relationship seems outlandish. So did the hypothesis advanced in 1843 by Oliver Wendell Holmes (professor of anatomy and physiology and later dean of Harvard Medical School) that maternal fevers after childbirth were communicated from mother to mother by obstetricians who did not practice hand washing between births. Learned colleagues stated that they suspected the disease was ‘‘accident or providence’’ rather than any process that could be stemmed by hygiene.5 It was almost 40 years before Koch set forth the postulates that an infectious agent causes disease.

KEY ELEMENTS FOR EVALUATING CAUSATION

Just as Koch formulated postulates that shaped the assessment of the validity of the causal association of a specific organism and a disease, scientists have formulated key elements that assist in judging the scientific evidence for causation. These elements involve chance, bias, confounding variables, biologic credibility, temporal relationship, strength of the relationship, and a dose-response gradient. Making judgments as to whether associations are causal involves an evaluation of the totality of evidence taken from a number of sources that document the cause-and-effect relationship. The ultimate test of causation is the successful use of intervention strategies that therapeutically alter the risk factor or characteristic, thereby altering or curing the disease.

RESEARCH STUDIES TO EVALUATE CAUSATION

The methodologies of research an epidemiologist uses to study cause-and-effect relationships are different from those used by a clinician.
Causation studies usually include large sample populations, are carried out by epidemiologists, and are observational in nature; the subjects are observed, queried, and measured, without the investigators’ offering or testing any interventions. The study is often hypothesis-driven: the investigators gather data to determine whether the characteristic and the outcome can be found together (associated) in the patient population, with the statistical analysis supporting an association beyond mere chance.

Today, health care researchers understand that, unlike Koch’s simple assessment of ‘‘bacteria cause disease,’’ the cause of disease is often multifactorial. Multiple characteristics affect the host’s susceptibility to disease, and how the characteristics come together in the host affects the magnitude of the disease. Therefore, observational studies often examine multiple characteristics of the population to determine associations. Observers record the natural course of events, noting which subjects have or do not have the risk factor and which do or do not develop the outcome of interest. Different observational study designs can be used, but some designs offer an improved opportunity to control bias and confounding variables, thereby increasing the likelihood that these studies report a valid causal association. The two types of observational studies most often employed are cohort and case-control studies. Either may be used, but the decision to use one rather than the other is often based on features of the exposure or risk factor and the disease, current knowledge of the disease, and considerations of time and resources.

How is the Magnitude of a Risk Factor Reported?

Depending on the study design, the magnitude of the causal association is often described as a ratio, either an odds ratio or a relative risk. This analysis considers the ratio of subjects in the exposed and unexposed groups who have or do not have the outcome of interest. Because this relationship is a ratio, an odds ratio or a relative risk of 1 denotes that there is no difference in outcomes between the two groups. Relative risks barely above 1 describe a weak association of the risk factor with the outcome. As the ratio rises above 1, it estimates the increased risk of having the outcome if the risk factor is present, compared with having the outcome if the risk factor is not present. A relative risk of 1.5 means that a subject with the risk factor is 50% more likely to have the disease outcome than a subject without the risk factor.

As in all studies, the test population serves only as a representative population to predict how similar populations would respond. When testing a subpopulation, one can only estimate how similar populations would respond. Statistical methods express the precision of this estimate by means of a 95% confidence interval. A relative risk and confidence interval might be written as 2.2 (C.I. 1.3–4.4). This expression states that, given the data from this representative study, the best estimate of the relative risk is 2.2, but if the study were performed
100 times, approximately 95 of the resulting confidence intervals would be expected to contain the true relative risk.7, 9, 15
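To make these quantities concrete, the following sketch (in Python, added for illustration; it is not from the article, and the counts are hypothetical) computes a relative risk with an approximate 95% confidence interval by the usual log method, along with an odds ratio from a case-control 2 x 2 table.

    import math

    def risk_ratio_with_ci(a, n_exposed, c, n_unexposed, z=1.96):
        """Relative risk and approximate 95% CI (log method) from a 2x2 cohort table.
        a = outcomes among the exposed, c = outcomes among the unexposed."""
        rr = (a / n_exposed) / (c / n_unexposed)
        se = math.sqrt(1/a - 1/n_exposed + 1/c - 1/n_unexposed)  # SE of ln(RR)
        lower = math.exp(math.log(rr) - z * se)
        upper = math.exp(math.log(rr) + z * se)
        return rr, (lower, upper)

    def odds_ratio(a, b, c, d):
        """Odds ratio from a case-control 2x2 table:
        a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls."""
        return (a / b) / (c / d)

    # Hypothetical cohort: 22 of 100 exposed and 10 of 100 unexposed subjects develop the outcome.
    rr, ci = risk_ratio_with_ci(22, 100, 10, 100)
    print(f"Relative risk = {rr:.1f} (95% C.I. {ci[0]:.1f}-{ci[1]:.1f})")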
Prospective Cohort Studies

Cohort studies observe large populations with and without the exposure and follow the subjects forward in time to determine whether there is a difference between the populations in the incidence of the disease outcome. The observation process often begins with descriptive statistics that reveal a difference in prevalence of a disease in a defined population, such as a geographic area. A hypothesis then arises that the increased prevalence of the disease in one geographic population versus another is caused by some environmental factor. For instance, in 1942 a low prevalence of dental decay was demonstrated to correlate with a high fluoride concentration in the natural water supply. These descriptive correlations came from a study of 4425 children, 12 to 14 years of age, in 13 cities located in four states.14

Building on this correlation, a subsequent prospective study compared the dental status of children in a city without natural fluoride (Kingston, New York) and a city that had fluoride added to its water supply (Newburgh, New York).* The children were examined for decayed, missing, and filled teeth at baseline and again 10 years later. In the 6- to 9-year-olds who had drunk fluoridated water all of their lives, a 57% relative reduction of dental caries was seen; the older children experienced a 41% relative reduction.2, 6, 7

Another observational study compared decayed, missing, and filled tooth surfaces at baseline and again years after fluoridation was removed from the water supply in Antigo, Wisconsin. This study revealed that the caries index rose significantly, from 2.1 to 4.8 surfaces per person in the fourth-grade population and from 0.5 to 2.0 surfaces per person in the second-grade population.7, 11 This observational study added to the totality of the evidence that fluoride acts to prevent dental caries. In these studies, fluoride is a preventive factor, or a negative risk factor, for dental caries.

These two studies exemplify some of the elements that quantify the strength of a causal association. In the New York study, the observations were conducted in a prospective fashion and showed that the exposure occurred first and the outcome followed; this demonstration satisfies the temporal relationship required in causation. In the Antigo study, the negative risk factor (fluoride) was withdrawn, and the disease incidence increased; this demonstration satisfies the dose-response gradient of causation. The strength of the causal evidence is also enhanced by the magnitude of the causal effect in both studies.

*The addition of fluoride to the water supply could be considered a therapeutic or interventional trial; however, because the subjects were not randomized but were two distinct, self-selected populations that were not balanced for other population characteristics, the methods are similar to those used in an observational trial.
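As a simple illustration of how such relative changes are expressed, the following sketch (in Python, added for illustration and not part of the original article) computes the relative change in caries-affected surfaces from the Antigo figures quoted above.

    def relative_change(before, after):
        """Relative change in a rate, expressed as a fraction of the baseline value."""
        return (after - before) / before

    # Antigo, Wisconsin: caries-affected surfaces per person before and after
    # fluoridation was discontinued (values quoted in the text above).
    for grade, before, after in [("fourth grade", 2.1, 4.8), ("second grade", 0.5, 2.0)]:
        print(f"{grade}: {relative_change(before, after):+.0%} change after fluoridation ceased")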
Dental research has since satisfied biologic credibility by explaining the mechanism of fluorapatite formation and how it decreases acid dissolution of enamel. Finally, multiple therapeutic trials evaluating administration of fluoride in the water supply, in the diet, and through pharmaceutic supplements have demonstrated a decrease in caries incidence.

Case-control Studies

Case-control studies differ from cohort studies in that the exposure or risk factor and the outcome have already occurred; there is no following of the subjects over time while waiting for the outcome to occur. A case-control study usually includes a fixed population from which the investigator selects subjects with the outcome of interest (cases). In a systematic fashion, the investigator identifies another subject from the fixed population who is similar in as many characteristics as possible but does not have the outcome (the control). This type of study is often called a matched case-control trial. Both groups are then evaluated to determine how many of them have the exposure or risk factor. The data concerning the characteristics and exposures are almost always gathered retrospectively from chart reviews, patient questionnaires, and other documentation. Examinations may be performed to confirm some of the data, such as the outcome. The retrospective data and examination data are collected at one point in time. If the final analysis confirms that the exposure occurs more frequently in the subjects with the outcome (the cases) than in the controls, an association exists. This type of study is sometimes preferred to the cohort study because it allows evaluation of rare outcomes and outcomes that may take many years to manifest. It is also less costly than longitudinal studies.5, 7, 10, 15

A cross-sectional study is similar to a case-control study in that the outcomes and exposures have occurred before the study, and the interface with the investigator is at one point in time, without longitudinal follow-up. The population is a fixed population, but usually only a representative sample, a cross-section of the fixed population, is evaluated.

The associations between periodontal disease and coronary heart disease have been reported through case-control studies and cross-sectional convenience samples.1, 12, 13 The multifactorial causality of coronary heart disease, the various criteria defining heart disease outcomes, the various methods of defining periodontal disease, and the large number of microorganisms in the oral cavity have made it difficult to evaluate the evidence in these studies. From a statistical standpoint, the multifactorial causality of heart disease requires statistical adjustments for as many as 13 different causal variables besides periodontal disease.

One study categorized attachment loss in one quadrant of the oral cavity and compared it with a self-reported history of heart attack in 5564 persons older than 40 years of age. After adjustment for other risk factors for heart attack, the odds ratio for heart attack in persons with attachment loss of 3 mm or greater in more than 67% of measurements was 3.8 (C.I. 1.5–9.7)
compared with persons without attachment loss. The odds ratio for persons with attachment loss of 3 mm or greater in 33% to 67% of measurements was 2.3 (C.I. 1.2–4.4). There was no statistically significant difference in odds ratio for attachment loss in less than 33% of measurements.1

A second study of 85 persons referred to a hospital for angiography, matched with persons without coronary heart disease selected from public records, revealed no difference in the dental indices of periapical and periodontal disease. The average age was 56 years, older than in previous studies, and the author speculated that there may have been an age-selection bias: older patients with coronary heart disease may be in better general health and have better oral health, because the severely ill patients with coronary heart disease have already died.13

Another one-point-in-time assessment, based on chart review data and periodontal examination of a sample of 320 Veterans Administration dental patients older than 60 years of age, was performed to determine dental associations with coronary heart disease. Other risk factors were also considered from data gathered from hospital charts and patient interviews. Use of cardiac medications was considered to represent a diagnosis of coronary heart disease. Multiple analyses were performed on 25 characteristics. The medically recognized risk factors for coronary heart disease did not show significant associations in this study. The authors believed the lack of significance was probably caused by the increased age of the subjects; subjects with significant risk factors may already have succumbed to coronary heart disease. In addition, subjects were being treated for many of the other risk factors, and therefore those risk factors were under control. Statistical associations with coronary heart disease were found for total tooth number up to 14, low salivary levels of Streptococcus sanguis, gingival bleeding, positive plaque scores, and a complaint of xerostomia.12

A prospective analysis of 9760 persons concluded that persons with periodontitis had a 25% increased risk of coronary heart disease compared with those with minimal periodontal disease. Poor oral hygiene, determined by dental debris and calculus, was also associated with an increased incidence of coronary heart disease, which was defined as a hospital admission or death caused by coronary heart disease. Compared with men without periodontal disease, the highest relative risk for coronary heart disease was for men with periodontitis who were younger than 50 years old, 1.72 (C.I. 1.10–2.68). An even greater relative risk for total mortality was found for this group: those with periodontitis had a relative risk of 2.12 (C.I. 1.24–3.62), and the edentulous subjects had a relative risk of 2.60 (C.I. 1.33–5.07). The authors concluded that a causal association between periodontal disease and coronary heart disease is unclear and that dental health may be more an indicator of personal hygiene and overall health care practices.4

Case-control studies that interface with the subjects at one point in time can suggest an association between a characteristic and an outcome, but they cannot confirm the temporal relationship that the risk factor
came before the outcome. The correct temporal relationship is a primary element in proving causation, but in case-control studies this element is missing. In the evaluations of coronary heart disease and periodontal disease, case-control studies have demonstrated that the two entities occur simultaneously in the population, but one cannot be certain that the coronary heart disease did not in some way cause the periodontal disease. Case-control studies also do not control the element of confounding characteristics, that is, the possibility that a third variable or mechanism, not yet isolated or understood, is causing the increase in both coronary heart disease and periodontal disease. Such a confounding element would account for the association of coronary heart disease and periodontal disease without there being a causal relationship between the two.

A confounding characteristic is demonstrated in the study by Loesche and colleagues.12 There was a significant association between complaints of xerostomia and coronary heart disease. One should not assume that xerostomia causes coronary heart disease. It is known, however, that cardiac medications cause xerostomia, and patients with coronary heart disease require cardiac medications. The association of xerostomia and coronary heart disease results from the cardiac medications; the cardiac medications are the confounding factor.

Randomized, Controlled Trials

Randomized, controlled trials (RCTs) are rarely conducted as the first step in determining a causal association. In an RCT, a homogeneous population of subjects is randomly assigned to two groups, one that will receive the test intervention and the other that will receive a placebo or standard-of-care intervention. The two groups are followed prospectively for the outcomes of the two interventions. The decided advantage of RCTs over all other study designs is the investigators’ ability to control multiple aspects of the trial prospectively, thereby decreasing bias and offering the greatest opportunity to arrive at a valid and conclusive answer to a research question.5, 10, 15

In discussions of causation, the cause is usually harmful, and the outcome is usually a disease. Initially, only descriptive data are available describing a possible harmful cause and effect. Even though the cause-and-effect assumption may be weak, most ethicists and clinicians would not wish to move directly to an RCT in which the investigator purposefully administers a possibly harmful exposure to determine whether it really is harmful. For questions of causation, the initial information to promote the hypothesis and establish an association between an event and an outcome should be gained through observational studies. This data gathering can usually be performed more efficiently and cost-effectively in case-control or cross-sectional trials, in which the subjects can be examined at one point in time. If several trials indicate that a sufficient association exists and the health care impact is judged to warrant further time and money, several longitudinal cohort trials could be undertaken.
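The random assignment at the heart of an RCT can be illustrated with a minimal, hypothetical sketch (in Python; not from the article): a subject list is shuffled and split into two arms. Real trials use prespecified randomization schemes, allocation concealment, and often stratification or blocking.

    import random

    def randomize(subject_ids, seed=42):
        """Randomly allocate subjects to an intervention arm and a control arm."""
        rng = random.Random(seed)      # fixed seed so the allocation is reproducible
        shuffled = list(subject_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        return shuffled[:half], shuffled[half:]

    # Hypothetical subject identifiers
    intervention, control = randomize([f"S{i:03d}" for i in range(1, 21)])
    print("Intervention arm:", intervention)
    print("Control arm:     ", control)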
These observational studies do not occur in a vacuum, and it is likely that many investigators are examining the same issue in the laboratory and clinically. As a significant body of evidence mounts that a causal association exists, RCTs can be used to evaluate treatments that modify the harmful cause or risk factor, thereby altering the disease process or curing it.

Bias in Research and Causation Studies

In any research, the validity of the conclusions is negatively affected by bias. One major source of bias in observational studies is the difficulty of assuring a homogeneous study population. In observational studies, subjects are usually self-selected in that they experienced an exposure, have an inherent risk factor, or have specific lifestyle behaviors. The investigators must attempt to quantify the other characteristics of the study subjects and to select a comparison population with characteristics as similar to those of the exposed population as possible. With such ex post facto population selection, bias can easily occur. Given the intricacies of the human body, unknown patient characteristics may affect the outcome. Also, many patients engage in self-directed interventions whereby they wittingly or unwittingly alter the exposure or outcome. These confounding interventions may affect the disease outcome more than the risk factors being assessed. Because the investigator does not know about these confounding entities, they cannot be measured, nor will the unknown characteristics be uniformly distributed in both comparison groups. Randomized, controlled trials control for these unknown, confounding characteristics by selecting a large homogeneous population before rendering any intervention. The large population is then randomly divided into the two comparison groups. The random assignment of the subjects allows equal distribution of the known and unknown characteristics into both study groups, thereby creating two homogeneous populations.

Other biases can occur in observational studies because of the retrospective nature of the data gathering. Investigators must rely on patients to give valid answers on questionnaires and rely on the completeness of medical and dental records to acquire information about a subject’s health and exposure status. One cross-sectional study to evaluate fluorosis in a school population required that the parents complete a questionnaire; 45% of the questionnaires were not accepted because of invalid responses.3

Without standard treatment protocols and documentation protocols, difficulties can arise from omission of data or ambiguous interpretation of data to fit a research question. A study that evaluated temporomandibular complications was undertaken as a concurrent evaluation of the efficacy of two orthognathic surgical fixation techniques. The surgeons and other investigators evaluated the temporomandibular complications. The surgeons recorded their data in the patients’ charts as part of the treatment record, whereas the trained
temporomandibular examiners documented their data on standardized research forms. It was apparent that the surgeons focused more on the efficacy of the surgery than on the secondary temporomandibular outcomes. In many instances, the surgeons failed to document the temporomandibular findings of pain, oral opening, crepitus, locking, or clicking of the joint. The approximate differences in documentation and disagreement on the various findings between the surgeons and the trained investigators ranged from 20% to 65% for each parameter over the various measurement periods.16

Magnitude of the Risk

When the magnitude of risk is great, the risk is not easily masked by bias. Even though the original studies in New York on water fluoridation did not use randomized populations or populations selected for like characteristics, the magnitude of the effect in reducing caries was so great that the causal association was accepted. Further studies and examination of the key elements related to causation established the cause-and-effect relationship between fluoride and the decrease in caries incidence.

When causes are multifactorial, when there is a long latency period before the effect is demonstrated, and when the physiology is complex, as with the risk factors for coronary heart disease, a small increase in risk is not readily observed. When a characteristic has a small influence on the outcome, more subjects are required to demonstrate that influence. The more subjects in a study and the longer its duration, the more likely it is that the study will be tainted by bias, and the more equivocal the conclusions become. Such was the case with smoking and lung cancer. Multiple studies, conducted over many years and in many countries, were necessary to establish this causal relationship by demonstrating a large risk of lung cancer among smokers and by establishing both a temporal relationship and a dose-response gradient.

SUMMARY

The best research method for assessing therapeutic modalities is the RCT. The prospective nature and the randomization of subjects in an RCT provide the greatest opportunity to control bias and offer the most valid answer to the clinical question. Observational studies generate hypotheses about causation and should be viewed as a first step in the continuum of health care delivery. The preponderance of evidence mounts as the hypotheses are tested by additional prospective, longitudinal, observational trials. The clinician’s role is to design and implement therapeutic strategies that alter the causal exposure, intervene in the dose-response gradient, and block the pathophysiologic mechanisms.

Dentistry is an art and a science. Moving through the continuum from causation hypothesis to therapeutic intervention is the science of
dentistry. It is the science of dentistry that will change the scope of the profession in this millennium.

References
1. Arbes SJ Jr, Slade GD, Beck JD: Association between extent of periodontal attachment loss and self-reported history of heart attack: An analysis of NHANES III data. J Dent Res 78:1777–1782, 1999
2. Ast DB, Schlesinger EF: The conclusion of a 10-year study of water fluoridation. Am J Pub Health 46:265–271, 1956
3. Brothwell DJ, Limeback H: Fluorosis risk in grade 2 students residing in a rural area with widely varying natural fluoride. Community Dent Oral Epidemiol 27:130–136, 1999
4. DeStefano F, Anda RF, Kahn HS, et al: Dental disease and risk of coronary heart disease and mortality. BMJ 306:688–691, 1993
5. Fletcher RH, Fletcher S, Wagner EH: Clinical Epidemiology: The Essentials, ed 3. Baltimore, Williams & Wilkins, 1996
6. Friedman GD: Primer of Epidemiology, ed 4. New York, McGraw-Hill, 1994
7. Gordis L: Epidemiology. Philadelphia, WB Saunders, 1996
8. Hujoel PP, Drangsholt M, Spiekerman C, et al: Periodontal disease and coronary heart disease risk. JAMA 284:1406–1410, 2000
9. Hulley SB, Cummings SR: Designing Clinical Research: An Epidemiologic Approach, ed 2. Baltimore, Lippincott Williams & Wilkins, 2001
10. Jacob RF, Carr AB: Hierarchy of research design used to categorize the ‘‘strength of evidence’’ in answering clinical dental questions. J Prosthet Dent 83:137–152, 2000
11. Lemke CW, Doherty JM, Arra MC: Controlled fluoridation: The dental effects of discontinuation in Antigo, Wisconsin. J Am Dent Assoc 80:782–786, 1970
12. Loesche WJ, Schork A, Terpenning MS, et al: Assessing the relationship between dental disease and coronary heart disease in elderly U.S. veterans. J Am Dent Assoc 129:301–311, 1998
13. Mattila KJ, Asikainen S, Wolf J, et al: Age, dental infections, and coronary heart disease. J Dent Res 79:756–760, 2000
14. US Public Health Service: Public Health Reports 57:1155–1179, 1942
15. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical Epidemiology. A Basic Science for Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991
16. Scott BA, Clark GM, Hatch JP, et al: Comparing prospective and retrospective evaluations of temporomandibular disorders after orthognathic surgery. J Am Dent Assoc 128:999–1003, 1997

Address reprint requests to
Rhonda F. Jacob, DDS, MS
MD Anderson Cancer Center
1515 Holcombe Boulevard, Box 0441
Houston, TX 77030
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY
USERS’ GUIDE TO THE DENTAL LITERATURE How to Use an Article about Prognosis Patrick M. Lloyd, DDS, MS
Today’s prosthodontic treatments are some of the most sophisticated the profession has ever been able to offer. State-of-the-art materials simulate the color, texture, pliability, and wear of human tissues to near perfection. Techniques have been refined, dramatically reducing the time required for sophisticated procedures. With proper planning and sequencing, function and appearance can be restored to such a high level that the artifice is imperceptible even to the most critical patient.

Coincident with the availability of advanced forms of therapy, patients have also become more sophisticated and present with specific requests and desires. Because many patients are financially secure, they are able to support the costs associated with extensive prosthodontic treatments. These are patients who, after reviewing the evidence, are most willing to make the investment of time and resources to achieve a particular outcome. They have high expectations and will not be satisfied by results that fall short of predicted outcomes.

The demand for authority to support a particular intervention has become increasingly common among prosthodontic patients whose treatments are either fully or partially the responsibility of third-party providers. Because third-party providers oversee the care of thousands (sometimes hundreds of thousands) of persons, they are likely to demand an even greater degree of proof before they support a given intervention.
From the Department of Family Dentistry, The University of Iowa College of Dentistry, Iowa City, Iowa
DENTAL CLINICS OF NORTH AMERICA VOLUME 46 • NUMBER 1 • JANUARY 2002
With the growing need for documented efficacy of treatment and efficiency of rendering care, prosthodontists will serve their patients best when they fully understand the intricacies of clinical research and the results reported. This article proposes a structure for evaluating the literature that pertains to prognosis—the prediction of outcomes and the frequency of such occurrences (see box).
Users’ Guide for Evaluating an Article About Prognosis

1. Are the results of the study valid?
   • Primary guides
     Was there a representative and well-designed sample of patients at a similar point in time?
     Was follow-up sufficiently long and complete?
   • Secondary guides
     Were objective and unbiased outcome criteria used?
     Was there adjustment for important prognostic factors?
2. What are the results?
   • How large is the likelihood of the outcome events in a specified period of time?
   • How precise are the estimates of likelihood?
3. Will the results help a clinician care for patients?
   • Are the study patients similar to those in the clinician’s practice?
   • Will the results lead directly to selecting or avoiding therapy?
   • Are the results useful for reassuring or counseling patients?

From Jacob R, Lloyd P: How to evaluate a dental article about harm. J Prosthet Dent 84:8–16, 2000; with permission.
It will help practitioners develop the ability to judge whether the results of an investigation are valid, to interpret the results, and to determine whether the analysis of the results is relevant to their practice. A hypothetical case is given here for discussion.
CLINICAL SITUATION

The first patient of the day is a 52-year-old woman with an unremarkable health history. She has been referred by a general dental practitioner for evaluation and possible treatment of a missing mandibular left first molar. About 15 years earlier, tooth #19 was restored with a multiple-surface, intracoronal silver amalgam restoration. About 3 months ago, the patient bit into a hard piece of bread, separating the lingual surface of the tooth and resulting in an immediate sharp pain that persisted for 2 days. Her general dentist diagnosed it as a vertical
root fracture and recommended extraction. The patient has excellent oral hygiene, a class I Angle’s malocclusion bilaterally, no mucogingival defects, and an extensively restored posterior dentition (with silver amalgam as the predominant restorative material). Her third molars are the only other missing teeth. Her chief concern is whether a dental prosthesis should be fabricated to replace her missing molar.

Her general dentist has told her that if the edentulous space is left untreated, it will lead to future problems, the most significant of which would be drifting and shifting of the adjacent and opposing teeth. Such tooth movement, the dentist said, often results in severe occlusal disharmony, limiting a patient’s ability to eat comfortably and, because of the concomitant gingival and periodontal complications, ultimately leading to the demise of other teeth. At present, the patient does not find the toothless site to be an esthetic problem. She reports having slightly modified her chewing pattern, eating more on her right side than her left since the trauma to tooth #19.

The prosthodontic specialist informs the patient that before treatment options can be considered, it is necessary to make diagnostic casts and to test the vitality and physical condition of the teeth surrounding the edentulous space. A relevant article reporting a study of the consequences of not replacing a missing posterior tooth has been published recently in a national dental journal. The specialist promises to share the results presented in that article with the patient at her next visit so that she can make an informed decision.

After spending almost an hour rummaging through a stack of journals later that day, the practitioner finally locates the article. Its title seems to fit the patient’s condition perfectly: ‘‘The consequences of not replacing a missing posterior tooth,’’ by Shugars et al.6 Because the specialist read it only once, a few months ago, he plans to review it again in more detail before the patient’s next appointment.
STUDY DESIGNS THAT YIELD INSIGHT AND IDENTIFY PROGNOSIS FACTORS

To advise the patient with the greatest degree of confidence, it is desirable to have the results of a clinical study that follows over time a large population of patients who are in every way similar to this patient. The subjects of the study would have the same condition (missing a mandibular first molar) and would be comparable in all other domains (e.g., age, gender, oral hygiene, periodontal support, classification of malocclusion). Such a study design would allow observation of the natural history or clinical course of the condition. It would be possible to monitor the status of anatomic, physiologic, and psychologic conditions that have been reported to occur as a consequence of no treatment.
Ultimately, the clinician would have definitive insight to share with the patient and could feel secure in advising her. This type of study would provide the information the reader desires, but it is unlikely to be undertaken, for many reasons. First are considerations of cost and time. To assemble such a pool of patients would require innumerable resources: hundreds of calibrated examiners and clinical facilities that could accommodate tens of thousands of subjects. Identical follow-up treatment would have to be provided to each subject (e.g., the same period for prophylaxis, operative treatments, and other, more complicated procedures). To assure that there were no influences from other health conditions, it would be necessary to remove patients from the study who developed illnesses or were prescribed medications. Ultimately, the initial population might be reduced to too small a group to make a conclusive assessment. Many years would be required to collect the data necessary to allow advice to be given with confidence.

A cohort study design offers a more realistic approach for exposing the risks associated with certain conditions. Patients in a cohort study would have the same condition (missing a mandibular first molar) but would differ in ways previously reported to influence the outcome (e.g., type of malocclusion, periodontal status, other tooth loss). Subjects would be grouped according to these prognostic factors and followed over time. Data collected on other conditions that arise during the course of the study would allow additional analysis to expose other factors that contribute to negative outcomes. Absolute risk ratios could be calculated so that the patient could be offered probabilities on the outcomes associated with not treating her condition.

The case-control study design is even more practical from both a resource and a time perspective but is extremely prone to bias. In a case-control study, subjects with the condition who have experienced the negative outcome (periodontal destruction, additional tooth loss) are compared with subjects who have not. Because subjects (cases and controls) are selected after the event has or should have occurred, there is tremendous potential for bias. Investigators, because they must examine subjects to determine their appropriateness for the study, cannot be blinded during the selection process. The population from which subjects are drawn (e.g., a convenience sample from a dental college) further contributes to bias. Bias is compounded by the inherent shortcomings of a retrospective study design, substantially reducing the confidence that clinicians can realistically derive from such a study. Also, because case-control studies do not follow subjects over time, only odds ratios, not absolute risks, can be calculated. In spite of these deficiencies, skillfully planned and tightly monitored case-control studies can play a significant role in patient care, especially when the outcome under consideration is infrequently detected or the time needed to observe the outcome is excessively long. (For example, mesial drifting of teeth posterior to the edentulous space has been reported to take several years.)
ARE THE RESULTS OF THE STUDY VALID?

Primary Guides

Was There a Representative and Well-defined Sample of Patients at a Similar Point in the Course of the Disease?

The validity of the conclusions drawn by investigators from their work should be judged on how well the population is defined. Are the criteria for patient selection well defined and appropriate? Is the database adequate to determine whether the study group represents the total population of patients at risk for the negative outcome? Shugars et al studied patients from a large group-model health maintenance organization who had a first molar or second premolar extracted, were 18 years of age or older, and were enrolled in the program for at least 8 years.7

The potential for introducing bias into an investigation during the selection of subjects is quite strong. Questions that should be asked include: Did all types of patients have an equal probability of being selected? Were some patients filtered out because of coexisting conditions? What measures were taken to ensure that patients represented a broad cross-section of the population (e.g., age, sex, geographic origin, socioeconomic status)?

In judging whether a study on prognosis is valid, it is also important to make sure that all patients entering the investigation are at a similar, well-defined point in the course of their condition. The investigators should describe, as specifically as possible and using the terminology of the discipline, the stage patients must be in to be included in the study. Shugars et al decided to enroll patients if there was a radiograph of the adjacent and opposing teeth within 6 months before or after the extraction.7

Was Follow-up Sufficiently Long and Complete?

Even if there is a true association between a prognostic factor and an outcome of interest, it may take an extended period of time before the connection becomes evident. The chronic nature of most dental diseases and their delayed sequelae call for a rather long and protracted observation phase to confirm or deny a relationship. For instance, the loss of additional teeth as a consequence of not replacing a missing posterior tooth may take several years to occur. Patients should be examined at regular intervals over a sufficiently long period to judge whether such an outcome is related to a particular prognostic factor.

Investigators studying the natural history or clinical course of a disease process or clinical condition are compelled to maintain contact with the individual patients in their study populations. Because of myriad circumstances, most of which might have little effect on the results of the study, many patients may be unavailable for follow-up. They may
fail to return for a recall examination because of a family relocation, a loss of interest in the study, an unrelated debilitating illness, or because the condition under scrutiny has bothered them enough that they elect to seek treatment (e.g., placement of a fixed partial denture). The greater the number of patients lost to follow-up, for whatever reason, the less confidence can be placed in estimates of the true risk for a given adverse outcome.

The effect of patients who are lost to follow-up depends on the size of the population being studied and the rate of risk for the outcome event. For example, if 50 patients in a study population of 1000 were not available for recall, and the calculated risk of the outcome event for those patients examined was 25%, the worst-case scenario (i.e., all 50 experienced the event) would be a rate of about 30%. Although this effect may be of statistical importance, it would be unlikely to be of clinical import. If, however, the calculated risk were only 1%, the worst-case scenario would be 6%, an outcome with substantially different clinical implications.

To lessen the impact of patients lost to follow-up on a study’s validity, investigators need to report the reasons for unavailability. Each unavailable patient should be individually counted and identified. In addition, a comparison should be made, using a multitude of demographic parameters and clinical conditions, between those for whom a complete set of follow-up data was collected and those with only partial follow-up data. Such reporting and analysis increase the confidence that can be placed in the conclusions drawn. This type of analysis, comparing the demographics of respondents and nonrespondents, was done by Haselton et al1 in a retrospective study of the clinical performance of high-strength all-ceramic crowns over a 3-year period. They showed that the age range, gender distribution, number of ceramic crowns received, and the type of ceramic restoration were comparable between the two groups, allowing readers to place more confidence in conclusions drawn from the patient base available for examination.
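The worst-case adjustment described above is simple arithmetic: assume that every patient lost to follow-up experienced the outcome and recompute the rate over the full enrolled population. The following sketch (in Python, added for illustration; not part of the original article) reproduces the two scenarios just quoted; the exact values, about 29% and 6%, correspond to the rounded figures in the text.

    def worst_case_rate(n_enrolled, n_lost, observed_risk):
        """Worst-case outcome rate if every patient lost to follow-up had the event."""
        n_examined = n_enrolled - n_lost
        events_observed = observed_risk * n_examined
        return (events_observed + n_lost) / n_enrolled

    for risk in (0.25, 0.01):
        wc = worst_case_rate(n_enrolled=1000, n_lost=50, observed_risk=risk)
        print(f"Observed risk {risk:.0%} -> worst-case rate {wc:.1%}")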
WHAT ARE THE RESULTS?

How Large is the Likelihood of the Outcome Event Over a Specified Period of Time?

Of all the questions patients pose, none is more frequent than ‘‘How long will it last?’’ or, in the case of predicting risk, ‘‘What are the odds that it will happen to me?’’ To satisfy the sophisticated patient whose decision whether to be treated may depend on the response to this question, the practitioner should consider crafting an answer that addresses the issue even more completely than the patient expects.

One could first offer a prediction based on absolute prevalence rates—the percentage likelihood that a particular event will occur at some time in the future. In the article by Shugars et al,6 12% of the patients who did not receive treatment for a missing posterior tooth lost an additional adjacent tooth. The median time was 2.5 years, with a range of 0.9 to 6.7 years. An additional 13% experienced tilting of the teeth adjacent to the edentulous space by a distance greater than 2.0 mm, with a median time of 6.9 years and a range of 1.1 to 9.6 years.

A second-level response would be to advise the patient of the relative likelihood that she will experience the outcome. This response involves calculating the relative risk that the event (additional tooth loss) would occur during a specified period if no treatment were rendered. In a related article on the same cohort of patients, Shugars et al7 reported the status of teeth adjacent to a bound edentulous space for patients who received no treatment and for patients for whom a fixed partial denture was fabricated. There was a 13% failure rate (i.e., an additional tooth was lost adjacent to the edentulous space) for the untreated group and a 7% failure rate for those who received a fixed partial denture. These rates yield a relative risk of 1.86; in other words, patients in this study were 1.86 times as likely to lose a tooth adjacent to an edentulous space if they received no treatment as if a fixed partial denture had been constructed.

Finally, to provide the patient with a perspective on the rate at which the event is likely to occur over time (more often than not, there is significant variation), one could provide information gleaned from a survival curve. These graphic representations depict what occurs over the course of time and yield information of potentially great value to the patient. McLaren and White, in a report on the survival rates of In-Ceram crowns (Vident, Brea, California), used multiple survival graphs to show the rate of failure in each successive month (Fig. 1).3a
Figure 1. Reasons for loss of service of In-Ceram crowns followed for 36 months. (From McLaren E, White S: Survival of In-Ceram crowns in a private practice: A prospective clinical trial. J Prosthet Dent 83:216–222, 2000; with permission.)
In addition, to help practitioners and patients further, they categorized their data to identify specific reasons for failure.
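Survival curves such as the one summarized in Figure 1 are typically estimated with the Kaplan-Meier (product-limit) method. The sketch below is a generic illustration only: the failure times and censoring indicators are invented, not the McLaren and White data, and it assumes the third-party lifelines package is available.

```python
# Generic Kaplan-Meier sketch with invented data (not the McLaren and White results).
# Assumes the open-source "lifelines" package is installed.
from lifelines import KaplanMeierFitter

# Months in service for 10 hypothetical crowns; 1 = failed, 0 = still in service (censored).
months = [6, 12, 12, 18, 24, 30, 36, 36, 36, 36]
failed = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

kmf = KaplanMeierFitter()
kmf.fit(durations=months, event_observed=failed)

# Estimated probability that a crown is still in service at each observed time point.
print(kmf.survival_function_)
```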
How Precise are the Estimates of Likelihood?

Even with the best of intentions and systematic planning, the population selected for study will always be a sampling of the population as a whole. From the data collected, the relative risk for a particular event can be calculated, but the value will be only an estimate. To show the precision of this estimate, confidence intervals (CIs) are used. Confidence intervals help clinicians decide the range within which they can be confident of the relative risk estimate.2 Norderyd et al, reporting on the risk of periodontal disease in a Swedish adult population, found that age is correlated with severe periodontal disease progression.4 Because this was a case-control study, their calculated risks were expressed as odds ratios, with a value of 1.05 for the age correlation. The CI was 1.02 to 1.07, a rather tight range and one indicating that the calculated risk is quite precise.
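A 95% CI for an odds ratio is conventionally computed on the logarithmic scale. The sketch below shows the mechanics with an invented 2 x 2 table; the counts are not the Norderyd data, so the resulting interval will not reproduce the 1.02 to 1.07 reported in that study.

```python
# Sketch: 95% confidence interval for an odds ratio from a hypothetical 2x2 table.
# Counts are invented for illustration; they are not taken from any study cited here.
import math

a, b = 40, 60   # cases exposed / not exposed
c, d = 20, 80   # controls exposed / not exposed

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)          # standard error of ln(OR)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```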
WILL THE RESULTS HELP THE PRACTITIONERS IN CARING FOR PATIENTS?

Are the Study Patients Similar to Those in the Practitioner's Practice?

Regardless of the steps taken to minimize bias, to standardize measurement, and to adjust for differences, a study on prognosis has limited application to one's practice if the patients under consideration are unlike those one treats from day to day. An adequate base of information on the demographics and clinical conditions of the patients used in the study should be reported so that one can judge the level of comparability. Characteristics to consider include age, socioeconomic status, patterns of tooth loss, and medication profile, that is, virtually any characteristic that distinguishes the patient population in a particular practice. The description of the patients involved in the article by Shugars et al,6 albeit brief, might be adequate to judge their similarity to the patients a practitioner treats. The article reports the gender distribution (51% female) and the mean age and range of the population (45.5 years, with a range of 24 to 90 years). All subjects were also enrolled in a large group-model health maintenance organization in Portland, Oregon. These data, although limited, do offer some insight into the comparability between patients in this study and in one's practice.
Will the Results Lead Directly to Selecting or Avoiding Therapy?

It is highly unlikely that the results of any clinical investigation, whether it deals with prognosis or therapy, will be directly applicable to one's practice. The myriad factors that influence subject selection, measuring, and follow-up protocol should be critically examined to determine what insight, if any, can be applied to a particular clinical situation. Relevance is not and should not be considered an all-or-none situation. Nearly every article contains some evidence that, when used properly, can help support or refute a decision whether to intervene. The article reviewed here6 reported a 13% rate of clinically significant tilting (>2.0 mm) of the teeth adjacent to the edentulous space and some loss of alveolar bone around the involved teeth; 12% of the patients who did not receive treatment for a missing posterior tooth lost an additional adjacent tooth. In sharing this information with a patient, a practitioner would be obligated to inform the patient how the subjects were selected and what characteristics could potentially raise or lower that rate, given the patient's unique set of conditions.

Are the Results Useful for Reassuring or Counseling Patients?

For the hypothetical patient discussed in this article, there is evidence that she may not suffer significantly if her condition is not treated immediately. Although there is a risk that not replacing her missing posterior tooth will cause her harm, the risk is apparently less than reported by other investigators and clinicians. To help the practitioner be more confident about the counsel he or she provides, the practitioner may want to reread sections of occlusion, fixed prosthodontic, and orthodontic texts. These texts may offer additional theory on how to manage the condition and what other factors should be monitored to ensure that an intervention is both appropriate and timely.

ACKNOWLEDGMENT

The author expresses appreciation to Anita Makuluni for her insights and editorial comments.
References

1. Haselton D, Diaz-Arnold A, Hillis S: Clinical assessment of high-strength all-ceramic crowns. J Prosthet Dent 83:396–401, 2000
2. Jacob R, Lloyd P: How to evaluate a dental article about harm. J Prosthet Dent 84:8–16, 2000
3. Laupacis A, Wells G, Richardson S, et al: Users' guides to the medical literature V. How to use an article about prognosis. JAMA 272:234–237, 1994
3a. McLaren E, White S: Survival of In-Ceram crowns in a private practice: A prospective clinical trial. J Prosthet Dent 83:216–222, 2000
4. Norderyd O, Hugoson N, Grusovin G: Risk of severe periodontal disease in a Swedish adult population. J Clin Periodontol 26:608–615, 1999
5. Sackett D, Haynes R, Guyatt G, et al: Clinical Epidemiology: A Basic Science for Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991, pp 173–185
6. Shugars D, Bader J, Phillips W, et al: The consequences of not replacing a missing posterior tooth. J Am Dent Assoc 131:1317–1323, 2000
7. Shugars D, Bader J, White A, et al: Survival rates of teeth adjacent to treated and untreated posterior bounded edentulous spaces. J Am Dent Assoc 129:1089–1095, 1998

Address reprint requests to
Patrick M. Lloyd, DDS, MS
Department of Family Dentistry, The University of Iowa College of Dentistry
S313 DSB
Iowa City, IA 52242
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY
0011–8532/02 $15.00 + .00
BIOSTATISTICAL CONSULTATION FOR DENTAL RESEARCH Jonathan Clive, PhD
What do dentists need to know about statistics, and why do they need to know it? This article suggests some reasonable and convincing answers to these questions. To focus the discussion, dental health care providers are considered either as practitioners and specialists who see patients daily but who do not perform scientific research (PR) or as dental researchers (DR) who may see patients or students but are also actively engaged in research. In this article, it is generally assumed that dental researchers' activities involve the acquisition and evaluation of some kind of data. The term data simply refers to a description, numeric or otherwise, of the attributes of the experimental units (patients, laboratory animals, teeth, periodontal tissue, and so forth) being considered. These may be as basic as the number of decayed, missing, and filled teeth (DMFT); the number of decayed, missing, and filled surfaces (DMFS); the gingival index; or some more specialized and exotic measure, such as the number of cells of a particular type per unit volume. This article does not provide a crash course in statistics; in fact, no specific statistics lessons are offered here (although some specific examples are provided). It would be counterproductive to attempt to cover in this limited space subject matter that most introductory statistics texts require several hundred pages to present. Instead, the author discusses related general concepts that he believes are crucial for both dental researchers and dental practitioners to understand before beginning the statistical consultation process.

From the Office of Biostatistical Consultation, University of Connecticut Health Center, Farmington, Connecticut
DENTAL CLINICS OF NORTH AMERICA VOLUME 46 • NUMBER 1 • JANUARY 2002
STATISTICAL NEEDS OF DENTAL PRACTITIONERS

Although dental practitioners may not possess formal statistical experience or training, they may frequently use several terms that comprise a basic statistics vocabulary. These terms may carry associations or interpretations that are intuitively understood. Perhaps the terms most often used in this fashion are mean and average. Given a set of numbers (which can be assumed to represent data acquired during the course of some research endeavor), the mean is often interpreted as the most typical or most representative single value describing all the numbers. Although there is some justification for this view, the term has a more rigorous definition. To determine the mean of a set of data, one sums the data values and divides by the number of observations. The mean represents the center of gravity of a set of numbers, the value around which all other numbers are distributed. The standard deviation (or, equivalently, the variance, which is the square of the standard deviation) is another basic summary attribute of data that has a relatively straightforward meaning. It describes the spread or dispersion of the numbers in the dataset around the mean. The larger the standard deviation, the greater the variation, or heterogeneity, among observations. These two measures, the mean and the standard deviation, arise naturally in the logical study of data. Outlining this development is helpful in understanding means and standard deviations and in obtaining an overview of statistical procedures. To do so, it is useful to consider a common graphic portrayal of data, as illustrated in Figure 1A. This figure shows a histogram, which illustrates the distribution of values for some variable in a sample drawn from some hypothetical population. The distribution of values encompasses the values that occur in the data and the frequency with which they occur. To generate a histogram, the range of the data (smallest and largest values among the observations) is partitioned into successive intervals. A set of DMFS scores, for example, might range from 0 to 40. Intervals of DMFS could be designated as from 0–4, 5–10, 11–15, up to 36–40. The number of observations falling in each of these intervals is then tallied, and a bar graph is generated. The height of the bar for each interval is proportional to the frequency or the number of observations falling within that interval. The resulting plot is called a histogram, and it, too, is an intuitively and easily understood representation of data. (Note that the width of the intervals must be constant so that the comparison of frequencies is meaningful.)

BASICS 1: HISTOGRAMS TO BELL-SHAPED CURVES

Although useful and informative, the histogram represents little more than a basic tool for exploratory data analysis and is purely descriptive. Statisticians and mathematicians who were not satisfied with such heuristic descriptions carried the idea further, asking what happens when a histogram presents all the information possibly available. This
Figure 1. A, A histogram can be used to graphically represent the distribution of values contained in a sample of observations. Each bar corresponds to a range of values of the measure being considered (X), and the height of the corresponding bar (Freq) is proportional to the count or percentage of all observations falling into that category. Histograms can be refined by designating more narrow intervals and increasing the number of observations. B, A histogram may be modeled, or approximated, by an appropriate mathematical function (see text). FREQ = frequency.
limit can be approached, it was suggested, by making the intervals more and more narrow as the sample size (the number of observations or data points) becomes larger and larger. As successive histograms are generated under these circumstances, their appearance increasingly resembles a relatively smooth or continuous curve, in contrast to a single histogram defined over a few intervals, which resembles a conventional bar chart with adjacent bars. The generation of such a curve suggests that it might be possible to use some type of mathematical function to characterize it. A mathematical representation would have several desirable features. First, it would provide a succinct way of describing a distribution, namely, the mathematical function describing the curve. Second, it would serve as a basis for comparing distributions across different populations or groups. Making such comparisons is a basic activity of statistical inference. Using a well-defined mathematical function to describe or model a histogram is illustrated in Figure 1B. Here, a curve has been superimposed on the histogram shown in Figure 1A. The particular curve
used here is called a normal curve, or normal distribution. It is also well known as the bell-shaped curve familiar to most scientists. The term normal in normal distribution is used as a name, and derives from the suggestion that the distribution of most attributes in the normal (here used as an adjective) population is well represented by the type of curve shown in Figure 1B. In fact, the normal curve is in many cases an acceptable representation of a distribution, even when the observed histogram is skewed or not symmetric or, in general, somewhat poorly behaved. Furthermore, one of the most remarkable results from mathematical statistics shows that in almost all situations, no matter what the underlying form of the histogram of observations, the distribution of means from the population under study does tend to follow a normal distribution. The normal distribution forms the core of much of statistical theory and practice. It is important to understand the distinction between a histogram (Fig. 1A) and the normal curve (Fig. 1B). The histogram represents the data, or the real world, whereas the normal curve is strictly a model, an approximation generated by statisticians. A normal distribution is uniquely characterized by its mean and standard deviation. These are the same parameters so intuitively familiar to nonstatisticians. The mean of a normal distribution reflects the center of gravity of the values or observations and is often referred to as a measure of location, or a measure of central tendency. The normal distribution peaks at the mean value; that is, the maximum value of the curve along the y-axis occurs at the location of the mean on the x-axis. The standard deviation indicates the spread of the distribution around the mean; the larger the standard deviation, the flatter, or less peaked, the normal curve.
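The distinction between the histogram (the data) and the normal curve (the model) can be made concrete by overlaying the two, as in the sketch below. The DMFS-like values are simulated purely for illustration, and the sample size and binning are arbitrary choices.

```python
# Sketch: histogram of simulated DMFS-like scores with a fitted normal curve overlaid.
# The data are simulated; nothing here comes from a real study.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
dmfs = rng.normal(loc=12, scale=5, size=500).clip(min=0)   # hypothetical scores

mean, sd = dmfs.mean(), dmfs.std(ddof=1)

x = np.linspace(dmfs.min(), dmfs.max(), 200)
plt.hist(dmfs, bins=15, density=True, alpha=0.6)           # the "real world" data
plt.plot(x, norm.pdf(x, loc=mean, scale=sd))               # the model: a normal curve
plt.xlabel("DMFS")
plt.ylabel("Relative frequency")
plt.show()
```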
BASICS 2: CONFIDENCE-INTERVAL ESTIMATES

The mean and standard deviation of a normal curve (or any set of data, for that matter) are often presented in the scientific literature as mean ± standard deviation. For example, an author might write that "the mean and standard deviation DMFS for the test group was 8.3 ± 4.5, whereas for the control group the mean and standard deviation DMFS was 11.6 ± 5.2." The symbol "±" means plus or minus and is technically not correctly used in this context. The standard deviation is a positive quantity. By convention, however, the use of this notation has been almost universally adopted. The best interpretation of a statement of the form mean ± standard deviation is that approximately 68% of all observations lie within one standard deviation of the mean. Approximately 95% of all observations lie within two standard deviations of the mean. (These statements apply to observations that are well modeled by a normal distribution.) The use of this notation does provide the reader with the notion of an interval in which most of the data are contained. A confidence interval represents
the natural extension of this notion of an interval estimate and is another statistical concept commonly encountered in the scientific literature. The use of a mean ± standard deviation attempts to indicate both the value of a mean and the precision with which the value was determined. Thus, one might simply note that "the estimated mean DMFS was 8.3." This statement can be extended to include a 95% confidence interval: "The mean DMFS was 8.3, with a 95% confidence interval given by (6.8, 9.8)." This statement means that one is 95% confident that the interval from 6.8 to 9.8 inclusive contains the true mean DMFS for the population under study. A 95% confidence interval given by (6.8, 9.8) is much more precise than a 95% confidence interval given, for example, by (4.8, 11.8). Most scientific journals currently insist on the use of confidence intervals beyond simple point estimates when discussing numerical data. The difference between a simple point estimate and a confidence interval is illustrated in the following statements: (1) "It is likely that the restaurant is on 7th Avenue and 55th Street"; and (2) "I am 95% confident that the restaurant is on 7th Avenue between 52nd and 58th streets."

BASICS 3: COMPARING MEANS

If two distributions have identical means, one can assume that the distributions of values of the variable being measured are identical in the two groups from which the data were drawn. If two distributions have similar means, the distributions are similar; finally, if two distributions have different means, the distributions are different. Statistics provides a way of estimating how probable it is that two or more means originated from the same underlying population. This probability is called a "P-value" and is the last of the routinely invoked statistical terms considered here. P-values arise when two means are compared statistically. Thus, one may report that "when the two means were compared using a t-test of independent group means, it was observed that t on 44 degrees of freedom was 4.55, P < 0.001." The interpretation of this statement is as follows: if there is really no difference between the groups being observed, then the probability of observing a mean difference as extreme or more extreme than that observed is less than 1 in 1000. Because the P-value in the example is less than 0.05, the difference is said to be statistically significant. This cutoff point for statistical significance (0.05) is rather arbitrary but has developed into a standard in the scientific literature over time. The P-value indicates the strength of the evidence against the hypothesis of no difference in means. (The perspective of no difference is used because doing so reflects how the theory of statistical hypothesis testing developed.) Small P-values indicate that the hypothesis of no difference is unlikely. Unlikely does not mean impossible, however; therefore the researcher (or reader) must choose between rejecting the hypothesis of no difference or accepting the conclusion
that the data represent the unlikely instance of a large difference in sample means.

STATISTICAL NEEDS OF DENTAL PRACTITIONERS—GENERAL CONSIDERATIONS

Although dental practitioners generally do not need assistance in designing and executing an experiment, there are circumstances in the activities of daily practice for which some statistical insight is required. For example, dental practitioners need to be able to keep up with the scientific literature, studying and evaluating scientific reports appearing in professional journals. Also, patients may ask questions such as, "Am I at risk for losing my teeth?" or, "Am I at risk for periodontal disease?" or, "Am I at risk for oral cancer?" These are statistical questions. Risk is a statistical concept in epidemiology, and the appropriate estimation of risk factors for major diseases and other clinical conditions is a major topic in biomedical research. The dental practitioner may not need to estimate the risk but rather may need to be able to explain it and discuss it intelligently with a concerned patient. In other circumstances, practitioners may need to evaluate an article in the scientific literature and ponder the implications for their practice. For example, a practitioner may need to decide whether an article presents convincing evidence for switching to a different type of material for restorations. The study by Kilburn and Asmundsson10 serves as an example of the importance of reviewing the scientific literature with a certain degree of skepticism. These authors claimed to disprove the long-held clinical maxim that the anteroposterior (AP) diameter of the chest is increased in patients with advanced pulmonary emphysema (who were compared with a group of nondiseased controls and a second group of patients with non-emphysema diagnoses). In fact, the authors were not at all reluctant to assert that "it is contended that measurement has destroyed an apparently long-established and often repeated maxim that an increased AP diameter is a common and useful sign of emphysema." The experimental approach used in this study was suspect; furthermore, the authors cited no statistical evidence to support their claim. They did, however, present sufficient tabulated summary measures to enable the reader to carry out the basic statistical test that would have been appropriate for the clinical question. When the reader carries out the test (the level of statistical knowledge necessary to do so would be acquired during the first third of an introductory biostatistics course), the difference in mean AP diameter between the emphysema group and the nondiseased controls is found to be statistically significant, with P = 0.04! Thus, not only does this result contradict the main conclusion, but the authors themselves have provided the reader with the resources to refute the paper. In this case, the statistical test is a t-test of independent group means,
two-tailed, carried out at the 0.05 level of significance. Note that it is also possible to perform a one-way analysis of variance given the tabulated summary measures. A one-way analysis of variance permits comparison of all three group means simultaneously, as well as the appropriate multiple-comparison procedures to isolate group differences. Is it reasonable to expect practitioners to be familiar with these terms and to be able to duplicate statistical procedures of the type discussed here in the course of evaluating a journal article? Probably not. Any statistical background practitioners acquired in dental school may be long forgotten, and the practitioner probably faces more pressing concerns involving office management and patient treatment. In addition, the practical import of a study in the literature may need time to propagate to the office of a practitioner. Unfortunately, in the presentation of scientific studies, the situation is often "let the reader beware."18 It is true that since the appearance of the Kilburn et al article, most journals have increased their requirements for statistical rigor in submitted manuscripts. Many journals retain statistical consultants for special reviews and will use statistically sophisticated referees where necessary. Nonetheless, a practitioner may need to know how to evaluate such issues as the suitability of the experimental design, the appropriateness of the statistical tests used, and whether the results of the test have been interpreted correctly.

STATISTICAL TRAINING FOR DENTAL PRACTITIONERS

The practitioner must determine what level of statistical insight is appropriate and how to acquire it. A suggested knowledge base is shown in Table 1. Although there is no one-size-fits-all statistics curriculum, this table lists the basic topics an introductory statistics student should master.

Table 1. SUGGESTED TOPICS FOR A BASIC STATISTICAL EDUCATION, WITH ASSOCIATED ENHANCEMENTS

Descriptive statistics: summary measures/graphics
Hypothesis testing: paired, 2-sample t-tests; comparison of independent proportions (analysis of 2 x 2 tables); power/sample size; type I and type II errors; one-way analysis of variance
Bivariate analysis: univariate regression/correlation
Enhancements: higher order contingency tables; linear models/higher order ANOVA designs; multiple linear regression; logistic regression

ANOVA = analysis of variance
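As an illustration of the hypothesis-testing entry in Table 1, the check described in the Kilburn example (recomputing a published two-group comparison from tabulated means and standard deviations) takes only a few lines. The summary figures below are invented; they are not the values reported by Kilburn and Asmundsson, and the scipy call is simply one convenient way to perform the pooled t-test.

```python
# Sketch: a two-sample t-test computed directly from published summary statistics.
# The means, standard deviations, and group sizes are invented, not taken from any cited study.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=24.0, std1=2.5, nobs1=20,   # hypothetical emphysema group
    mean2=22.3, std2=2.2, nobs2=26,   # hypothetical control group
    equal_var=True,                   # classic pooled-variance t-test
)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```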
Beyond these basic topics, certain enhancements or intermediate topics are given. In the absence of a working knowledge base of this type, a practitioner may feel it sufficient to "ask around" at professional meetings, for example, concerning a particular topic. In special circumstances, it may be necessary to secure the services of a statistical consultant from a local university, college, or school of public health. (The department secretary of the statistics or biostatistics department will, in most cases, direct a PR to an available consultant or the director of a consulting facility.) Sometimes these two approaches can be combined, as when a local dental society, for example, invites a statistician to address a meeting and discuss a reference of special interest to the members. The practitioner may also want to audit a course in introductory biostatistics. The course should be presented from a research perspective. Most institutions of higher learning, from the community college level to the college, university, or academic health center (including schools of public health) level, offer such courses, and auditing privileges can often be obtained. Alternatively, practitioners may want to follow a self-study regimen. There are a number of excellent references for this purpose, well written and especially suited for individual use. For biostatistics, the author recommends Glantz, Primer of Biostatistics8; for clinical epidemiology, Sackett, Haynes, and Tugwell, Clinical Epidemiology17 and Friedman, Primer of Epidemiology7; and for general evaluation of the scientific literature, Riegelman, Studying a Study and Testing a Test.16 The truly ambitious practitioner can acquire some statistical software. The Statistical Package for the Social Sciences (SPSS) is a Windows-based, user-friendly program, and inexpensive student versions are available. A number of self-instruction texts are available for this package to complement the useful in-program documentation.21

STATISTICAL NEEDS OF DENTAL RESEARCHERS

At this point, it is appropriate to consider the statistical needs of the dental researcher, although readers who consider themselves practitioners are encouraged to keep reading. The statistical needs of dental researchers are generally more pressing (e.g., there is generally less time for a self-instruction approach) and may be more technically advanced than the statistical needs of a practitioner. Because the researcher is actively involved in some form of dental research, the need for statistical input during all phases of the research is important, from the initial formulation of the scientific or clinical hypotheses being considered, through the execution of the project, to the preparation of research papers and presentations. The extent and amount of this input depend, of course, on the magnitude of the project. Clearly, a survey of several thousand patients involving many study
variables will require more statistical resources than a study involving 20 or 30 laboratory animals and few study variables. Nonetheless, the planning issues are similar in both cases.
THE ROLE OF THE STATISTICAL CONSULTANT IN DENTAL RESEARCH

Certain primary tasks are the biostatistician's responsibility when interacting with a dental researcher. For the present discussion, it is assumed that the consultation goes beyond a simple drop-in visit during which the biostatistician can respond to a few simple questions, such as responding to comments in a manuscript review. Ideally, the researcher and the biostatistician meet early in the planning process (at the researcher's initiative, of course), to discuss the nature of the research, whether it is an intricate experiment with many outcome measures or a large-scale clinical trial in which observations are collected at multiple points over time on a large group of patients or experimental animals. It is advisable for the biostatistician to become as familiar as possible with the purpose of the research and the underlying scientific or clinical considerations. This familiarity can accrue over time, as the biostatistician and the researcher meet repeatedly and interact regularly over the course of planning and executing the project. It is not reasonable to expect the statistical consultant to manifest the same level of understanding of the scientific issues as the researcher. It is also not reasonable, however, to expect the researcher to be able to handle the technical mathematical issues of the statistics involved in the research. Thus, in the interaction between researcher and statistician, what the researcher needs to know about statistics mirrors in many ways what the statistical consultant or dental patient needs to know about clinical dentistry. A patient does not need to know the intricate and minute clinical and scientific details of how a therapy works to benefit from it. It is likely that over time a patient will become somewhat familiar with aspects of the dental procedures received. For example, a patient receiving implants would be able to advise other patients on the nature of the process but would not be qualified to apply it. Similarly, the researcher does not need to know whether maximum likelihood or least squares algorithms were used to generate estimates of model parameters. The model needs to target the specific aims of the research and to address the fundamental hypotheses. It is the interpretation of the results of the analysis or modeling process that is crucial. (It is assumed here that the biostatistician has appropriately diagnosed the adequacy of the model, a process referred to as evaluating the fit of the model. That is, this discussion assumes good statistical practice on the part of the statistician, just as it assumes high standards of clinical and scientific conduct on the part of the researcher.)
STATISTICAL HYPOTHESES AND METHODS

The biostatistician and the researcher need to carefully establish a one-to-one relationship between a set of clinical or experimental hypotheses and corresponding statistical hypotheses. These hypotheses are often expressed in opposite ways. For example, the clinical hypothesis, "This new treatment, together with proper oral hygiene, will greatly reduce the rate of increase in DMFS compared with proper oral hygiene alone," might be expressed in a statistical context as, "There is no difference in mean change in DMFS between the drug-and-oral-hygiene and oral-hygiene-only groups." This transcription will help to specify the appropriate statistical procedures to be used in analyzing the data. In some cases, the relationship may be more subtle and may require extensive interaction and question-and-answer sessions between the researcher and the biostatistician. It is quite possible that the type of statistical methods selected will change as new understanding arises on the part of both the researcher and the statistical consultant. New issues may arise that require additional planning. In any case, it is important to establish statistical hypotheses for all primary and secondary clinical and scientific hypotheses and to take the nature of the corresponding outcome variables into account. The statistical procedures should be coordinated with the research-specific aims when a research proposal is being jointly prepared.

COMPLICATING FACTORS

A number of characteristics of dental research need to be considered when planning statistical analyses. The first characteristic is structural; quite simply, one is dealing with multiple units if one considers individual teeth or even individual tooth surfaces. Statisticians refer to multiple-unit data as high-dimensional data, and such data can seriously complicate both the research planning and the data analysis. The usual approach is to attempt a type of statistical analysis known as a dimension reduction procedure and then study the reduced number of data units. Alternatively, researchers can confine attention to a specified subset of the original units. These issues are discussed by Clive and Woodbury.3 An example of such a situation is the paper by Löe et al, which analyzes data from the well-known study of the progression of periodontal disease among Sri Lankan tea laborers.12 The authors seek to identify and characterize subtypes of disease development based on analysis of loss-of-attachment measurements. Three disease subtypes were identified. Although the clinical utility of these types remains to be established, their descriptive value is clear. Many experiments in dental research involve the acquisition and analysis of longitudinal, or repeated measurements, data, because the development and manifestation of dental disease is often a gradual
process. Longitudinal research concerns the assessment of experimental units at several points in time over the entire chronologic course of the research. Such observations are referred to as clustered, or correlated, data because the outcomes for individual subjects may be related over time. It is generally inadvisable to analyze such data in orthodox ways, as if the observations were independent. For example, if a subject's DMFS is above average at time 1, it is likely to be so again at subsequent readings; this degree of association can influence the analytic results and needs to be accounted for. Developments in both applied and theoretical statistics and computer science over the past two decades have made it possible to deal with these analyses on a fairly routine level; these approaches are discussed in Diggle, Liang, and Zeger's The Analysis of Longitudinal Data4 and in Littell, Milliken, and Stroup's SAS System for Mixed Models.11 Armitage et al1 provide an interesting illustration of such research in assessing the use of elastase as a marker for the progression of periodontal disease. This paper, which appeared in the dental literature, is paired with a technical paper from the statistical literature20 that assesses the advanced longitudinal data analytic techniques in the specific context of periodontal disease. These articles are noteworthy in that together they present research evaluating the appropriateness of a new class of data analytic models as well as clinical and scientific applications of the new analyses. The software for implementing these procedures is now routinely available; this was not the case when the papers were published. Other interesting examples of longitudinal data analysis for dental research are given by Neely14 and Chugal et al.2 Neely identifies key risk variables for tooth loss based on analysis of the Sri Lanka data cited previously.12 Subjects were seen between one and seven times, and loss of attachment for two surfaces of each tooth was measured at each point. Tooth loss was also assessed at each point and used as the outcome variable for the analysis. Chugal and colleagues2 investigated factors influencing the success or failure of endodontic therapy. This research modeled the success or failure for each canal. The number of canals within teeth varied, as did the number of treated teeth across patients. In this case, there were, in fact, two levels of clustered data: canals within teeth, and treated teeth within patients. Still another attribute of dental data that complicates analytic considerations concerns the intrinsic lack of precision of some basic measurements; this imprecision is sometimes referred to as noise. A good example of noise is the limit on precision in measuring loss of attachment. Although loss of attachment is a crucial outcome measure in many studies, investigators need to be aware of the extent to which experimental conclusions can be compromised by measurement deficiencies. Certain experimental design considerations devised to deal with the problem of noise are discussed by Imrey and Chilton.9
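A common way to respect the clustering just described is a mixed-effects (random-intercept) model. The minimal sketch below uses simulated repeated DMFS measurements; the variable names, formula, and correlation structure are illustrative assumptions, and the cited studies used considerably more elaborate models.

```python
# Minimal sketch: a random-intercept model for repeated DMFS measurements.
# The data are simulated; variable names and the model formula are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects, n_visits = 40, 4

subject = np.repeat(np.arange(n_subjects), n_visits)
visit = np.tile(np.arange(n_visits), n_subjects)
group = np.repeat(rng.integers(0, 2, n_subjects), n_visits)          # 0 = control, 1 = treatment
subject_effect = np.repeat(rng.normal(0, 2, n_subjects), n_visits)   # source of within-subject correlation
dmfs = 10 + 0.8 * visit - 0.4 * visit * group + subject_effect + rng.normal(0, 1, n_subjects * n_visits)

data = pd.DataFrame({"dmfs": dmfs, "visit": visit, "group": group, "subject": subject})

# A random intercept for each subject accounts for the repeated (correlated) observations.
model = smf.mixedlm("dmfs ~ visit * group", data, groups=data["subject"])
result = model.fit()
print(result.summary())
```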
SAMPLE SIZE AND POWER ANALYSIS

The estimation of sample size is a crucial aspect of research development. Having too many experimental subjects wastes time and money, a circumstance especially frowned upon by funding agencies. Having too few subjects increases the risk of failing to detect a real effect or a statistically significant difference. In statistical terms, the smaller the sample size, the smaller the power of the experiment, which is the chance of detecting a difference or effect that is really present. In the chronology of planning, it could be argued that sample size estimation precedes the specification of statistical techniques described previously. The selection of methods of analysis often dictates the approach to sample size estimation, however. The design of the study is another factor influencing the estimation of sample size. There are several well-established experimental designs to consider, especially when the research is in the form of a clinical trial. A randomized clinical trial is a design with at least two study groups (test and control) to which eligible patients are assigned randomly. Other designs include case-control, prospective cohort, crossover, and the less scientifically rigorous pilot and observational studies. The selection of an appropriate study design is an important aspect of researcher and biostatistician interaction. Sample size estimation must include a statistical justification in terms of testing the primary research hypotheses and specification of what a clinically or scientifically significant difference is for each of the main outcome variables in the study. The term difference is used here to denote the magnitude of the difference between summary outcome measures across experimental groups. Note that clinical or scientific significance may be different from statistical significance. As Feinstein, citing Gertrude Stein, has noted, a difference has to make a difference to be a difference.6 Thus, rational experiment planning requires the researcher to estimate what difference is noteworthy and to specify sample size accordingly. This planning is in contrast to the haphazard selection of a sample size with the hope that something statistically significant and worth reporting will turn up. Although the biostatistician actually provides sample size estimates, these estimates are based on extensive input from the researcher. In addition to the concepts cited previously, the researcher needs to have a good working understanding of type I and type II errors. In statistical terms, a type I error involves rejecting a true hypothesis of no difference, and a type II error involves accepting a false hypothesis of no difference. The probabilities of these events are denoted by α and β, respectively; the power of a test is the complement of the probability of a type II error, 1 - β. In research terms, type I and type II errors correspond to concluding falsely that an effect is present or concluding falsely that an effect is absent, respectively. Many researchers consider the latter more serious than the former, because failing to detect an experimental effect might
lead to loss of interest or motivation in the particular type of research being performed. On the other hand, it is likely that a false effect will be exposed sooner or later in the course of further research. The researcher also needs to provide estimates of the magnitude of the effect of the intervention on each of the outcome variables in one of the research groups, together with an estimate of the variability; these estimates are called pilot data. Making these estimates may seem counterintuitive, because it may reasonably be asked what purpose the research serves if some concept of the size and variance of the intervention is available a priori. In fact, researchers are not presuming to estimate the effect of the experimental intervention but rather to make a reasonable speculation on the response that could be expected in the control or nonintervention group. The pilot data form the basis for estimating the sample size required to observe a given difference. Assume, for example, that an investigator is planning to test the effect of an intervention hypothesized to reduce the rate of loss of attachment. Suppose further that it is known that over some time period untreated individuals will lose an average of 4 mm attachment, with a standard deviation of 3.5, and that these estimates apply to the type of patient population being studied. The biostatistical consultant can use this information to estimate the number of patients needed to detect a specified difference based on given values of α and β and the type of analysis to be used.

SOURCES OF PILOT DATA

The division of labor is straightforward in sample size estimation. The researcher needs to supply estimates of the appropriate summary measures for the important outcome variables for at least one of the groups in the study (probably the control or non-intervention group). The researcher also needs to provide an idea of the magnitude of a clinically significant effect. The statistician uses these data, together with specifications of α and β and the statistical method to be used, to estimate a sample size. Table 2 outlines the main steps in performing a power analysis. It is permissible to estimate sample sizes for a range of values of α and β, as illustrated in Table 3, which shows a sample size table for the hypothetical experiment concerning loss of attachment discussed previously. There are several sources of assistance for the researcher in determining what constitutes a clinically significant difference, as well as providing pilot data for use by the statistical consultant in estimating sample size or power. The scientific literature is a valuable source of background data for this planning. It is quite likely that the researcher has exhaustively reviewed the literature in the course of formally developing the research plan. The literature review may provide multiple sources of pilot data as well as indications of the variation in response across different classes of patients or potential research subjects.
Table 2. PRIMARY STEPS IN CARRYING OUT A POWER ANALYSIS

Step 1: Specify clinical hypotheses (DR)
Step 2: Determine primary outcome measures (DR, B)
Step 3: Transcribe clinical hypotheses to statistical hypotheses (DR, B)
Step 4: Specify range of clinically meaningful differences for outcome measures (DR)
Step 5: Specify α, β (DR)
Step 6: Obtain pilot data for calculations (DR)
Step 7: Obtain power/sample size estimates (B)
Step 8: Evaluate results for final sample size/power specification (DR, B)

DR = dental researcher; B = biostatistician; α = probability of a type I error; β = probability of a type II error
Input from colleagues is also a useful source of data for research planning, especially when inquiries can be focused, so that matters of pilot data and significant effect can be addressed directly. A researcher who is also a practicing dentist may have a set of patient records worth examining. Many providers use readily available software (such as spreadsheet packages) to construct their own databases, which may be useful for planning; however, it is important to verify the standards under which the data were assessed. Some commercially available software packages for power analysis provide interactive prompts for assisting users through the steps of a sample size determination.5 These steps include a variety of techniques for generating pilot data summary measures from limited input (e.g., estimating the standard deviation for an outcome variable based on estimation of the range or percentile values). These techniques are useful when prior knowledge or data are limited. In some cases, suitable pilot data are lacking altogether. Such a situation may arise, for example, early in the history of a line of scientific inquiry or when first testing a new drug or intervention. Here, the researcher may consider implementing a pilot study. A pilot study is a small-scale, preliminary study designed to assess the feasibility of the proposed research and entails evaluating all aspects of the research (including administrative matters if a clinical trial is being contemplated). One of the primary objectives is to acquire a database for use in formal calculations of the sample size needed for a larger experiment or clinical trial to be carried out at a later date. Although relatively small in scale, pilot studies often require as much interaction between researcher and statistician as more formal research, particularly in determining which variables will be assessed. Stopping rules need to be established because, by definition, detailed power analyses are not possible in a pilot study.

Table 3. SAMPLE SIZE ESTIMATES FOR HYPOTHETICAL LOSS OF ATTACHMENT STUDY*

Percent mean difference 10: 40 subjects per group at β = 0.10; 35 at β = 0.20
Percent mean difference 15: 33 subjects per group at β = 0.10; 28 at β = 0.20
Percent mean difference 20: 29 subjects per group at β = 0.10; 24 at β = 0.20
Percent mean difference 25: 25 subjects per group at β = 0.10; 19 at β = 0.20

*The table is designed to show the number of subjects needed in each of two groups, assuming a repeated measures design with hypothesis testing carried out at α = 0.05. Sample size estimates are shown for each of two values of β and each of several effect sizes. α = probability of a type I error; β = probability of a type II error
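A rough counterpart to the calculations summarized in Table 3 is sketched below, using the pilot figures quoted earlier (a mean loss of 4 mm with a standard deviation of 3.5). The sketch assumes a simple comparison of two independent means rather than the repeated measures design underlying Table 3, so it yields substantially larger sample sizes; it is intended only to show the mechanics of a power calculation.

```python
# Sketch: sample size per group for a two-sample t-test, using the pilot values quoted above.
# This assumes independent groups (not the repeated measures design of Table 3), so the
# resulting n will be much larger than the tabulated values.
from statsmodels.stats.power import TTestIndPower

control_mean, sd = 4.0, 3.5          # pilot data: mean loss of attachment and its standard deviation
alpha = 0.05

for pct_reduction in (10, 15, 20, 25):
    difference = control_mean * pct_reduction / 100
    effect_size = difference / sd    # Cohen's d
    for power in (0.90, 0.80):       # power = 1 - beta, i.e., beta of 0.10 or 0.20
        n = TTestIndPower().solve_power(effect_size=effect_size, alpha=alpha, power=power)
        print(f"{pct_reduction}% difference, power {power:.2f}: about {n:.0f} subjects per group")
```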
DATA MANAGEMENT

Data management is a deliberately broad term, incorporating a variety of tasks concerned with data acquisition, storage, confidentiality, editing, and retrieval. These aspects of research execution are of vital importance in assuring the quality of the research. Data management is especially vital in large-scale projects involving the determination of many variables from many research subjects and possibly at multiple time points. Data acquisition begins with the design and planning of data-gathering forms. Completed forms need to be entered into a computer, although machine-readable forms can facilitate this activity. Data should be checked thoroughly. One procedure for checking is dual entry, in which data forms are entered twice, by independent data technicians. The final data files for the two operators are compared data point by data point. With dual entry, the only way an incorrect value can enter the final file is for each operator to make the identical mistake in the identical location. Although dual entry is an effective mechanism for data-entry quality control, it is not always feasible. In large studies or clinical trials, the researcher should plan on printing a randomly selected subset of the entire data file for verification against the data-gathering forms. It may also be prudent to check all values of any variables that are particularly significant. This checking will provide an estimate of the overall data-entry error rate and may suggest variables or areas in the data file that need further attention in the editing process. Once a data file has been checked, edited, and found satisfactory, further examination should involve the search for potential outlying values. This search can be done for all variables in the file and is a basic procedure in the exploratory data analysis phase of the research. All values above the 95th percentile and below the 5th percentile (or some other specified cutoff points) are listed, together with identifying information indicating which record in the data file contains the value. The researcher can then consider this output and flag blatantly out-of-range
values for further checking. This procedure should be followed for all major study variables. Data confidentiality is important in clinical trials involving human subjects. Assuring confidentiality usually entails an intermediate step in the data-entry process in which identifying information is replaced with some numeric patient identifier. This is the responsibility of the researcher, who establishes and maintains the key relating the two data fields. Access to the key is limited and is under the direction of the researcher. The file supplied to the statistical consultant should contain no unique patient data that could be used for identification or to breach subject anonymity.

RESEARCH ADMINISTRATION

The biostatistical consultant can help the researcher with other aspects of the actual administration and execution of the research project. Several such topics noted here arise in clinical trials and include the randomization of patients, protocol deviations, and the analysis of dropouts and missing data. The randomization of patients refers to the assignment of patients to study groups, usually by some random mechanism. Randomization is generally a straightforward task, and the method of randomization depends largely on the study design. Both the researcher and the statistical consultant need to keep careful track of any protocol deviations. Protocol deviations involve changes in the study design or plan once subject intake has begun. These planning changes are sometimes unavoidable. Subject dropout can be a major problem in longitudinal clinical trials. Patients can leave for a variety of reasons. Some may decide not to continue participating, especially if the experiment involves some unpleasant or invasive procedures. Others may leave the area. Some may become injured or ill. The researcher and the biostatistician will hope fervently that such dropout is random; that is, one is not primarily losing the treatment responders or non-responders or only the most compliant or non-compliant participants. Random dropout implies that subjects who leave a clinical trial before completion of the protocol do so at random, and that the remaining subject groups are still homogeneous with respect to potentially confounding factors. The term intent to treat refers to the analysis of data from all subjects, including those who drop out. The rationale for this method of analysis is discussed in most clinical trial guides; see, for example, Spilker's Guide to Clinical Trials19 and Piantadosi's Clinical Trials: A Methodological Perspective.15 Most researchers will also analyze those subgroups of participants completing the protocol. A crucial phase of the analysis of data from a clinical trial is the analysis of dropouts and their comparison with subjects who remained.
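The outlier screen described under data management (listing every value beyond chosen percentile cutoffs together with the record containing it) reduces to a few lines of code. The data frame below is simulated, and the column names and cutoffs are arbitrary assumptions.

```python
# Sketch: flag values outside the 5th-95th percentile range for review, with record identifiers.
# The data frame is simulated; column names and cutoffs are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
data = pd.DataFrame({
    "patient_id": np.arange(1, 201),
    "dmfs": rng.normal(12, 5, 200).round(0).clip(0, 40),
    "pocket_depth_mm": rng.normal(3, 1, 200).round(1).clip(0, 12),
})

for column in ["dmfs", "pocket_depth_mm"]:
    low, high = data[column].quantile([0.05, 0.95])
    flagged = data.loc[(data[column] < low) | (data[column] > high), ["patient_id", column]]
    print(f"{column}: {len(flagged)} values outside ({low}, {high}) listed for checking")
    print(flagged.head())   # the full listing would go to the researcher for review
```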
Missing data can be a problem in surveys or retrospective studies. The difficulty, as with patient dropouts, arises when missing data occur in a nonrandom fashion. Although the treatment of missing data (involving, for example, multiple imputation procedures) is the responsibility of the biostatistician, the researcher needs to participate in planning the procedures for carrying out the study so that the occurrence of missing data is minimized or the variables prone to absence are of relatively minor importance in the study.

DISCUSSION

The primary theme of this article is that it is not essential that the researcher have a strong working knowledge of elementary (or higher) statistics to perform valid scientific research. Rather, the researcher should be prepared to work with a biostatistical consultant on an extensive and ongoing basis to plan and execute a research project carefully. The need for this emphasis was recognized by Moses and Louis, who suggested that effective collaboration between clinician and statistician can help identify tractable scientific and statistical problems that need attention and can help avoid undertaking intractable ones.13 Furthermore, they assert that the "central requirement for successful collaboration is clear, broad, specific, two-way communication on both scientific issues and research roles."13 The researcher will need to assist the biostatistician in estimating sample size, in understanding the basics of the science involved, and in relating scientific and statistical hypotheses. The biostatistician should come to appreciate the scientific and clinical issues and underlying principles; likewise, the researcher will come to appreciate how appropriately executed data analysis can extract valuable scientific knowledge from experimental data. Over time, the researcher will acquire the statistical knowledge needed to interpret and present study results. The statistical understanding may be focused and restricted to the methods relevant to the particular study, but it will constitute a useful body of knowledge, appropriate for future studies or as a basis for using other statistical methods in different studies. Most, if not all, scientists are convinced of the utility of mathematical models in representing and studying natural phenomena. Statistical models are mathematical models that incorporate probabilistic measures of uncertainty. In the study of oral health, two primary sources of variation impart this uncertainty. The first is the natural variation among patients in measures of oral health; the second is the variation resulting from sampling, or selecting a subgroup of patients for study, because the entire population of such patients is impossible to access. The progression of data analyses intended to account for this variation, from simple independent group t-tests through complex multivariate
methods, is one of increasing technologic, mathematical, and statistical sophistication and advancement. It is also a progression that describes considerable theoretic and applied advances by researchers attempting to understand dental disease and how to deal with it. Often, theoretic clarification and understanding derive from the application of more detailed models, as new information about the processes being modeled derives from formalization and logical representation. The influence of this trend of increasing detail and complexity in data analysis for dental research will become more profound in the immediate future, as new developments in dental science occur simultaneously with new advances in statistical theory and computer science. This progression will only increase the need for dental researchers to establish and develop lines of communication with data analysts.
References

1. Armitage GC, Jeffcoat MK, Chadwick DE, et al: Longitudinal evaluation of elastase as a marker for the progression of periodontitis. J Periodontol 65:120–128, 1994
2. Chugal N, Clive J, Spangberg L: A prognostic model for assessment of the outcome of endodontic treatment: Effect of biologic and diagnostic variables. Oral Surg Oral Med Oral Pathol Oral Radiol Endod, in press
3. Clive J, Woodbury MA: Continuous and discrete global models of disease. Mathematical Modeling 7:1137–1154, 1986
4. Diggle PJ, Liang K-Y, Zeger SL: The Analysis of Longitudinal Data. New York, Oxford University Press, 1994
5. Elashoff J: nQuery Advisor Version 4.0 User's Guide. Los Angeles, CA, 2000
6. Feinstein A: Clinical Biostatistics. Boca Raton, FL, CRC Press, 2002
7. Friedman G: Primer of Epidemiology, ed 4. New York, McGraw Hill, 1994
8. Glantz A: Primer of Biostatistics, ed 4. New York, McGraw Hill, 1997
9. Imrey PB, Chilton NW: Design and analytic concepts for periodontal clinical trials. J Periodontol (suppl) 63:1124–1140, 1992
10. Kilburn KH, Asmundsson T: Anteroposterior chest diameter in emphysema. Arch Intern Med 123:379–382, 1969
11. Littell RC, Milliken GA, Stroup WW, et al: SAS System for Mixed Models. Cary, NC, SAS Institute, 1996
12. Löe H, Anerud A, Boysen H, et al: Natural history of periodontal disease in man. Rapid, moderate and no loss of attachment in Sri Lankan laborers 14 to 46 years of age. J Clin Periodontol 13:431–440, 1986
13. Moses LE, Louis TA: Statistical consulting in clinical research: The two-way street. Stat Med 3:1–5, 1984
14. Neely A: The natural history of periodontal disease in man. Risk factors for progression of attachment loss in subjects receiving no oral health care. J Periodontol 72(8):1006–1015, 2001
15. Piantadosi S: Clinical Trials: A Methodological Perspective. New York, John Wiley & Sons, 1997
16. Riegelman RK: Studying a Study and Testing a Test, ed 4. Philadelphia, Lippincott Williams & Wilkins, 2000
17. Sackett DL, Haynes RB, Tugwell P, et al: Clinical Epidemiology, ed 2. Philadelphia, Lippincott Williams & Wilkins, 1991
18. Sheehan TJ: The medical literature: Let the reader beware. Arch Intern Med 140:472–474, 1980
19. Spilker B: Guide to Clinical Trials. New York, Raven Press, 1991
20. Ten Have TR, Landis JR, Weaver SL: Association models for periodontal disease progression: A comparison of methods for clustered binary data. Stat Med 14:413–429, 1995
21. Voelkl KE, Gerber S: Using SPSS for Windows. New York, Springer-Verlag New York, 1999

Address reprint requests to
Jonathan Clive, PhD
Department of Biostatistical Consultation
University of Connecticut Health Center MC3805
Farmington, CT 06030
e-mail:
[email protected]
EVIDENCE BASED DENTISTRY
APPLYING EVIDENCE BASED DENTISTRY TO YOUR PATIENTS

James D. Anderson, BSc, DDS, MScD
GENERAL ISSUES

A common criticism of evidence based practice is that it seeks to usurp the individual clinician's judgment, imposing instead an external authority found in the literature that may or may not be appropriate. This criticism is not valid. Indeed, the fourth step of the Evidence based Practice Model (Fig. 1) reserves a place for the individual practitioner's judgment in the application of the literature to the clinical problem. Evidence based practice therefore seeks to inform clinical decisions, not to impose them.

After converting the patient's problem into an answerable question, searching the literature, and critically appraising the articles found, the clinician must decide whether the valid information that has been revealed can be applied to the patient whose problem triggered the process. To do so, the clinician must consider certain specific factors.

First, clinicians cannot allow themselves to be dazzled by elaborate statistics showing extreme measures of statistical significance. In a trial comparing Brånemark and IMZ implants under mandibular overdentures, Boerrigter et al1 found a statistically significant difference in bone level change between the implant types 1 year after implant placement. The mean changes were 0.5 mm for the IMZ implants and 1.0 mm for the Brånemark implants. The difference was statistically significant (P < 0.003), meaning that, if the two implant types truly did not differ, a difference this large would be expected by chance less than 0.3% of the time. The difference seems major until one realizes that it is only 0.5 mm and therefore is unlikely to be clinically significant. A highly significant statistical difference is therefore no indicator of a clinically significant difference.
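The distinction between statistical and clinical significance can be made concrete with a small simulation. The sketch below (Python) assumes normally distributed bone level changes with the two reported means; the 0.8-mm standard deviation and the group size of 200 are invented for illustration and are not values reported in the trial.

# A small mean difference becomes "highly significant" once samples are large
# and variability is modest. The means echo the trial discussed above; the
# standard deviation and group size are assumptions for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
imz = rng.normal(loc=0.5, scale=0.8, size=200)        # hypothetical IMZ-like group (mm)
branemark = rng.normal(loc=1.0, scale=0.8, size=200)  # hypothetical Branemark-like group (mm)

t, p = stats.ttest_ind(imz, branemark)
print(f"mean difference = {branemark.mean() - imz.mean():.2f} mm, P = {p:.1e}")
# The tiny P value says the difference is unlikely to be chance alone;
# it says nothing about whether a 0.5-mm difference matters clinically.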
From the Faculty of Dentistry, University of Toronto; and the Craniofacial Prosthetic Unit, Toronto-Sunnybrook Regional Cancer Centre, Toronto, Ontario, Canada
Figure 1. The steps in the model of evidence-based practice. (From Anderson JD: Need for evidence-based practice in prosthodontics. J Prosthet Dent 83:58–65, 2000; with permission.)
Most articles that describe clinical research report their findings on a sample of patients. Often, the sample is intended to represent the whole population. The selected patients therefore should have demographic and disease characteristics similar to those of the population at large. The distribution of age, sex, socioeconomic status, education, nutritional status, and occupation all should reflect society in general. Similarly, the prevalence, severity, and duration of disease should also mirror the general population. Clearly, the sample of patients in any given study is unlikely to fulfill all these criteria. Often, the authors do not intend to reflect the whole population and limit their sample to persons of a certain age group, or with a history of exposure to an agent such as smoking, or with a clinical condition such as edentulousness. In applying the findings from such studies to the individual patient, a clinician must decide whether the patient is similar enough to the study patients for the results to be applicable. One way to do so is to ask whether the clinician's patient would have met the inclusion and exclusion criteria of the study. Often, some differences are found between the study sample and the present patient. These differences do not necessarily make the article useless. A more useful approach may be to reverse the question and ask whether the study population is so different from the patient that the results cannot possibly be applied. This approach makes it possible to apply some information from the article. If the study population is divided into subgroups, it may be possible to match the reader's patient to one of the groups for more focused information.
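The eligibility check described above can be sketched as a simple set of predicates. In the fragment below (Python), both the criteria and the patient record are hypothetical and invented purely for illustration; they are not drawn from any study cited in this article.

# Hypothetical eligibility check: would this patient have qualified for the study?
patient = {"age": 72, "edentulous_mandible": True, "smoker": True, "uncontrolled_diabetes": False}

inclusion_criteria = {
    "age 55 to 80": lambda p: 55 <= p["age"] <= 80,
    "edentulous mandible": lambda p: p["edentulous_mandible"],
}
exclusion_criteria = {
    "uncontrolled diabetes": lambda p: p["uncontrolled_diabetes"],
}

included = all(test(patient) for test in inclusion_criteria.values())
excluded = any(test(patient) for test in exclusion_criteria.values())
print("Patient would have been eligible:", included and not excluded)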
The setting in which a study was conducted can have a major impact on the findings. The results of a new, experimental periodontal treatment tested in a major teaching institution may not be applicable to the patients of a general practitioner because of an effect called referral filter bias. A major teaching institution is likely to attract patients who have more severe periodontal problems than those seen in general practice. Similarly, the treatment they receive at a major center may not be feasible in general practice. The patients' response to the new treatment, therefore, may not be applicable to the patients of a general practice. An example is the series of patients with severe (apical third) periodontal bone loss who were rehabilitated with extensive fixed bridges and aggressive oral hygiene maintenance.6 Such a report offers little help to the general practitioner, who sees less severe periodontal destruction, is less likely to undertake such extensive reconstructions, and may not be able to expect such a high degree of compliance with oral hygiene. Therefore, readers seeking information to apply to their general practice must pay special attention to how the patients were selected with respect to the severity of their disease and the feasibility of the treatment approach. The important question for the practitioner to ask is, "Could such circumstances be duplicated in my office?"

No clinical decisions are made without some element of patient input. The patient's preferences, priorities, and resources will therefore affect clinical decisions. Stated another way, the social and cultural issues that are important to the patient must be considered when deciding how to apply the information found in a literature search related to the patient's problem. A new, highly effective treatment approach that takes too long, is likely to be painful, or is too expensive is not appropriate if it is not consistent with the patient's wishes. Similarly, treatment solutions exist for problems that are not important to some patients. The use of effective veneering techniques makes sense only if the social and cultural pressures on a patient exceed the risks inherent in the technique. For many people, a less than perfect smile is simply not important. To suggest a solution where there is no problem invites disaster. Marketing techniques aimed at creating demand are a concern here.
SPECIFIC APPLICATIONS

The general issues discussed previously apply to any situation in which one is contemplating the application of valid information found in the literature to a specific patient. There are, however, other points to be considered when applying certain types of information to the patient's situation. These are considered in turn.
Diagnostic Tests

Once valid information about a diagnostic test has been recovered from the literature, the practitioner must decide whether the test will be useful for a given patient. To make this decision, the answers to a few questions will provide guidance7:

1. Is the diagnostic test available, affordable, accurate, and precise in this setting?
2. Can a clinically sensible estimate of the patient's pretest probability of disease be generated?
   • Can personal experience or prevalence statistics be drawn on?
   • Are the study patients similar to this patient?
3. Would the results of the test affect the management and help the patient?
   • Could the results influence the decision to treat the condition?
   • Would the patient be a willing partner in the treatment?

First and most sensibly, practitioners must be assured that the test is available, affordable, accurate, and precise in their setting. The answers to the first two parts of this question are probably obvious. An electric pulp tester is readily available and usable at reasonable cost in most dental offices. On the other hand, computed tomography and the associated software are less available, and their use is certainly more expensive. The answers to the latter two parts of the question may be less apparent. A diagnostic test that has performed well in the office of a general practitioner may not perform as well in a specialist's office or university clinic. The reason is that the prevalence of the condition being tested will probably differ between the two settings. Therefore, the difference in the rates of false-positive (or false-negative) findings will change the likelihood ratios of the test. Because the prevalence of the disease may be a major component of the pretest probability of disease, the test may behave differently in different settings. Thus, the interpretation of the electric pulp test may be different in a general practitioner's office and in a university teaching endodontic clinic. Similarly, a report of a new test that was validated in a tertiary care center must be applied with caution in a general practice setting.

The appropriate use of a diagnostic test begins with a pretest estimate of the likelihood of disease. This estimate may be no more precise than the prevalence of the condition in the population. A patient who presents with throbbing pain and facial swelling, however, raises the pretest estimate of the likelihood of apical periodontitis beyond the general prevalence in the population. Even if the diagnosis is a guess, under these conditions it is a better estimate than simple prevalence. On the other hand, it is more difficult to generate an estimate of the pretest likelihood of a malignancy in a young patient with an unexplained ulcer in the palate. When practitioners estimate pretest likelihood, their judgment tends to be influenced by the most recent or most dramatic events they have previously encountered.5
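The way a pretest probability and a test result combine can be written out explicitly using the odds form of Bayes' theorem. In the sketch below (Python), the positive likelihood ratio of 6 and the two pretest probabilities are assumptions chosen only to illustrate the effect of setting; they are not measured values for the electric pulp test or any other real test.

# Posttest probability from pretest probability and a likelihood ratio
# (odds form of Bayes' theorem). All numbers are illustrative assumptions.
def posttest_probability(pretest_p: float, likelihood_ratio: float) -> float:
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# The same positive result (assumed LR+ = 6) read in two different settings:
for setting, pretest_p in [("general practice", 0.05), ("university endodontic clinic", 0.40)]:
    print(f"{setting}: pretest {pretest_p:.0%} -> posttest {posttest_probability(pretest_p, 6.0):.0%}")
# Roughly 24% versus 80%: the identical test result means something quite
# different when the underlying prevalence, and so the pretest probability, differs.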
The third question Sackett and colleagues pose is whether the results of the test will change the practitioner's treatment behavior.7 For example, a patient presents with a maxillary lateral incisor consumed by decay that is sensitive to heat and percussion, with throbbing pain and swelling above the apex. It is highly unlikely that an electric pulp test of this tooth will change the treatment behavior of the dentist. In this situation, the treating dentist has already crossed a decision threshold to treat the tooth based on other clinical findings, and the electric test will add no new information. On the other hand, the dentist would not do a biopsy of a lesion in the palate of a teenager who reported burning his mouth on a hot pizza the night before. Here the dentist has crossed a decision threshold in the other direction, deciding not to test the lesion because of the invasiveness of the test and the low probability of a finding that warrants treatment. It is in the area between these extremes, where the results of a test will influence the treatment behavior, that the time, cost, and discomfort of a test are warranted.

Finally, if the test is painful or costly, the patient may choose not to know the results rather than submit to the test. For example, a patient may be reluctant to submit to a CT scan with three-dimensional reconstruction to measure bone volume before the placement of two implants in his edentulous mandible when a conventional panoramic film and clinical examination confirm more than enough bone thickness and height. Clearly, to justify the additional cost or discomfort, the patient must be a willing participant in the diagnostic procedure, with an expectation of obtaining valuable new information that will influence the outcome.

Prognosis

Whether the information in an article about the prognosis of a condition should be applied to a specific patient can be decided by answering questions specific to this type of article:

1. Will the results lead directly to selecting or avoiding treatment for an individual patient?
2. Are the results useful for reassuring or counseling patients?

Knowledge of the natural history of a condition clearly will influence the decision to select or avoid treatment. For example, with the clarification of the prognosis of juvenile periodontitis,8 treatment can be more focused and aggressive. On the other hand, an article by de Leeuw et al3 suggests that in patients with osteoarthrosis and reducing temporomandibular joint disk displacement, the prevalence of pain dropped from 43% to only 17% in 2 to 4 years and dropped further to only 2.4% after as much as 30 years. It would be difficult to suggest invasive surgical treatment in the face of this information.
So, in addition to the general issues noted at the start of this article, readers of an article that describes a prognosis must ask whether the results will lead directly to selecting or avoiding treatment for an individual patient. Unless the results can be used in this way, it is unlikely that they will have any application to the individual patient. In situations such as temporomandibular disorders, providing information for the patient may be enough treatment. Simply giving the patient some understanding of the natural history of the condition can do much to relieve anxiety by providing realistic expectations. A second question for the reader of articles about prognosis, then, is, "Are the results useful for reassuring or counseling patients?"

Therapy

In dentistry, numerous articles advocate improved techniques or materials over existing therapies. It is not always appropriate to apply the results of every therapeutic improvement to every patient, even if the evidence was found to be compelling when critically appraised. Certain questions specific to articles about therapy will help determine when to apply improvements to patients and when not to:

1. Are the results reported as outcomes that are important to patients?
2. Were all clinically important outcomes reported?
3. Are the likely treatment benefits worth the potential harms and costs?

Evidence for the improvement usually takes the form of increased longevity (as with implants or fixed partial dentures), reduced numbers of failures (such as tooth loss), or improvement in subjective parameters (such as comfort or chewing ability). All of these outcomes are important to patients. Sometimes, when these outcomes are rare or take a long time to observe, surrogate outcomes, such as attachment loss, bleeding on probing, and mobility, are used to predict the events that are important to patients. The use of these surrogates is reasonable and expedient only to the extent that they do, in fact, predict the events that are important to patients. A meta-analysis presented recently2 suggested that guided tissue-regeneration procedures would result in a mean increase in attachment level of 4.0 mm. This result is impressive, but applying it to an individual patient requires that an increase in attachment level predict greater tooth longevity, an outcome more likely to be of interest to the patient than the level of attachment itself. If this link has been established, the information is meaningful; if it has not, the usefulness of the information is limited, even though it is based on a meta-analysis (a strong design). A further problem is that the underlying studies used in the meta-analysis were limited to 1 year of follow-up. The reader of articles that report the results of trials of therapy must therefore be sure that the outcomes reported are important to patients and not merely surrogates lacking in predictive value.
In addition, the reader must be sure that all clinically important outcomes have been reported in the article. In a randomized trial of the efficacy of flurbiprofen taken for 3 months after implant surgery in reducing alveolar bone loss around implants, Jeffcoat and colleagues4 noted that two patients had to be withdrawn from the study, one because of stomach upset and another because of a decrease in red blood cell count thought to be related to the medication. The trial found a statistically significant reduction in bone loss between the third and sixth months after surgery, but at no other time up to 1 year. The difference in bone mass lost was between 8.6 and 12 mg. The reader therefore must consider whether the additional risks involved in that dosage of the drug are worth the benefit of saving those few milligrams of bone, and whether that amount of bone is clinically meaningful. The clinician must balance the potential benefits of the treatment against its potential harms and costs. The information presented in the article informs, but does not dictate, the clinician's decision to apply the findings to the particular patient.
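One way to frame question 3 numerically is with absolute risk differences, as in the sketch below (Python). Every event rate here is an assumption chosen for illustration; none are taken from the flurbiprofen trial or any other study cited in this article.

# Weighing benefit against harm with numbers needed to treat and to harm.
# All rates below are assumptions for illustration only.
def number_needed(rate_without: float, rate_with: float) -> float:
    """Patients who must be treated for one additional (or one fewer) event."""
    return 1.0 / abs(rate_without - rate_with)

failure_rate_control = 0.10   # assumed implant failure rate without the adjunct drug
failure_rate_treated = 0.06   # assumed failure rate with the adjunct drug
adverse_event_rate = 0.02     # assumed rate of drug-related adverse events

nnt = number_needed(failure_rate_control, failure_rate_treated)
nnh = 1.0 / adverse_event_rate
print(f"NNT = {nnt:.0f} to prevent one failure; NNH = {nnh:.0f} for one adverse event")
# With these assumed rates, 25 patients are treated to prevent one failure,
# while 1 in 50 has an adverse event; the trade-off the clinician must weigh is explicit.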
SUMMARY

It should be evident by now that evidence based dentistry leaves much room for the application of clinical judgment to the literature. This article points out that judgment in evaluating certain factors is essential and that the practice of evidence based dentistry is not a process of blindly following the conclusions found in the literature. By weighing these factors, clinicians can safeguard their patients and themselves against the inappropriate use of weak or irrelevant evidence in daily practice. This skill adds confidence to decision making in clinical practice and helps prevent the decline of clinical skills over the course of a career.
References

1. Boerrigter EM, van Oort RP, Raghoebar GM, et al: A controlled clinical trial of implant-retained mandibular overdentures: Clinical aspects. J Oral Rehabil 24:182–190, 1997
2. Bragger U: Evidence Based Outcomes of Periodontal Therapy: Clinical Decision Making in Prosthodontics and the Impact of Implants. Bern, Switzerland, 2001
3. de Leeuw R, Boering G, Stegenga B, et al: Symptoms of temporomandibular joint osteoarthrosis and internal derangement 30 years after non-surgical treatment. Cranio 13:81–88, 1995
4. Jeffcoat MK, Reddy MS, Wang IC, et al: The effect of systemic flurbiprofen on bone supporting dental implants. J Am Dent Assoc 126:305–311, 1995
5. Kassirer JP, Kopelman RI: Cognitive errors in diagnosis: Instantiation, classification, and consequences. Am J Med 86:433–441, 1989
6. Nyman S, Lindhe J: A longitudinal study of combined periodontal and prosthetic treatment of patients with advanced periodontal disease. J Periodontol 50:163–169, 1979
7. Sackett DL, Straus SE, Richardson WS, et al: Evidence-Based Medicine: How to Practice and Teach EBM, ed 2. Edinburgh, Churchill Livingstone, 2000
8. Zambon JJ, Christersson LA, Genco RJ: Diagnosis and treatment of localized juvenile periodontitis. J Am Dent Assoc 113:295–299, 1986

Address reprint requests to
James D. Anderson, BSc, DDS, MScD
Department of Dentistry
University of Toronto
124 Edward Street
Toronto, Ontario M5G 1G6
Canada
e-mail: [email protected]