Guardians of Science: Fairness and Reliability of Peer Review


Author: Hans-Dieter Daniel



VCH Guardians of Science: Fairness and Reliability of Peer Review. H.-D. Daniel Copyright © 1993 VCH Verlagsgesellschaft mbH, Weinheim ISBN: 3-527-29041-9

© VCH Verlagsgesellschaft mbH, D-69451 Weinheim (Federal Republic of Germany), 1993

Distribution:
VCH, P. O. Box 101161, D-69451 Weinheim, Federal Republic of Germany
Switzerland: VCH, P. O. Box, CH-4020 Basel, Switzerland
United Kingdom and Ireland: VCH, 8 Wellington Court, Cambridge CB1 1HZ, United Kingdom
USA and Canada: VCH, 220 East 23rd Street, New York, NY 10010-4606, USA
Japan: VCH, Eikow Building, 10-9 Hongo 1-chome, Bunkyo-ku, Tokyo 113, Japan

ISBN 3-527-29041-9 (VCH, Weinheim)
ISBN 1-56081-751-8 (VCH, New York)

H.-D. Daniel

Guardians of Science
Fairness and Reliability of Peer Review

Translated by William E. Russey

Weinheim · New York · Basel · Cambridge · Tokyo

Priv.-Doz. Dr. H.-D. Daniel
Das Rektorat der Universität
Schloß
D-68131 Mannheim
Germany

This book was carefully produced. Nevertheless, author and publisher do not warrant the information contained therein to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Published jointly by
VCH Verlagsgesellschaft, Weinheim (Federal Republic of Germany)
VCH Publishers, New York, NY (USA)

Editorial Directors: Dr. Peter Gölitz and Dr. Thomas Mager
Translator: Prof. Dr. William E. Russey
Production Manager: Elke Littmann

Library of Congress Card No. applied for.

A catalogue record for this book is available from the British Library.

Deutsche Bibliothek Cataloguing-in-Publication Data:
Daniel, Hans-Dieter: Guardians of science : fairness and reliability of peer review / H.-D. Daniel. Transl. by William E. Russey. Weinheim ; New York ; Basel ; Cambridge ; Tokyo : VCH, 1993
ISBN 3-527-29041-9 (Weinheim...)
ISBN 1-56081-751-8 (New York)

© VCH Verlagsgesellschaft mbH, D-69451 Weinheim (Federal Republic of Germany), 1993

Printed on acid-free and low-chlorine paper.

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form (by photoprinting, microfilm, or any other means) nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Composition: U. Hellinger, D-69253 Heiligkreuzsteinach. Printing: betz-druck gmbh, D-64291 Darmstadt. Bookbinding: IVB Heppenheim, D-64646 Heppenheim.
Printed in the Federal Republic of Germany

Dedicated to the Stifterverband für die Deutsche Wissenschaft

About the author

Hans-Dieter Daniel graduated with degrees in Psychology, Sociology of Science, and Philosophy of Science from the University of Constance, where he earned his PhD in 1983 and the venia legendi for psychology in 1992. Dr. Daniel has authored about 60 publications on research evaluation. He is the German expert member of the MONITOR committee of the European Commission and a consultant expert for the EC Research Evaluation Database. Dr. Daniel was one of the coordinators of the DFG's priority program "Science of Science" and he is the coordinator of the German network "Science Indicators" (funded by the Stifterverband für die Deutsche Wissenschaft). He was involved in the 1993 nationwide survey of German university students conducted by the magazine DER SPIEGEL. Dr. Daniel is presently privatdozent at the University of Constance and head of the research project "Evaluation of Teaching and Learning in Higher Education" at the University of Mannheim.

Foreword

The Peer Review System. Some like it! Some dislike it! Some believe it is unfair! Some suspect it is ambiguous! Regardless of one's opinion, from the time of its inception in the 17th century it has remained controversial. The book "Guardians of Science" by H.-D. Daniel evaluates the peer reviews presented to the Editorial Office of Angewandte Chemie for all contributions submitted for publication in 1984. It will certainly be of interest to authors, reviewers, and followers of science alike, and hopefully it will help to mollify feelings of animosity and prejudice. Since authors are themselves peer reviewers and vice versa, they may take on a sort of split personality. In their bifunctionality each should be fair to the other, for there is no other way to self-respect and self-control. Surely each "twin" can learn from the other. Daniel's book not only explains how the system works but also teaches what the peer reviewer should be or at least try to become. In this sense, guidelines have been drawn which, when accepted and put into action, may be helpful in future peer reviews. In addition, the book is also good reading. The peer review definitely helps, either directly or indirectly, to improve the quality of published papers. Whether one likes the peer review system or not, and even if it should not be the very best method among the various options available to the scientific community for checking and improving the quality of its published works, it functions and fulfills its task. Should it not already exist, it would inevitably have to be invented. I hope that the "Guardians of Science" will initiate many interesting discussions amongst scientists and, in particular, chemists for the benefit of science.

Munich, July 1993

Prof. Dr. H. Nöth

Contents

List of Figures, Tables, and Synopses
1 Peer Review as an Instrument for the Self-Regulation of Science
2 Peer Review as a Target for Criticism
2.1 The Reliability of Manuscript Reviews
2.2 Fairness in Manuscript Review: Subjective Judgmental Tendencies and Publication Bias
2.3 The Validity of Manuscript Evaluation
2.4 Summary and Assessment of Criticism Leveled at the Peer-Review Process
3 The Journal Angewandte Chemie
3.1 The Category "Zuschriften" (Communications)
3.2 The Refereeing of Communications
3.3 Evaluation Form and Comment Sheet
4 Communications Received during the Year 1984
5 Initial Internal Evaluation, External Review, and Editorial Decisions
6 The Reviewers for Angewandte Chemie
7 The Reviews
8 Reliability of Manuscript Refereeing
8.1 Statistical Measures for Chance-Corrected Agreement
8.2 Reviewer Agreement
8.3 Low Levels of Reviewer Agreement: Statistical Artifact or a Result of the Process by Which Reviewers are Selected?
9 Fairness in Manuscript Evaluation
9.1 Lenient and Strict Reviewers
9.2 Judgmental Tendencies of Reviewers and Publication Bias
9.2.1 Academic Title of the Corresponding Author: Reviewer Judgments and Editorial Decisions
9.2.2 Subject Matter: Reviewer Judgments and Editorial Decisions
9.2.3 Nationality of the Corresponding Author: Reviewer Judgments and Editorial Decisions
10 The Validity of Manuscript Review
10.1 The Fate of the Rejected Manuscripts
10.2 Comparison of Mean Citation Rates for Accepted Manuscripts and Rejected Manuscripts Published Elsewhere: The Predictive Validity of Editorial Decisions
10.3 The Predictive Validity of Initial Judgments and Reviewer Recommendations
11 Suggestions for Reform of the Peer-Review Process
12 Summary
Synopses
Notes
References
Index

List of Figures, Tables, and Synopses

Figures

Figure 1. ISI Journal Impact Factors for top-ranked chemistry journals, 1983 to 1991
Figure 2. Evaluation form for communications
Figure 3. Lenient and harsh referees
Figure 4. Mean ratings by alternative referees involved in the evaluation of communications also reviewed by referees A-H
Figure 5. Publication profile for the Federal Republic of Germany in chemistry (F.R.G. share of world output by Chemical Abstracts sections, 1988-90)
Figure 6. Citation analysis: search strategy
Figure 7. Comparison of mean citation rates for communications accepted by Angewandte Chemie with those rejected by Angewandte Chemie but published elsewhere
Figure 8. Comparison of the citation rates for papers accepted and rejected by The Journal of Clinical Investigation but published elsewhere. The mean citation rates for manuscripts rejected by The Journal in 1970 and published elsewhere in 1971 are compared with those for the papers published by The Journal during the same year (Source: Wilson, 1978, p. 1699)
Figure 9. Ethical Guidelines to Publication of Chemical Research

Tables

Table 1. Distribution by corresponding authors (N = 313) of 449 communications submitted for publication to Angewandte Chemie in 1984
Table 2. Research institutions that submitted ten or more communications for publication in Angewandte Chemie in 1984 (in descending order by number of communications submitted)
Table 3. Initial internal evaluation by the editor-in-chief of 429 communications submitted for publication to Angewandte Chemie in 1984
Table 4. Final decision of the editor-in-chief to accept or reject 449 communications submitted for publication to Angewandte Chemie in 1984
Table 5. Distribution in the number of communications reviewed by a given reviewer
Table 6. Percentages of first and second referees' responses to items on the evaluation form
Table 7. Agreement in first and second referees' responses to 392 communications submitted for publication to Angewandte Chemie in 1984
Table 8. Agreement of referees on acceptance or rejection of communications submitted for publication to Angewandte Chemie in 1984 by Chemical Abstracts sections
Table 9. Degree of consensus in first and second referees' recommendations to accept or reject 392 communications submitted for publication to Angewandte Chemie in 1984, by accepted and rejected communications (in %)
Table 10. Concurrence and discrepancy in referees' responses to the question: "Do you recommend acceptance of the Communication?"
Table 11. First and second referees' mean recommendations as a function of the academic title of the corresponding author
Table 12. Publication outcomes for communications submitted to Angewandte Chemie in 1984 as a function of the academic title of the corresponding author (in %)
Table 12a. Partition of the chi-square value from Table 12 into specific components with one degree of freedom (df) each according to Kimball (1954)
Table 13. Subject-matter distribution of communications accepted for publication by Angewandte Chemie and communications rejected by Angewandte Chemie but published elsewhere
Table 14. First and second referees' mean recommendations by sections of Chemical Abstracts
Table 15. Publication outcome of communications submitted to Angewandte Chemie in 1984 by sections of Chemical Abstracts (in %)
Table 15a. Partition of chi-square value from Table 15 into specific components with one degree of freedom (df) each according to Kimball (1954)
Table 16. Communications submitted and communications accepted for publication as a function of country (in descending order of number of communications submitted)
Table 17. First and second referees' mean recommendations as a function of nationality of the corresponding author and nationality of referee
Table 18. Percentage of communications recommended for publication as a function of nationality of the corresponding author and nationality of first referee
Table 18a. Percentage of communications recommended for publication as a function of nationality of the corresponding author and nationality of second referee
Table 19. Publication outcome of communications submitted to Angewandte Chemie in 1984 by German vs. foreign corresponding authors (in %)
Table 19a. Partition of the chi-square value from Table 19 into specific components with one degree of freedom (df) each according to Kimball (1954)
Table 20. List of those journals that published communications rejected by Angewandte Chemie in 1984 (in descending order of number of publications)
Table 21. "If you are of the opinion that the contribution is not suitable for publication in Angewandte Chemie please indicate which other journal you consider more appropriate."
Table 22. Validity of the editor's decision. Comparison of mean citation rate for communications accepted for publication by Angewandte Chemie with the mean citation rate for communications rejected by Angewandte Chemie but published elsewhere, after adjustment of the time window for citation (one-way analysis of covariance)
Table 23. Citations for three groups of papers up to 1984 (Source: Lock, 1985, p. 64)
Table 24. Validity of initial evaluations by the editor-in-chief for communications submitted for publication in Angewandte Chemie. Comparison of mean citation rates for communications the editor-in-chief thought should be accepted or rejected, as well as communications with respect to which the editor-in-chief was uncertain about the appropriate course of action, after adjustment of the time window for citation (one-way analysis of covariance)
Table 25. Validity of first referees' recommendations. Comparison of mean citation rates for communications the first referees thought should be accepted without alterations, accepted after minor alterations, accepted only after major alterations, or rejected, after adjustment of the time window for citation (one-way analysis of covariance)
Table 26. Validity of second referees' recommendations. Comparison of mean citation rates for communications the second referees thought should be accepted without alterations, accepted after minor alterations, accepted only after major alterations, or rejected, after adjustment of the time window for citation (one-way analysis of covariance)
Table 27. Validity of first and second referees' recommendations combined. Comparison of mean citation rates for communications both referees thought should be accepted or rejected and for communications that received mixed evaluations, after adjustment of the time window for citation (one-way analysis of covariance)

Synopses

Synopsis 1. Editor's and referees' comments together with recommendations on communications cited most frequently after their publication (communications ranked by number of citations)
Synopsis 2. Editor's and referees' comments together with recommendations on eight uncited communications published by Angewandte Chemie
Synopsis 3. Frequently cited communications not accepted for publication by Angewandte Chemie, but published elsewhere (communications ranked by number of citations)
Synopsis 4. Communications not accepted for publication by Angewandte Chemie but published elsewhere, and which had not been cited by the end of 1989

1 Peer Review as an Instrument for the Self-Regulation of Science¹

According to Popper's evolutionary theory of epistemology (cf. Campbell, 1974), scientific understanding develops through a process of critical selection from among variants.² One of the most important selection mechanisms involves peer review, which consists in effect of soliciting critical evaluations from professional colleagues (peers) with respect to academic appointments (cf. Shils, 1990), grant applications (cf. Cole, Cole & Simon, 1981; Neidhardt, 1988), or manuscripts that have been submitted for journal publication (cf. Lock, 1985).³ Reviewers thus assume the role of the "Gatekeepers of Science" (Crane, 1967), recommending, in the ideal case, only those applicants or manuscripts that meet the highest of scientific standards. Polanyi (1966) regards peer review as embodying the principle of mutual control, fostering the formulation of judgments with respect to the novelty, accuracy, and relevance of research results. Proponents of the system argue that it is more effective than any other known instrument for self-regulation in promoting the critical selection that is crucial to the evolution of scientific understanding (Atkinson & Blanpied, 1985; National Research Council, 1987).


2 Peer Review as a Target for Criticism

Ever since the early 1970s, peer review as a regulatory mechanism has been the target of increasing criticism (cf. Chubin & Hackett, 1990). The system has been described, for example, as "unreliable, invalid, and harmful to the best type of research - that which is innovative" (Kornhuber, 1988, p. 377). Journals have been urged to abolish the practice of peer review (Mahoney, 1985), and from time to time this step has actually been considered by the editors of various professional journals (e.g., Adair & Trigg, 1979). Given the extensive criticism that has been leveled at peer review, certain journals founded in the 1980s refrained from the practice from the outset (Eysenck, 1980). In 1989 an international conference was held for the first time in Chicago under the motto "Guarding the Guardians" in an attempt to take stock of research into the peer-review issue as it applied to professional journals (cf. Rennie, 1990). Criticism of peer review is based largely on empirical studies that have probed selected questions related to the reliability, fairness, and validity of manuscript refereeing.⁴ What follows is a review of the current status of research into the three quality criteria for professional evaluations: interreferee agreement, fairness, and predictive validity.

2.1 The Reliability of Manuscript Reviews

Editors of professional journals that invoke the peer-review system typically send any manuscript submitted for publication to two experts for their evaluation.⁵ It is expected that reviewers will examine the manuscripts carefully from a professional point of view, and then recommend that they be either accepted or rejected. Editors of psychological journals (American Psychologist, Developmental Review, Journal of Abnormal Psychology, Journal of Educational Psychology, Journal of Personality and Social Psychology, Personality and Social Psychology Bulletin, Sociometry) were among the first to investigate the extent to which different reviewers arrive at similar recommendations (cf. Patterson & Bailar, 1985, p. 68). The goals and results of such studies have been summarized by Marsh & Ball (1989, p. 153). It was concluded that in the case of psychological journals the extent of agreement between two reviewers, measured on a scale from -1.0 (entirely contradictory recommendations) to +1.0 (complete agreement), corresponded to an average value of 0.27 (intraclass-correlation coefficient).


According to Bakanic, McPhail & Simon (1987, p. 632) a "correlation coefficient" (not further defined) of 0.16 was obtained for the leading sociological journal American Sociological Review.⁶ Hargens & Herting (1990, p. 14) report for the same journal an intraclass-correlation coefficient of 0.28. Based on data reported by Lempert (1985, p. 531), editor of the journal Law & Society Review, Hargens & Herting (1990, p. 14) established an intraclass-correlation coefficient for that journal of 0.17. A study by the editor of the highly regarded New England Journal of Medicine indicated reviewer agreement, with a kappa coefficient of 0.26 (cf. Cicchetti, 1991, p. 123), that was considered only "moderately better than a chance result" (Ingelfinger, 1974, p. 686). In a similar vein, Walling (n. d.), editor of the Journal of the American Chemical Society (JACS), perhaps the most important of the chemical journals, summarized his experience with reviewer appraisals in the observation: "the correlation within pairs (of referees) isn't very good" (p. 2).
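A kappa coefficient such as the one cited for the New England Journal of Medicine corrects raw agreement for the agreement two reviewers would be expected to reach by chance alone. As a minimal sketch of how such a figure is obtained, the following Python function computes Cohen's kappa for two reviewers' accept/reject recommendations. The function name `cohen_kappa` and the ten recommendations are hypothetical illustrations, not data from any of the studies cited.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement (Cohen's kappa) between two raters."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # degenerate case: both raters always use the same single category
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations for ten manuscripts (A = accept, R = reject).
first_referee = ["A", "A", "A", "A", "A", "A", "R", "R", "R", "R"]
second_referee = ["A", "A", "A", "A", "R", "R", "A", "R", "R", "R"]

print(round(cohen_kappa(first_referee, second_referee), 2))
```

In this invented example the referees agree on 7 of 10 manuscripts, but agreement on 5 of 10 would be expected by chance from their marginal frequencies, so kappa comes out at about 0.4.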

2.2 Fairness in Manuscript Review: Subjective Judgmental Tendencies and Publication Bias

From the perspective of the psychology of science, the issue of primary interest here is whether or not poor agreement between reviewers reflects a tendency toward judgmentalism. For example, the observed lack of agreement might be due simply to the fact that specific manuscripts have been submitted to one very harsh reviewer and another of milder temperament (cf. Siegelman, 1991). In the quest for research funds, success apparently depends very heavily upon the choice of the reviewers. Cole, Cole & Simon (1981) established that "the fate of a particular grant application is roughly half determined by the characteristics of the proposal and the principal investigator, and about half by apparently random elements which might be characterized as the 'luck of the reviewer draw'" (p. 885). Journal manuscripts are supposed to be judged solely on the basis of their scholarly quality, not on particularistic characteristics of their authors (Luhmann, 1968). However, in a classic study of the reviewing process as practiced by professional journals, Zuckerman & Merton (1971a, b) were able to show that the professional status of the author also influences the probability that a manuscript will be accepted for publication. Another study by Peters & Ceci (1982), which has itself been a subject of criticism, seems to indicate that the prestige of the research institution with which an author is affiliated can be decisive with respect to whether or not a submitted manuscript will be accepted for publication.⁷ Mahoney (1977) found evidence that manuscripts supporting preconceived opinions of the reviewers are more likely to be recommended positively than those defending opposing viewpoints. Replication studies (Neuliep & Crandall, 1990), as well as investigations that lead to statistically nonsignificant findings (Sterling, 1959; Begg & Berlin, 1989), apparently stand a rather low chance of publication.
Other factors that appear to influence reviewer judgments with respect to a manuscript include nationality (Gordon, 1978), university, and sex of the author, as well as the field from which the work originates (Sahner, 1982).


Ross (1980, 1993) provides evidence from the literature for a total of 16 types of publication bias. Sociologists of science regard findings such as these as an affront to the prescriptive norms of science, since factors like sex, status, and nationality of the author should play no role whatsoever in assessments of quality.

2.3 The Validity of Manuscript Evaluation

Assessing the validity of decisions by reviewers and editors requires that there exist a generally accepted criterion for scientific quality (cf. Eckmann, 1977; Lindsey, 1989). Unfortunately, it is usually very difficult to establish consensus on this point (cf. Beck & Hartmann, 1983; National Academy of Sciences, 1982). Moreover, a validity test requires information regarding the fate of rejected manuscripts. Research in this area is extremely labor-intensive, presumably the reason why so few empirical studies have been conducted into the level of predictive validity associated with the manuscript-review process. In the absence of other operationalizable criteria, studies so far reported have been based exclusively on frequency of citation as a validity criterion. In one study for the National Science Foundation, Small (1974) reached the conclusion that, in chemistry, "papers that became highly cited received generally lower referee evaluations than papers which were cited less frequently" (p. 43).⁸ The editors of the Journal of Clinical Investigation⁹ (Wilson, 1978) and the British Medical Journal (Lock, 1985) have undertaken their own investigations into the question of validity. Thus, Wilson (1978, p. 1699) was able to show that the 306 manuscripts accepted for publication in the Journal of Clinical Investigation during the year 1970 were cited twice as frequently in the four years after their appearance as the 149 rejected manuscripts that subsequently appeared in other journals. For reasons of time and cost, Lock (1985) attempted to estimate the validity of manuscript evaluation on the basis of the "ISI Journal Impact Factors"¹⁰ (cf. Garfield, 1976 ff.) for journals that published manuscripts previously rejected by the British Medical Journal (BMJ). In the year 1979, the British Medical Journal received for publication 1551 manuscripts, of which 1223 (79%) were rejected. Of these 1223, 836 (68%) were published in other journals, but only 130 (16% of those published elsewhere) appeared in journals with Impact Factors equal to or greater than the Impact Factor of the BMJ. Lock (1985) speculated that these might in fact be papers whose quality was incorrectly assessed by the reviewers and editors of the British Medical Journal. Nevertheless, since the majority (84%) of the rejected manuscripts that were published elsewhere appeared in journals with Impact Factors lower than that of the BMJ, the editorial decisions still appear to reflect a rather high degree of predictive validity, just as in the case of the Journal of Clinical Investigation.
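The percentages reported for the British Medical Journal rest on different denominators, which is easy to overlook. The short Python sketch below recomputes them from the counts given in the text; the variable and function names are chosen here purely for illustration.

```python
# Counts reported for the British Medical Journal in 1979 (Lock, 1985).
submitted = 1551
rejected = 1223
published_elsewhere = 836      # rejected manuscripts later published in other journals
equal_or_higher_impact = 130   # of those, in journals with Impact Factor >= BMJ


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percentage."""
    return round(100 * part / whole)


rejection_rate = percent(rejected, submitted)                       # base: all submissions
published_rate = percent(published_elsewhere, rejected)             # base: rejected manuscripts
higher_rate = percent(equal_or_higher_impact, published_elsewhere)  # base: published elsewhere
lower_rate = percent(published_elsewhere - equal_or_higher_impact, published_elsewhere)
# For comparison: the same 130 papers as a share of ALL rejected manuscripts.
higher_rate_all_rejected = percent(equal_or_higher_impact, rejected)

print(rejection_rate, published_rate, higher_rate, lower_rate, higher_rate_all_rejected)
```

The 16% and 84% figures are shares of the 836 manuscripts that found another outlet; measured against all 1223 rejected manuscripts, the 130 papers amount to roughly 11%.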


2.4 Summary and Assessment of Criticism Leveled at the Peer-Review Process

Ross (1980, p. ii) summarizes the criticism of the peer-review process in the following way: "Manuscript refereeing, one aspect of peer review and self-management in the sciences, has been shown to be almost wholly lacking in interreferee agreement on the recommendation to publish (r² = .04), without validity in forecasting the subsequent usefulness of a work to scientists as reflected in citations of the work in other scientific papers (r² = .00), and biased in more than a dozen ways." Bornstein (1991, p. 139) comes to similar conclusions: "Peer review fails miserably with respect to every technical criterion for establishing the reliability and validity of an assessment instrument" (emphasis in the original). Nevertheless, criticism of the peer-review process overlooks the fact that reviewer disagreement tends to be overstated, because differing judgments reflect not only discordance but also elements of dislocation (Lienert, 1987, p. 320). Discrepancies attributable to interindividual differences in frames of reference (e.g., reviewer A invariably rates manuscripts one level lower than reviewer B) are rarely distinguishable from the differences in judgment that are in fact the issue. Moreover, account must be taken of the fact that different reviewers bring to the task different perspectives and different kinds of competence, not the uniform backgrounds one is forced to presuppose in the usual measures of reliability. Indeed, many editors of technical journals acknowledge that they make a deliberate effort to send manuscripts to one reviewer who is a specialist and another who is a generalist (cf. Lempert, 1985, p. 532; Kiesler, 1991, p. 151; Roediger, 1987, p. 232 and Note 1). A high level of agreement between reviewers in itself proves very little, since two reviewers might reach equally erroneous conclusions, and high reliability is no guarantee of valid judgments (cf. Kraemer, 1991).
For these reasons Mahoney (1985) tends to be rather skeptical of what appears to be good reviewer agreement: "Many of the attacks and defenses of peer review and editorial policies have focused on the issue of reliability... and have overlooked the frailty of consensus as a form of epistemic warrant. Enforced reliability is not a likely solution; indeed, it might well exacerbate the problem" (p. 32, footnote 2; emphasis in the original). The few reported findings regarding the predictive validity of the peer-review process are mutually contradictory. One study by Small (1974), which concludes that chemistry manuscripts receiving favorable reviews are cited less frequently after publication than those judged negatively, is based on an extremely small set of data. By contrast, the statistically more broadly based studies by Wilson (1978) and Lock (1985) support the premise that the peer-review process does in fact function as a "quality filter"—at least for medical journals—and that it fulfills its assignment as an instrument for the self-regulation of science. Judgmental tendencies on the part of reviewers, as well as publication biases, can constitute a threat to the fairness of the reviewing process. Whether or not this is harmful to the progress of science is a question that has scarcely been investigated. Basic psychological research into the formation of social judgment has shown that reducing bias—by providing special training for evaluators, for example—does not necessarily increase the validity of the resulting decisions (Funder, 1987).
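The "dislocation" argument can be made concrete with a small numerical sketch. Assume, hypothetically, that reviewer A rates every manuscript exactly one level less favorably than reviewer B on a four-point scale. The two reviewers then rank the manuscripts identically, yet never give the same rating, and an agreement-type coefficient such as the one-way intraclass correlation falls well below 1. The ratings and the function names `pearson_r` and `icc_oneway` are invented for illustration, and the one-way form of the intraclass correlation is an assumption made here; the studies cited do not all state which form they used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: insensitive to a constant offset between raters."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def icc_oneway(x, y):
    """One-way random-effects intraclass correlation for two ratings per item."""
    n, k = len(x), 2
    pair_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand_mean = sum(pair_means) / n
    # Mean square between manuscripts and mean square within manuscripts.
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(x, y, pair_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical ratings on a scale from 1 (accept) to 4 (reject).
reviewer_b = [1, 2, 3, 1, 2, 3]
reviewer_a = [rating + 1 for rating in reviewer_b]  # always one level stricter

print(round(pearson_r(reviewer_a, reviewer_b), 2))
print(round(icc_oneway(reviewer_a, reviewer_b), 2))
```

With these invented ratings the Pearson correlation is 1 while the one-way intraclass correlation is only about 0.52, although the two reviewers differ solely by a constant offset in their frame of reference.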


Bailar & Patterson (1985) criticized the existing studies on manuscript review for professional journals as follows: "Most studies of journal peer review have been methodologically weak, and most have focused on process rather than outcome. A large part of the published work deals with papers on psychology and related disciplines, much of the remainder has been designed and executed as research in the sociology of science rather than in technical communication" (p. 656). Against this background of previously conducted investigations, the present study raises the question of the extent to which criticism of the peer-review process is justified, using as an example reviews conducted for the journal Angewandte Chemie. Here for the first time a chemistry journal has been taken as the basis for a systematic and comprehensive examination of the reliability, fairness, and validity of manuscript review.

3 The Journal Angewandte Chemie¹¹

Angewandte Chemie, edited by the Gesellschaft Deutscher Chemiker and produced by VCH Verlagsgesellschaft, is rated, along with the Journal of the American Chemical Society (JACS), among the world's leading chemistry journals. Since 1985 it has in fact enjoyed a higher ISI Journal Impact Factor than JACS (cf. Fig. 1 and Pendlebury, 1988; Grissom, 1991). Angewandte Chemie appears monthly, and publishes "Aufsätze" (review articles), "Zuschriften" (communications), "Buchbesprechungen" (book reviews), and "Correspondenz" (correspondence) in the German language. Since 1961 there has also existed a complete English version of the journal, with issues released in the same month as the German originals under the title Angewandte Chemie International Edition in English. Peer review was introduced into Angewandte Chemie in 1982, primarily in conjunction with communications. Manuscripts as submitted are normally examined by two independent reviewers, and in the event of disagreement additional reviewers may be engaged. Like the book reviews, many of the review articles are prepared at the invitation of the editor-in-chief. Only in unusual cases are the review articles subjected to external review. Correspondence related to publications in Angewandte Chemie is published only very rarely, and is again not subject to external review.¹²

3.1 The Category "Zuschriften" (Communications)

"Communications" are short notes (limited to six manuscript pages) dealing with work in progress or recently concluded experimental or theoretical investigations from any of the various branches of chemistry. Such a communication, described by other publications as a "letter" or a "note", is expected, because of its significance, novelty, or wide applicability, to be of broad general interest, or at least of special utility in the development of some important area of research.¹³ It must also be so written that even a non-specialist will recognize the significance the author attaches to the findings. Contributions that fail to meet these criteria are not accepted for publication even if they are otherwise beyond criticism from the standpoints of content and form (cf. the Instructions to Authors for Angewandte Chemie, published in each January issue).

Figure 1. ISI Journal Impact Factors for top-ranked chemistry journals, 1983 to 1991 (horizontal axis: year; journals plotted: Angewandte Chemie, JACS, Organometallics, JCS Chem. Comm., Tetrahedron Letters, J. Organomet. Chem.). [Plot not reproduced.]

3.2

The Refereeing of Communications

Each submitted manuscript is assigned a sequential communication number reflecting the date of its receipt, and the editor-in-chief sends an official note of confirmation to each corresponding author. A member of the editorial staff then reads the manuscript and proposes the names of two reviewers. In a very small number of cases the editor may immediately reject the manuscript, believing a review to be unnecessary (two percent of the manuscripts received in 1984 were never subjected to external review).14 Reviews of all communications are conducted under the system of one-sided anonymity: i.e., reviewers are provided with the names of the authors, but authors are not told the identities of the reviewers. Roughly 20% of the reviewers respond immediately, while 60% take advantage of the allowed 14-day review period and 15% react only after one or more reminders. In about 5% of the cases the editor-in-chief is forced to abandon pursuit of a particular review due to a continued lack of response after multiple reminders. Receipt of one positive and one negative review leads in about 30% of the cases to the involvement of a third reviewer. In the event of an appeal (ca. 7% of the negative decisions evoke protests from the authors) the editor-in-chief turns to the services of a "reviewer-in-chief", frequently a member of the Angewandte Chemie Advisory Board. This Advisory Board consists of 14 members representing several different areas of specialization and drawn from industry, higher education, and non-university research institutes. Its role is to provide guidance and supervise the work of the editor and the editorial staff. Members of the Advisory Board are selected by the Board of Directors of the Gesellschaft Deutscher Chemiker (the German Chemical Society; cf. Golitz, 1990). A communication is normally accepted or rejected only after consideration of the referees' recommendations and comments. Comments furnished by the reviewers, or excerpts therefrom, are in most cases passed along to the authors, especially if the reviewers recommend rejection, or when they suggest that a manuscript be revised or supplemented. Anonymity of the reviewers is in every case strictly maintained.
Of the communications accepted for publication, roughly one-third proceed immediately to editorial processing; the remainder are returned to the corresponding authors for revision, accompanied by some or all of the reviewers' comments. Authors may at this point decline to revise their manuscripts, although they would be expected to support their stands with plausible arguments. Any such argument would first be considered by the editor-in-chief, but it might also be forwarded to the reviewers for further comment. Assuming a communication passes the test for acceptance, publication would be expected to follow, in an optimal case (i.e., prompt reviewing, no requests for changes, proper attention to stylistic matters), within six to eight weeks of its receipt in the editorial offices (cf. Heller & Kirstatter, 1989).

3.3

Evaluation Form and Comment Sheet

Reviewers receive with each manuscript a fully structured evaluation form together with a separate sheet for comments.15 The evaluation form contains a set of six questions and associated response categories (cf. Fig. 2). Included on each evaluation form and comment sheet is the number assigned to the communication by the editorial staff, an abbreviated title, the name of the corresponding author, and the date by which the reviewer is expected to respond. A date of receipt for the completed review is added when the forms are returned.


1) Are the contents of the manuscript
   a) of wide and general interest?              Yes [ ]   No [ ]
   b) of extraordinary but special interest?     Yes [ ]   No [ ]

2) Do the data obtained by experiment or calculation verify the hypotheses and conclusions?
   Yes [ ]   No [ ] *)

3) Is the length of the manuscript appropriate to its contents?
   Yes [ ]
   No, the manuscript is too long [ ] *)
   No, the manuscript is too short [ ] *)

4) The form of the manuscript (text, figures, tables, nomenclature etc.) is beyond reproach.
   Yes [ ]   No [ ] *)

5) Do you recommend acceptance of the Communication?
   Yes, without alterations [ ]
   Yes, after minor alterations [ ] *)
   Yes, but only after major alterations [ ] *)
   No [ ]

6) If you are of the opinion that the contribution is not suitable for publication in Angewandte Chemie please indicate which other journal you consider more appropriate?

*) Please give comments on the enclosed sheet.

Figure 2. Evaluation form for communications

4

Communications Received during the Year 1984

The year 1984 was chosen for evaluating the Angewandte Chemie peer-review process because this was the first year after introduction of the system in which all communications were judged by two independent reviewers on the basis of a uniform rating form.16 Moreover, examining communications from the year 1984 ensured that adequate time would be available for analyzing frequencies of citation for both accepted manuscripts and manuscripts that were rejected by Angewandte Chemie but subsequently published elsewhere. The journal Angewandte Chemie received for possible publication in 1984 a total of 449 communications, prepared by 313 different corresponding authors. Table 1 shows the distribution of these manuscripts as a function of author. Three-fourths of all corresponding authors submitted only a single communication, while a very few provided Angewandte Chemie with as many as five such manuscripts during the year.

Table 1. Distribution by corresponding authors (N = 313) of 449 communications submitted for publication to Angewandte Chemie in 1984

Number of communications submitted    Corresponding authors
                                      No.      %
5 communications                        7      2
4 communications                        9      3
3 communications                       15      5
2 communications                       51     16
1 communication                       231     74

These 313 corresponding authors represented 141 research institutions in 21 countries. The institutional sources accounting for the largest number of communications to Angewandte Chemie during 1984 are listed in Table 2. Generally speaking, a classification of the manuscripts according to corresponding author and research institution fails to reveal the striking concentrations one associates with "in-house periodicals" (cf. Yotopoulos, 1961; McDowell & Amacher, 1986; Backes-Gellner & Sadowski, 1988).


Table 2. Research institutions that submitted ten or more communications for publication in Angewandte Chemie in 1984 (in descending order by number of communications submitted)

Research institution [1]                            No. of communications submitted
University of Würzburg                              20
Max Planck Institute for Coal Research, Mülheim     19
University of Münster                               18
University of Göttingen                             16
University of Bonn                                  15
University of Frankfurt                             15
University of Marburg                               13
Technical University of Munich                      13
University of Heidelberg                            12
University of Munich                                12
University of Hamburg                               11
University of Tübingen                              11
University of Cologne                               10

[1] Institutional affiliation of corresponding author

5

Initial Internal Evaluation, External Review, and Editorial Decisions

Communications received by Angewandte Chemie are first subjected to an informal internal evaluation. Thus, the editor-in-chief indicates on a standard form whether or not in his opinion a given communication should be accepted or rejected, or if he is in doubt as to the proper course of action. This preliminary evaluation is generally accompanied by a brief comment: "good work"; "very nice"; "interesting reaction ... quite poorly written"; "straightforward and brief, perhaps useful"; "provided the structure is correct, then everything is fine"; "much speculation, but little that is verifiably new"; "nothing new"; "too specialized"; "there's not enough here". Based on the initial appraisal of the editor-in-chief, 35% of the communications submitted in 1984 were worthy of publication and 8% should have been rejected. The editor was uncertain about the appropriate course of action for 57% of the manuscripts. Out of a total of 449 communications, 18 (4%) received no initial appraisal from the editor (cf. Table 3).

Table 3. Initial internal evaluation by the editor-in-chief of 429 communications [1] submitted for publication to Angewandte Chemie in 1984

The communication is ...    Communications
                            No.      %
acceptable                  151     35
questionable                243     57
not acceptable               35      8

[1] 18 communications received no initial internal evaluation by the editor-in-chief; two files are missing

After taking into account the subsequent formal reviews, the editor-in-chief eventually accepted 72% of the submitted communications; 26% were rejected, and in 2% of the cases the manuscripts were withdrawn by the authors themselves (cf. Table 4).

Table 4. Final decision of the editor-in-chief to accept or reject 449 communications [1] submitted for publication to Angewandte Chemie in 1984

Editor's final decision                   Communications
                                          No.      %
Acceptance                                323     72
Rejection                                 115     26
Manuscript withdrawn by the authors         9      2

[1] Two files are missing


Of the 151 manuscripts regarded by the editor as acceptable from the outset, 95% were actually published after external evaluation, whereas 3% were rejected and 2% were withdrawn by the authors. Of the 35 communications subject to a negative reaction at the time of their receipt, 18% were nonetheless accepted based on the strength of the external reviews, while 80% were rejected and 3% were withdrawn. Two-thirds of the communications about which the editor expressed doubt were accepted after external review (N = 155), nearly one-third (N = 76) were rejected, and 3% were withdrawn by the authors. Angewandte Chemie accepted 63% of the 18 communications for which no preliminary assessment was issued, and the remaining 37% were rejected.

6

The Reviewers for Angewandte Chemie

Each of the communications submitted to Angewandte Chemie in 1984 was sent for evaluation to two reviewers. (In the discussion that follows we refer repeatedly to "first" and "second" reviewers. Since the two reviewers acted independently and with equal authority, this distinction is not meant to imply any difference in stature.) Instead of the expected 878 reviews, the editor-in-chief in fact received only 856 first and second reviews. Three percent of the reviewers contacted failed to respond, presumably for a variety of reasons, including conflict of interest, absence from their post, competing obligations, or lack of perceived competence with respect to the subject matter in question. In addition to the 856 first and second reviews, the editor-in-chief also solicited and received 43 third and fourth reviews.17 The complete set of 899 evaluations can be attributed to 315 different reviewers; approximately one-half (48%) of the experts in question provided advice with respect to only a single communication.18 Quite obviously, influence in the case of Angewandte Chemie is not concentrated in the hands of a small number of reviewers. Only ten reviewers evaluated ten or more manuscripts during 1984 (cf. Table 5), and the average Angewandte Chemie reviewer provided evaluations for three submitted communications.

Table 5. Distribution in the number of communications reviewed by a given reviewer

No. of reviews    No. of referees
27                  1
22                  1
12                  5
11                  1
10                  2
 9                  8
 8                  6
 7                  9
 6                 12
 5                 15
 4                 20
 3                 31
 2                 52
 1                152

Note: 315 referees provided a total of 899 reviews
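The totals quoted for Table 5 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the table is read as pairs of (number of reviews, number of referees); the dictionary name and the helper `summarize` are illustrative and not part of the study.

```python
# Frequencies as read from Table 5: number of reviews -> number of referees.
# This reading of the table is an assumption made for the illustration.
reviews_to_referees = {
    27: 1, 22: 1, 12: 5, 11: 1, 10: 2, 9: 8, 8: 6,
    7: 9, 6: 12, 5: 15, 4: 20, 3: 31, 2: 52, 1: 152,
}


def summarize(dist):
    """Return totals and simple shares for a reviews-per-referee distribution."""
    referees = sum(dist.values())
    reviews = sum(n_reviews * n_refs for n_reviews, n_refs in dist.items())
    return {
        "referees": referees,
        "reviews": reviews,
        "mean_reviews": reviews / referees,
        "share_single": dist.get(1, 0) / referees,
        "ten_or_more": sum(n for k, n in dist.items() if k >= 10),
    }


summary = summarize(reviews_to_referees)
print(summary)
```

Under this reading the distribution reproduces the 315 referees and 899 reviews given in the note, the share of roughly 48% who reviewed a single communication, and the ten reviewers with ten or more manuscripts; the mean of about 2.85 reviews per referee corresponds to the "three" quoted in the text.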

Scientists from the Federal Republic of Germany prepared 83% of the reviews. The remaining reviews were solicited from chemists in Switzerland (a total of 54 reviews), the United States (33 reviews), France (20 reviews), the Netherlands (19 reviews), Great Britain (13 reviews), and six additional countries. Thus, 91% of the foreign reviews originated in Switzerland, the United States, France, the Netherlands, and Great Britain. Chemists from over 100 different institutions were engaged in preparing reviews for Angewandte Chemie during 1984. In addition to scientists from the Max-Planck-Institut für Kohlenforschung in Mülheim, professors at the Universities of Bonn, Frankfurt, Hamburg, München, Strasbourg, Würzburg, and Zurich were particularly active in reviewing communications for Angewandte Chemie. A configuration frequency analysis of the code numbers for the first and second reviewers reveals that the editors of Angewandte Chemie relied on a total of 370 different reviewer pairs in 1984. Of these pairs, 324 evaluated only a single communication, 37 pairs received two manuscripts, 7 received three, and one pair each was entrusted with four and five manuscripts.19 The fact that certain pairs were assigned more than one manuscript for review is largely attributable to a single cause: authors occasionally submit for publication in Angewandte Chemie several communications simultaneously. In order to ensure that all the manuscripts in such a set are sufficiently distinct to warrant separate publication, a single reviewing team may be requested to evaluate the complete set of manuscripts.

7

The Reviews

Reviewers receive from the editors a fully structured evaluation form (questionnaire), together with a separate comment sheet. The evaluation form includes six questions (cf. p. 12), each with specified response options. Some of the possible responses are designated with asterisks, signifying that reviewers are encouraged to respond in greater detail on the comment sheet. Most of the reviewers in fact do supply responses on both the form and the comment sheet. Nevertheless, a few reviewers (9% of the reviewers during 1984, representing 5% of the reviews) decline to fill out the evaluation form as a matter of principle, restricting their responses to more or less extensive comments. Other reviewers fill out the form, but then elect not to provide any additional commentary.20 Only about one-fifth (21%) of the evaluation forms were filled out completely. Reviewers appeared to have the most difficulty with the first question: "Are the contents of the manuscript (a) of wide and general interest? (responses: 'yes', 'no'), (b) of extraordinary but special interest? (responses: 'yes', 'no')". Since the two parts of the question are not mutually exclusive, considerable misunderstanding existed as to whether the editor expected one or two responses. Roughly 30% of the reviewers left the first part of the question blank, and 51% the second part. The question regarding whether the form of the manuscript is beyond reproach was ignored by 19% of the reviewers. A total of 16% declined to indicate whether the experimental data or calculations supported the proposed hypotheses and conclusions. With respect to the appropriateness of the length of the manuscript, reviewer responses were missing in 14% of the cases. Reluctance to respond was least prevalent on the question of whether or not a manuscript should be accepted: only 4% of the reviewers failed to pass judgment.
Table 6 records the frequency distributions for responses to the first five questions on the evaluation form (responses to question 6, "If you are of the opinion that the contribution is not suitable for publication in Angewandte Chemie please indicate which other journal you consider more appropriate?", are discussed in Section 10.1). The questions regarding content, supportive data, length, and form of the manuscript elicited generally positive responses (64-87%). In the majority of cases the final recommendation was for acceptance of the communication, specifically "after minor alterations" or "only after major alterations" (mean response to the question for all first and second reviews on a four-category rating scale: 2.4).


Table 6. Percentages of first and second referees' responses to items on the evaluation form

Evaluation form item and                          First       Second      Total
available response categories                     referees    referees
                                                  N = 436     N = 420     N = 856

1) Are the contents of the manuscript
   a) of wide and general interest?
      Yes                                           65          66          64
      No                                            35          34          36
      No response                                   31          29          30
   b) of extraordinary but special interest?
      Yes                                           30          29          29
      No                                            70          71          71
      No response                                   51          52          51

2) Do the data obtained by experiment or calculation verify the hypotheses and conclusions?
      Yes                                           86          88          87
      No                                            14          12          13
      No response                                   15          18          16

3) Is the length of the manuscript appropriate to its contents?
      Yes                                           75          77          76
      No, the manuscript is too long                15          12          13
      No, the manuscript is too short               11          11          11
      No response                                   14          15          14

4) The form of the manuscript (text, figures, tables, nomenclature etc.) is beyond reproach.
      Yes                                           69          72          71
      No                                            31          28          29
      No response                                   16          21          19

5) Do you recommend acceptance of the Communication?
      Yes, without alterations                      19          20          19
      Yes, after minor alterations                  44          39          42
      Yes, but only after major alterations         15          18          17
      No                                            23          23          23
      No response                                    4           3           4
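The mean response of 2.4 quoted for question 5 can be approximated from the "Total" percentages in Table 6. This is a hedged sketch: coding the four recommendation categories as 1 to 4 (1 = "Yes, without alterations", 4 = "No") is an assumption made here for illustration, and the function name is hypothetical.

```python
# "Total" percentages for question 5 in Table 6, excluding "No response".
# The category codes 1-4 are an assumption made for this illustration.
shares = {1: 19, 2: 42, 3: 17, 4: 23}


def mean_category(percentages):
    """Weighted mean of category codes; normalizes because rounded
    percentages need not sum to exactly 100 (they sum to 101 here)."""
    total = sum(percentages.values())
    return sum(code * pct for code, pct in percentages.items()) / total


mean_q5 = mean_category(shares)
print(round(mean_q5, 1))
```

With these inputs the weighted mean rounds to 2.4, i.e. between "after minor alterations" and "only after major alterations".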

8

Reliability of Manuscript Refereeing

At first glance it would appear that first and second reviewers achieved a high level of agreement on all the questions. Thus, 19% of the first reviewers and 20% of the second reviewers recommended acceptance without alteration, and 23% in each case recommended rejection (cf. Table 6). Table 7 (p. 24) reveals that the percentages of agreement are very high in virtually all categories. With regard to the question of whether the data or calculations presented are supportive of the proposed hypotheses and conclusions (question 2), 82% of the reviewer pairs agreed in their answers. The percentage of agreement was smallest for the final recommendation (question 5)—as would be expected, given that here there are four possible responses rather than only two: the two reviewers agreed completely in their recommendations in only 38% of the cases. Nevertheless, percentage of agreement is not a suitable measure for judging the reliability of reviewer recommendations, because it fails to take account of chance agreement (Watkins, 1979). In question 2, for example, the expected level of chance agreement is 78%, and even in question 5 it is 29% (cf. Table 7, column 4, p. 24).
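The difference between raw and chance-expected agreement can be made concrete for question 5. The sketch below assumes the joint frequencies of first and second referees' recommendations as given in Table 10 (rows: first referee, columns: second referee); the function names are illustrative only.

```python
# Joint frequencies for question 5 as given in Table 10
# (rows: first referee, columns: second referee; categories in form order).
TABLE_10 = [
    [20, 38, 9, 12],
    [33, 75, 31, 28],
    [10, 22, 15, 11],
    [14, 19, 15, 40],
]


def observed_agreement(table):
    """Fraction of pairs on the main diagonal (identical recommendations)."""
    n = sum(map(sum, table))
    return sum(table[i][i] for i in range(len(table))) / n


def chance_agreement(table):
    """Agreement expected from the marginal distributions alone."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2


p_o = observed_agreement(TABLE_10)
p_e = chance_agreement(TABLE_10)
print(round(p_o, 2), round(p_e, 2))
```

Under these assumptions the two values round to 0.38 and 0.29, the figures cited for question 5: most of the observed agreement is what two referees with these marginal distributions would reach by chance.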

8.1

Statistical Measures for Chance-Corrected Agreement

Numerous suggestions have been made for estimating the level of inter-referee agreement. Conger & Ward (1984) discuss 16 measures for determining agreement between two raters on the basis of two-category nominal scales alone. Because of their practical and theoretical advantages, three methods have come to dominate the literature: the kappa statistic of Cohen (1960), the weighted kappa statistic (Cohen, 1968), and the intraclass correlation of Fisher (cf. Ebel, 1951). Cohen's kappa statistic is indicated in the case of binary and nominal data (e.g., as with questions 1-4 of the reviewing form), whereas the weighted kappa statistic and the intraclass correlation are appropriate for cardinal data, as in question 5 (cf. Bortz, Lienert & Boehnke, 1990; Cicchetti, 1991, pp. 120-121). The formula for Cohen's kappa statistic is:

\[ \kappa = \frac{P_o - P_e}{1 - P_e}, \]

in which $P_o$ stands for the empirically established fraction of concordant judgments, which must be corrected by the fraction of concordant judgments expected on the basis of chance alone ($P_e$), which is readily determined by finding the joint probabilities of the marginals in a two-way table. The term $1 - P_e$ in the equation refers to the maximum possible difference between observed and chance agreement. The upper limit of kappa is +1.0, occurring when there is perfect agreement between the two referees. The lower limit of kappa is between 0 and -1.0 (depending on the marginal distributions). Kappa treats all disagreement equally. The weighted kappa described by Cohen (1968) provides for assigning ratio-scaled degrees of disagreement to each of the cells of the k × k table of joint nominal scale assignments such that disagreements of varying gravity are weighted accordingly. The choice of specific linear weights is in principle arbitrary. Following Cicchetti (1976) and Hall (1974) we have chosen to use linear weights with weighted kappa for question 5 of the reviewing form, since the data are presumably ordinal. Thus, a case of complete reviewer agreement is assigned zero points, responses from adjacent categories receive one point, two responses separated by two categories receive two points, and the maximum degree of discrepant judgment is assigned three points.21 Although no corresponding provisions were offered in the definition of kappa by Cohen (1960), Lienert (1978, p. 647) notes that in the case of questions with more than two answer categories there is nothing to prevent developing separate estimates of agreement between two judges in each of k response categories by dichotomizing the response scale. This so-called category-specific agreement can be computed according to the following equation (cf. Fleiss, 1981, p.
217):

\[ \kappa_{cs} = \frac{2\,(ad - bc)}{p_1 q_2 + p_2 q_1}, \]

in which the letters a-d represent frequencies in a 2 × 2 table, $p_1$ and $q_1$ are the first and second row sums, and $p_2$ and $q_2$ are the two column sums. Crandall (1978) has suggested computing agreement coefficients in the case of ordinal data in such a way that minor disagreements in reviewer judgments (e.g., differences by one response category) are treated as concordances. Such kappa coefficients have been characterized by Tolman, Farrier & Farrier (1988, pp. 3^1·) as "Kappa with scores computed as agreement if within one point." It is often asserted that a response scale like that for question 5 of the Angewandte Chemie reviewing form results in data based on scaled intervals. If the objects of the study (short communications, for example) are all evaluated by different pairs of reviewers,22 then the result of interest is an estimate of the reliability of an "average" reviewer. A statistic of choice here would be the intraclass correlation coefficient. The formula for the intraclass correlation coefficient (ICC), when different sets of reviewers evaluate each manuscript, derives from a repeated-measures (i.e., across reviewers) analysis of variance (ANOVA) model, and can be defined as:

\[ \mathrm{ICC} = \frac{MS_b - MS_e}{MS_b + (n - 1)\,MS_e}, \]

in which $MS_b$ stands for the mean square between subjects (= communications), $MS_e$ for the mean square error,23 and $n$ for the number of reviewers per manuscript (cf. Cicchetti, 1991, p. 120, as well as Rosenthal, 1991, p. 160). In theory, the intraclass correlation coefficient can vary between -1.0/(n - 1) and +1.0; i.e., for two reviewers per manuscript, between -1.0 and +1.0.
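The three statistics described in this section can be sketched in code and applied to the question 5 frequencies from Table 10. Several things here are assumptions made for illustration, not statements about how the study's values were computed: kappa is taken as (P_o - P_e)/(1 - P_e), the weighted kappa uses linear disagreement weights |i - j|, and the ICC is taken as (MS_b - MS_e)/(MS_b + (n - 1) MS_e) from a one-way ANOVA with the four categories scored 1 to 4 and n = 2 referees. All function names are hypothetical.

```python
# Joint frequencies for question 5 as given in Table 10
# (rows: first referee, columns: second referee).
TABLE_10 = [
    [20, 38, 9, 12],
    [33, 75, 31, 28],
    [10, 22, 15, 11],
    [14, 19, 15, 40],
]


def _margins(table):
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return n, rows, cols


def cohen_kappa(table):
    """Unweighted kappa: (P_o - P_e) / (1 - P_e)."""
    n, rows, cols = _margins(table)
    p_o = sum(table[i][i] for i in range(len(table))) / n
    p_e = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(table):
    """Kappa with linear disagreement weights |i - j| (0, 1, 2, 3 points)."""
    n, rows, cols = _margins(table)
    k = len(table)
    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(abs(i - j) * table[i][j] for i, j in cells) / n
    expected = sum(abs(i - j) * rows[i] * cols[j] for i, j in cells) / n ** 2
    return 1 - observed / expected


def icc_two_raters(table):
    """One-way ANOVA ICC for two raters, categories scored 1..k (assumption)."""
    k = len(table)
    pairs = [(i + 1, j + 1)
             for i in range(k) for j in range(k)
             for _ in range(table[i][j])]
    m = len(pairs)                       # number of communications
    grand = sum(x + y for x, y in pairs) / (2 * m)
    ss_between = sum(2 * ((x + y) / 2 - grand) ** 2 for x, y in pairs)
    ss_error = sum((x - y) ** 2 / 2 for x, y in pairs)
    ms_b = ss_between / (m - 1)
    ms_e = ss_error / m                  # m * (n - 1) degrees of freedom, n = 2
    return (ms_b - ms_e) / (ms_b + ms_e)


print(round(cohen_kappa(TABLE_10), 2),
      round(weighted_kappa(TABLE_10), 2),
      round(icc_two_raters(TABLE_10), 2))
```

Under these assumptions the three values round to 0.14, 0.20, and 0.25, which agree with the coefficients reported for question 5 in Section 8.2.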

8.2

Reviewer Agreement

Table 7 indicates the extent to which communications submitted for publication in Angewandte Chemie during 1984 were subject to concordant evaluations by the corresponding reviewing pairs (first and second referees). The level of chance-corrected reviewer agreement was very low for all five questions on the reviewing form. Kappa coefficients range from 0.12 (question 4: "Is the form of the manuscript beyond reproach?") to 0.23 (question 1a: "Are the contents of the manuscript of wide and general interest?"). A kappa coefficient of 0.23 indicates that the reviewers agreed in their evaluations for 23% more of the manuscripts than would have been predicted on the basis of chance alone. The weighted kappa coefficient for the ultimate reviewer recommendation (question 5: "Do you recommend acceptance of the communication?") is 0.20, and the intraclass correlation coefficient is 0.25.24 Four of the six kappa coefficients are statistically highly significant (Question 1a: "Are the contents of the manuscript of wide and general interest?", kappa = 0.23, Z-value = 3.31,25 p < .001; Question 2: "Do the data obtained by experiment or calculation verify the hypotheses and conclusions?", kappa = 0.17, Z-value = 2.82, p < .01; Question 3: "Is the length of the manuscript appropriate to its contents?", kappa = 0.13, Z-value = 2.92, p < .01; Question 5: "Do you recommend acceptance of the communication?", weighted kappa = 0.20, Z-value = 4.86, p < .0001). These coefficients of reviewer agreement are the first ever calculated for a professional journal in chemistry,26 but the levels of reviewer agreement are very similar to those reported for peer reviews of behavioral science and life science manuscripts. The chance-corrected reliability coefficients generally fall in the range 0.20-0.40 (cf. Cicchetti, 1991, p. 123).
This observation applies not only to the final reviewer recommendation, but also to other questions on the reviewing sheets related to content, data, length, and form of a manuscript. Consistent with our findings, Whitehurst (1982, p. 242) reports: "None of these scales is significantly more reliable than the 4-point summary judgment. Most are not as reliable. Some are completely unreliable" (cf. also Scott, 1974, p. 700; Cicchetti, 1991, p. 122; and Zentall, 1991, p. 167). From a statistical standpoint, the extent of reviewer agreement—despite the statistical significance of the coefficients—must be described as rather unsatisfying. According to Landis & Koch (1977), kappa coefficients between 0.00 and 0.40 correspond to a relatively low level of reviewer agreement. Kappa coefficients between 0.41 and 0.80 are said to reflect substantial reviewer agreement, and values > 0.81 indicate excellent agreement.


Table 7. Agreement in first and second referees' responses to 392 communications submitted for publication to Angewandte Chemie in 1984

Evaluation form item                                No. of pairs    Actual       Chance       Cohen's
                                                    of referees     agreement    agreement    kappa
                                                    responding                                coefficient
                                                    to the item

1) Are the contents of the manuscript
   a) of wide and general interest? (Yes/No)        204             0.65         0.54         0.23
   b) of extraordinary but special interest?
      (Yes/No)                                      107             0.64         0.58         0.12

2) Do the data obtained by experiment or
   calculation verify the hypotheses and
   conclusions? (Yes/No)                            296             0.82         0.78         0.17

3) Is the length of the manuscript appropriate
   to its contents? (Yes/No, the manuscript is
   too long/No, the manuscript is too short)        309             0.67         0.62         0.13

4) The form of the manuscript (text, figures,
   tables, nomenclature etc.) is beyond
   reproach (Yes/No)                                279             0.65         0.60         0.12

5) Do you recommend acceptance of the
   Communication? (Yes, without alterations/
   Yes, after minor alterations/Yes, but only
   after major alterations/No)                      392             0.38         0.29         0.14 [1,2]

[1] Cohen's weighted kappa coefficient = 0.20.
[2] ANOVA intraclass correlation coefficient = 0.25

Eberley & Warner (1990) suggest that reviewer agreement within various subdisciplines of a subject may be greater. This hypothesis could not be verified in the case of chemistry. Angewandte Chemie received during 1984 sufficient manuscripts in 4 of the 80 subdisciplines (sections) of chemistry to allow a calculation of concordance coefficients.27 Table 8 shows that certain of the section-specific kappa and intraclass-correlation coefficients do lie slightly above the aggregate values (for all sections combined), but in the area of organometallic compounds, which represents the largest number of papers published in Angewandte Chemie, the level of chance-corrected agreement for reviewer recommendations is lower than for chemistry as a whole (weighted kappa coefficient = 0.14, Z-value = 1.84, n.s.; intraclass-correlation coefficient = 0.19).

[Table 8. Agreement of referees on acceptance of communications submitted to Angewandte Chemie in 1984, by Chemical Abstracts section. Caption reconstructed from fragments; table values not recoverable. Sections listed: All sections; Inorganic Chemicals and Reactions; Organic Chemistry (Sections 23-28); Physical Organic Chemistry; Organometallic and Organometalloidal.]

It has been established for the journals Social Problems (Smigel & Ross, 1970, pp. 19 f), New England Journal of Medicine (Ingelfinger, 1974, p. 690), and American Psychologist (Cicchetti, 1985, p. 563) that reviewer agreement with respect to rejection is greater than that for acceptance. The recommendation "reject" was associated with the highest degree of reliability in the case of Angewandte Chemie as well: for the question "Do you recommend acceptance or rejection of the communication?" the category-specific kappa coefficient for the response "reject" was 0.28. The corresponding coefficients for the three response categories "Yes, without alterations", "Yes, after minor alterations", and "Yes, but only after major alterations" were 0.07, 0.10, and 0.09, respectively. Eberley (1986) established in a study of the peer-review process for the journal Rural Sociology that reviewer agreement was greater for rejected manuscripts than with those accepted for publication. Based on our findings, however, this result should not be accorded an undue level of attention, because in our case the set of rejected manuscripts attracted not only a greater amount of agreement but also wider disagreement than the set of accepted manuscripts (cf. Table 9). This observation can be explained by the fact that communications are generally rejected by Angewandte Chemie under two circumstances: when both reviewers recommend rejection, and when the reviewer judgments disagree sharply.

Table 9. Degree of consensus in first and second referees' recommendations to accept or reject 392 communications submitted for publication to Angewandte Chemie in 1984, by accepted and rejected communications (in %)

                                        Degree of consensus [1]
                                        ++      +-      -+      --
Accepted communications (N = 289)       37      43      14       7
Rejected communications (N = 103)       43      25      25       7
All communications (N = 392)            38      38      17       7

[1] ++ : Both referees offered identical recommendations
    +- : Referees' recommendations differed by one category
    -+ : Referees' recommendations differed by two categories
    -- : Referees disagreed completely
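The category-specific coefficients quoted for question 5 can be approximated by collapsing the 4 × 4 table of recommendations (Table 10) into one 2 × 2 table per category. This is a hedged sketch: the formula is taken to be 2(ad - bc)/(p1 q2 + p2 q1), with p1, q1 the row sums and p2, q2 the column sums of the 2 × 2 table, and the names `category_kappa` and `LABELS` are illustrative.

```python
# Joint frequencies for question 5 as given in Table 10
# (rows: first referee, columns: second referee).
TABLE_10 = [
    [20, 38, 9, 12],
    [33, 75, 31, 28],
    [10, 22, 15, 11],
    [14, 19, 15, 40],
]
LABELS = [
    "Yes, without alterations",
    "Yes, after minor alterations",
    "Yes, but only after major alterations",
    "No",
]


def category_kappa(table, k):
    """Kappa for category k versus all other categories combined."""
    n = sum(map(sum, table))
    a = table[k][k]                          # both referees chose k
    b = sum(table[k]) - a                    # only the first referee chose k
    c = sum(row[k] for row in table) - a     # only the second referee chose k
    d = n - a - b - c                        # neither referee chose k
    p1, q1 = a + b, c + d                    # row sums of the 2 x 2 table
    p2, q2 = a + c, b + d                    # column sums of the 2 x 2 table
    return 2 * (a * d - b * c) / (p1 * q2 + p2 * q1)


coefficients = [round(category_kappa(TABLE_10, k), 2) for k in range(4)]
for label, value in zip(LABELS, coefficients):
    print(f"{label}: {value:.2f}")
```

Under these assumptions the four values round to 0.07, 0.10, 0.09, and 0.28, in line with the coefficients quoted in the text, with "No" (rejection) the most reliable category.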

8.3

Low Levels of Reviewer Agreement: Statistical Artifact or a Result of the Process by Which Reviewers are Selected?

In view of very low chance-corrected concordance coefficients on one hand and very high percentages of agreement in the reviewer judgments on the other, Whitehurst (1984, 1985) has questioned whether Cohen's kappa statistic and intraclass-correlation coefficients in fact constitute suitable measures for expressing the true extent of reviewer agreement. He further points out the fact that in all agreement matrices so far published the percentage of truly discrepant judgments is actually relatively small. Table 10 reveals that in the case of Angewandte Chemie as well, three-fourths of all the reviewer pairs (76.5%) reached completely or nearly identical conclusions28 (the kappa coefficient calculated "with scores computed as agreement if within one point" is 0.67; this coefficient is significantly higher than either the standard or the weighted kappa coefficient).

Table 10. Concurrence [1] and discrepancy in referees' responses to the question: "Do you recommend acceptance of the Communication?"

                                                            Referee 2 (N = 392)
Referee 1 (N = 392)                                 Yes, without   Yes, after    Yes, but only   No
                                                    alterations    minor         after major
                                                                   alterations   alterations
                                                    (N = 77)       (N = 154)     (N = 70)        (N = 91)
Yes, without alterations (N = 79)                      20*            38*            9              12
Yes, after minor alterations (N = 167)                 33*            75*           31*             28
Yes, but only after major alterations (N = 58)         10             22*           15*             11*
No (N = 88)                                            14             19            15*             40*

[1] Total agreement (main diagonal) and minor deviations (upper and lower off-diagonals) in referees' recommendations are marked with an asterisk (boldface in the printed table)
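The gap between raw agreement and chance-corrected agreement can be checked directly from the counts in Table 10. The following is a minimal sketch, not the author's procedure: the function name is hypothetical, and the "within one point" variant keeps the exact-match chance term in the denominator. That choice is an assumption; the text does not give the formula, but the variant lands near the 0.67 quoted above.

```python
# Counts from Table 10: rows = referee 1, columns = referee 2, in the order
# "without alterations", "minor alterations", "major alterations", "No".
TABLE_10 = [
    [20, 38, 9, 12],
    [33, 75, 31, 28],
    [10, 22, 15, 11],
    [14, 19, 15, 40],
]


def agreement_and_kappa(table, tolerance=0):
    """Return (observed agreement, chance agreement, kappa).

    Cells within `tolerance` categories of the main diagonal count as
    agreement. Chance agreement is always computed for exact matches
    (an assumption made for this sketch, see the lead-in).
    """
    size = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(size)]
    agreeing = sum(
        table[i][j]
        for i in range(size)
        for j in range(size)
        if abs(i - j) <= tolerance
    )
    p_observed = agreeing / n
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(size)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, p_chance, kappa


p_exact, p_chance, kappa_exact = agreement_and_kappa(TABLE_10)
p_within_one, _, kappa_within_one = agreement_and_kappa(TABLE_10, tolerance=1)

print(f"identical recommendations: {p_exact:.1%}, kappa = {kappa_exact:.2f}")
print(f"within one category:       {p_within_one:.1%}, kappa = {kappa_within_one:.2f}")
```

Under these assumptions, roughly three-fourths of the pairs agree within one category even though the standard kappa for exact matches stays low, which is the contrast Whitehurst points to.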

Cicchetti (1988), Feinstein & Cicchetti (1990), and Cicchetti and Feinstein (1990) have also drawn attention to the paradox of poor reliability despite high percentages of agreement. They recommend taking into account the fraction of the reviewers concurring with a response of "yes" when interpreting the chance-free kappa coefficients. If this fraction is very large—in the questions on the Angewandte Chemie reviewer questionnaire the figure ranged between 73% and 90%—then the kappa statistic and intraclass correlation coefficients are likely to imply an unsatisfactory degree of reliability.29 Moreover, interpretation of concordance coefficients requires that one also consider the two potential sources of poor agreement between the reviewers. According to Hornbostel and Neidhardt (1991, p. 29), a low level of consensus can be interpreted either as evidence of unreliability in reviewer judgments or as a reflection of reviews that are based on different premises. The latter situation is in fact quite probable if reviewers have been selected on the principle of complementarity: "Reviewers are not selected to reproduce each other's results, but to supplement and complement each other" (Kraemer, 1991, p. 153; emphasis in the original). If the technical competencies of the reviewers are complementary, then their judgments will be based on different criteria; at the very least it is likely that in forming their judgments they will differ in the weights they assign to various factors. One should not anticipate a high degree of reviewer agreement under these circumstances; indeed, agreement may not even be desirable. "Too much agreement is in fact a sign that the review process is not working well, that reviewers are not properly selected for diversity, and that some are redundant" (Bailar, 1991, p. 138; emphasis in the original).30
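The paradox described by Feinstein and Cicchetti can be illustrated with two invented 2 x 2 tables that share the same raw agreement but differ in how often "yes" is chosen. The counts below are hypothetical and serve only as a sketch; the 90% "yes" share in the second table mirrors the upper end of the range quoted above.

```python
def kappa_2x2(both_yes, yes_no, no_yes, both_no):
    """Return (observed agreement, Cohen's kappa) for two raters, two categories."""
    n = both_yes + yes_no + no_yes + both_no
    p_observed = (both_yes + both_no) / n
    p_yes_rater1 = (both_yes + yes_no) / n
    p_yes_rater2 = (both_yes + no_yes) / n
    # Agreement expected by chance from the raters' marginal "yes" rates.
    p_chance = p_yes_rater1 * p_yes_rater2 + (1 - p_yes_rater1) * (1 - p_yes_rater2)
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 100 manuscripts, 80% raw agreement in both cases.
balanced = kappa_2x2(40, 10, 10, 40)  # each rater says "yes" 50% of the time
skewed = kappa_2x2(80, 10, 10, 0)     # each rater says "yes" 90% of the time

print(f"balanced: agreement = {balanced[0]:.0%}, kappa = {balanced[1]:.2f}")
print(f"skewed:   agreement = {skewed[0]:.0%}, kappa = {skewed[1]:.2f}")
```

With identical raw agreement, kappa falls sharply once nearly all responses are "yes", because the chance-agreement term grows with the prevalence of the dominant category.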

Whether, in fact, the comments of reviewers really are based on different criteria—as intended by the editors—is a question that to our knowledge has been the subject of only two empirical studies.31 Fiske & Fogg (1990) analyzed the content of 402 reviewer questionnaires covering 153 manuscripts. The comments in general addressed nine different aspects of manuscript content. The authors summarize the results of their study as follows: "In the typical case, two reviews of the same paper had no critical point in common. It seemed that reviewers did not overtly disagree on particular points; instead, they wrote about different topics, each making points that were appropriate and accurate. As a consequence, their recommendations about editorial decisions showed hardly any agreement"32 (Fiske & Fogg, 1990, p. 591). Bakanic, McPhail & Simon (1989) reached a very similar conclusion after subjecting reviewers' comments on 323 manuscripts to a content analysis: "While there were few direct contradictions and numerous matches (especially among criticisms), most comments were neither contradicted nor matched. Instead, the referees simply observed and commented on different aspects of the manuscript" (p. 650). Cole (1983, 1991) regards the low degree of reviewer agreement as neither a statistical artifact nor a result of the procedure according to which reviewers were selected. Instead, the lack of agreement is simply a reflection of the low levels of cognitive consensus that exist at the research frontiers of all scientific disciplines.33

9

Fairness in Manuscript Evaluation

9.1

Lenient and Strict Reviewers

The literature on impression formation describes a great many systematic tendencies in the framing of judgments (cf. Cohen, 1969). In the context of manuscript review the general tendency toward favorable or unfavorable evaluation is particularly significant. Authors of rejected manuscripts often perceive themselves as the victims of particularly strict reviewers.34 The editor of the journal Radiology, Siegelman (1991), attempted to characterize on the basis of their mean judgments all those reviewers who received ten or more manuscripts for evaluation during the period 1985-1990. This journal utilizes a nine-point rating scale. The numerical anchor "1" stands for an "outstanding manuscript", whereas "9" corresponds to an "unacceptable manuscript". The mean vote of the 660 reviewers who had evaluated ten or more manuscripts corresponded to 4.8, with a standard deviation of 0.8. Siegelman (1991) described reviewers more lenient than the mean as either "pushovers" (those with mean votes 1.5 standard deviations below the mean) or "zealots" (2.5 standard deviations below the mean). Reviewers stricter than the mean were similarly characterized by this author as "assassins" and "demoters". Based on the results, Siegelman (1991, p. 639) assigned 8.5% of the reviewers to the "strict" category (assassins and demoters), while 7.3% were classified as "lenient" (pushovers and zealots). No definite tendency could be assigned to 84% of the reviewers.35

In the case of Angewandte Chemie, only ten reviewers received ten or more manuscripts during the year 1984 (cf. Table 5, p. 17). Since not all the reviewers actually provided a definitive recommendation in every case, the analysis that follows is necessarily limited to eight reviewers who suggested acceptance or rejection of 10 to 25 manuscripts. These reviewers are in turn referred to here by the letters A through H.
Figure 3 shows that the mean votes of these eight reviewers ranged from 1.50 to 3.20.36 The mean value for all 115 reviews corresponded to 2.56, which is slightly above the overall mean of 2.40. Each of the reviewers took advantage of responses in at least three of the four response categories. Whereas reviewers A and B did not recommend the rejection of any manuscripts, reviewers D, G, and H failed to recommend acceptance without alteration for any of the manuscripts. The broader the diamond in Figure 3, the more manuscripts the corresponding reviewer recommended for acceptance or rejection. The height of the diamond represents the 95% confidence interval for the mean appraisal of the reviewer in question. The mean judgments of two reviewers are significantly different if their diamonds do not intersect in the vertical dimension.

Figure 3. Lenient and harsh referees

[1] Referees who recommended more than ten communications for rejection or acceptance.
[2] Four-category rating scale: (1) "Yes, without alterations" (2) "Yes, after minor alterations" (3) "Yes, but only after major alterations" (4) "No"

At first glance it would appear that reviewers D through H offered stricter judgments, and to a statistically significant extent, than reviewer A, and that the reviews supplied by B were on the average more lenient than those of H. However, the obvious differences in means apparent from Figure 3 cannot be taken as firm evidence for strictness or leniency in judgment, because the various reviewers were not given the same manuscripts to evaluate. In other words, differences in mean judgments might be a consequence of the fact that, on average, reviewers A and B were provided with better communications than reviewers D through H. For this reason, Figure 4 was developed to reflect the mean judgments of other reviewers who examined the same manuscripts as reviewers A through H. A comparison of Figures 3 and 4 reveals that, on average, the communications evaluated by reviewers D through H were judged just as harshly by the second reviewers. Only in the case of reviewer A is a real tendency apparent: the second reviewers for reviewer A's manuscripts submitted recommendations that were noticeably stricter; indeed, stricter to a statistically significant extent.

Figure 4. Mean ratings by alternative referees involved in the evaluation of communications also reviewed by referees A - H (horizontal axis: co-referees of referees A - H)

The typology proposed by Siegelman (1991) is based on the implicit assumption that there exist definitive tendencies in judgment which are both stable in an intra-individual sense and reflective of the personality of the reviewer. However, psychological research has shown that the frame of reference of an appraiser varies as a function of the particular object subject to appraisal (for an overview see Cohen, 1973, and Laming, 1991). Attempting to typologize reviewers is thus a very questionable undertaking, at least from a psychological point of view.

9.2

Judgmental Tendencies of Reviewers and Publication Bias

Rather than attempting to classify reviewers in terms of traits, modern psychological and sociological research is more concerned with identifying those factors that may have a favorable or unfavorable influence on the decision-making process in a concrete case. Authors of rejected manuscripts occasionally address this problem in letters to the editor: "Thank you for informing us of the rejection of our communication. The referees of Angewandte (Chemie) tend to sit on very high horses when it comes to authors or research topics that are not among the favored few" (cited by Golitz, 1990, p. 5). From among the numerous publication biases that have been discussed in the literature, we have selected three for an empirical-statistical analysis in the discussion that follows:

• the academic status of the corresponding author,
• the nationality of the corresponding author, and
• the subject area to which the communication might be assigned.

It has been asserted in the literature that all three of these factors exert an influence on reviewer judgments and editorial decisions.37 Following in the tradition of Zuckerman & Merton (1971a, b), Bailar and Patterson (1985) define bias as follows: "The use of criteria other than strict scientific and technical merit in framing comments and advice to the editor" (p. 655). Since the editor of a professional journal may also be subject to bias, we have attempted to make a distinction in what follows between biases in both reviewer judgments and final editorial decisions with respect to the academic title and nationality of the author as well as the subject matter of the communication. Since the scores of the members of a pair of reviewers are not (in a statistical sense) necessarily independent, results of statistical analyses for the two reviewer groups are presented separately in the following tables, in the sense of providing a criterion of validation (cf. Virgo, 1977).
In order to investigate whether there exists a publication bias for a particular journal, manuscripts accepted for publication must be compared with those that have been rejected. However, this has not generally been the case in the studies published to date: "Only a handful of these observations (of the presence or absence of editorial bias) were developed from editorial decisions about manuscripts; most are based on the characteristics of published articles. Studies based on published articles provide only indirect evidence that there is, or is not, bias in the editorial process. To have direct evidence it is necessary to do not less than a comparison of the characteristics of published manuscripts with the characteristics of rejected manuscripts" (Ross, 1980, pp. 47 f.; emphasis in the original). Manuscripts rejected by the journal Angewandte Chemie were subdivided for the following analysis into those that were subsequently published in other professional journals and those that were not. With the aid of the bibliographic databases Chemical Abstracts (Chemical Abstracts Service, 1984 ff.) and Science Citation Index (Institute for Scientific Information, 1984 ff.) it was possible to establish whether, in fact, manuscripts rejected by Angewandte Chemie were published elsewhere, and, if so, in what journal. Of the 115 communications rejected by Angewandte Chemie and the 9 additional communications withdrawn by the authors,38 a total of 88 (= 71%) appeared in other journals. (This aspect of the investigation will be discussed in greater detail in Section 10.1.)

9.2.1

Academic Title of the Corresponding Author: Reviewer Judgments and Editorial Decisions

Of all the communications received by Angewandte Chemie during 1984, 70% were submitted by professors, 10% by privatdozenten (assistant professors), and 20% by corresponding authors associated only with a doctorate degree. Consistent with Merton's interpretation of the Matthew effect,39 authors distinguished by a high academic rank should be subject to more favorable recommendations from reviewers than those of lower rank; i.e., work of the same intrinsic worth will be evaluated differently depending on the status of the author (cf. Merton, 1968, 1988). Tables 11 and 12 provide some support for this thesis. Communications from professors were indeed judged more favorably on average by first reviewers than communications from corresponding authors with only a doctorate degree (t-value = 2.28, p < 0.05).40 However, the corresponding judgments from second reviewers for these two categories of authors produced differences in means that were not statistically significant, although the trend was again in the expected direction (t-value = 1.86, n.s.).

Table 11. First and second referees' mean recommendations as a function of the academic title of the corresponding author

                             "Do you recommend acceptance of the Communication?" [1]
Academic title of            First referee            Second referee
corresponding author         mean rating      N       mean rating      N
Professor Dr.                2.30 [2]        283      2.36            279
Privatdozent Dr.             2.41             41      2.46             39
Dr.                          2.60 [2]         81      2.62             76

[1] Four-category rating scale: (1) "Yes, without alterations" (2) "Yes, after minor alterations" (3) "Yes, but only after major alterations" (4) "No"

[2] Two sample tests for equal means and variances were carried out for all group pairs (Bartlett's test statistic and the single tests for equal variances across groups are statistically not significant). Communications from professors were judged significantly more favorably, on average, by first referees than communications from corresponding authors with only a doctorate degree (first referees: t-value = 2.28, p < .05; second referees: t-value = 1.86, n. s.). Other differences in the mean ratings are statistically not significant.

Table 12. Publication outcomes for communications submitted to Angewandte Chemie in 1984 as a function of the academic title of the corresponding author (in %)

Academic title of                  Communication published in ...           Communications
corresponding author               Angewandte Chemie    Other journals      not published
                                   (N = 323) [A]        (N = 88) [B]        (N = 20) [1] [C]
[1] Professor Dr. (N = 301)              77                  18                   5
[2] Privatdozent Dr. (N = 43)            79                  14                   7
[3] Dr. (N = 87)                         66                  31                   3

[1] In 18 other cases the academic title of the corresponding author is missing.

Chi² value = 8.446, 4 degrees of freedom, not significant
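The chi-square statistic for Table 12 can be retraced with a short calculation. The sketch below is illustrative only: the table reports percentages, so the cell counts are reconstructed here from the rounded percentages and the row and column totals. That reconstruction is an assumption, not data quoted from the source, and the function name is hypothetical.

```python
# Cell counts reconstructed (assumption) from the rounded percentages in
# Table 12; columns: Angewandte Chemie [A], other journals [B], not published [C].
COUNTS_TABLE_12 = [
    [232, 55, 14],  # [1] Professor Dr.    (N = 301)
    [34, 6, 3],     # [2] Privatdozent Dr. (N = 43)
    [57, 27, 3],    # [3] Dr.              (N = 87)
]

# Standard critical value of chi-square for 4 degrees of freedom at the 5% level.
CRITICAL_VALUE_DF4_P05 = 9.488


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = pearson_chi_square(COUNTS_TABLE_12)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
print("significant at 5%" if chi2 > CRITICAL_VALUE_DF4_P05 else "not significant at 5%")
```

With these reconstructed counts the statistic comes out close to the value printed under Table 12 and stays below the 5% critical value.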

The tendency toward more favorable appraisal of manuscripts from professors is also apparent in final editorial decisions: 77% of all communications submitted by professors were accepted by Angewandte Chemie for publication. In the case of corresponding authors with the doctorate as their highest academic rank the corresponding proportion was only 66%. The relationship between academic status of the corresponding author and publication of a submitted communication as detailed in Table 12 is not in fact statistically significant (χ² value = 8.446, with 4 degrees of freedom). Nevertheless, if one utilizes the formulas of Kimball (1954) to dissect the contingency table into four independent components, each with one degree of freedom (cf. Table 12a), it then becomes apparent that communications from authors associated only with a doctorate are represented less frequently in Angewandte Chemie and more frequently in other professional journals relative to communications from professors and privatdozenten (χ² = 7.293, 1 degree of freedom, p < .05).41

Table 12a. Partition of the chi-square value from Table 12 into specific components with one degree of freedom (df) each according to Kimball (1954)

Model of independence          df     Chi² value     p
1 x 2 - A x B                  1      0.342          n. s.
1 x 2 - [A + B] x C            1      0.460          n. s.
[1 + 2] x 3 - A x B            1      7.293          < .05
[1 + 2] x 3 - [A + B] x C      1      0.350          n. s.
Total                          4      8.445          n. s.

This result is consistent with the findings of Sahner (1982, pp. 92 f.), who showed for the Zeitschrift für Soziologie that 52% of the manuscripts submitted by professors were accepted for publication, while the corresponding figures for privatdozenten and authors listing only a doctorate were 50% and 37%, respectively. On the other hand, Patterson, Bailey, Martinez & Angel (1987) reported for the journal American Political Science Review: "The data ... reveal rates of acceptance almost identical to rates of submission by professional rank.... Practice does not seem to suggest significant biases in Review publication in favor of 'established researchers', insofar as we could infer this from experience with scholars in different faculty ranks" (p. 1012). The question of whether and to what extent the academic title of the author influences reviewer judgments and editorial decisions cannot be resolved unambiguously from the data available. Since no experimental data exist to date,42 it is impossible to determine whether more favorable evaluations and above-average acceptance rates for authors with higher academic ranks can be traced to particular aspects of the reviewing and decision-making process, or if the different outcomes of the evaluation process are simply a function of the relative quality of the submitted manuscripts.

9.2.2

Subject Matter: Reviewer Judgments and Editorial Decisions

The subject area to which a particular communication should be assigned was established by reference to Chemical Abstracts, a reviewing medium prepared by the Chemical Abstracts Service (1984 ff.).43 Chemical Abstracts Service categorizes chemical publications into 80 different subject areas (sections). Every publication becomes associated with a single principal entry, which makes clearly apparent the most important aspect of the work, together with various subentries in the event that an article has significance in other areas of chemistry as well [cf. Rehm, Montforts, Ockenfeld & Wess, 1982; Braam & Bruil (1992) report that 80% of chemistry papers were placed in the "proper" CA main-section, according to authors]. The 80 sections are in turn collected into five primary fields of chemical research:

• biochemistry,
• organic chemistry,
• macromolecular chemistry,
• applied chemistry and chemical engineering, and
• physical, inorganic, and analytical chemistry.

Table 13 shows the sections to which communications submitted to Angewandte Chemie during 1984 and subsequently published were assigned,44 subdivided into those published by Angewandte Chemie itself (N = 323) and those that were instead published somewhere else (N = 88).45 Angewandte Chemie in fact displays a very distinctive publication profile: during 1984, 60% of all published communications fell into only three of the sections of Chemical Abstracts: Section 29 (Organometallic and Organometalloidal Compounds, including not only classical organometallic topics, often regarded as essentially inorganic, but also many papers related to organic synthesis, in which organolithium and organotin reagents, for example, play an important role), which accounts for 40% of all the published communications, Section 22 (Physical Organic Chemistry, with 10% of the communications), and Section 78 (Inorganic Chemicals and Reactions, again contributing 10% of the communications). If one also includes typical organic chemical topics (Sections 23-28), which characterize 20% of the communications, then these four areas alone encompass 80% of all the published communications.46 Three of the five primary fields of chemical research are either completely unrepresented in Angewandte Chemie (macromolecular chemistry) or represented only to a very limited extent (biochemistry, and applied chemistry and chemical engineering). No communications whatsoever were submitted to Angewandte Chemie during 1984 for 44 of the sections of Chemical Abstracts. Communications accepted for publication in

Table 13. Subject-matter distribution of communications accepted for publication by Angewandte Chemie and communications rejected by Angewandte Chemie but published elsewhere [1] Chemical Abstracts Section

No. of communications published in Angewandte Chemie (N =323)

Biochemistry

1. Pharmacology 2. Mammalian Hormones 3. Biochemical Genetics 4. Toxicology 5. Agrochemical Bioregulators 6. General Biochemistry 7. Enzymes 8. Radiation Biochemistry 9. Biochemical Methods 10. Microbial Biochemistry 11. Plant Biochemistry 12. Nonmammalian Biochemistry 13. Mammalian Biochemistry 14. Mammalian Pathological Biochemistry 15. Immunochemistry 16. Fermentation and Bioindustrial Chemistry 17. Food and Feed Chemistry 18. Animal Nutrition 19. Fertilizers, Soils, and Plant Nutrition 20. History, Education, and Documentation

Organic Chemistry

21. General Organic Chemistry 22. Physical Organic Chemistry 23. Aliphatic Compounds 24. Alicyclic Compounds 25. Benzene, Its Derivatives, and Condensed Benzenoid Compounds 26. Biomolecules and Their Synthetic Analogs 27. Heterocyclic Compounds (One Hetero Atom) 28. Heterocyclic Compounds (More Than One Hetero Atom) 29. Organometallic and Organometalloidal Compounds 30. Terpenes and Terpenoids 31. Alkaloids 32. Steroids 33. Carbohydrates 34. Amino Acids, Peptides, and Proteins

Macromolecular Chemistry

35. Chemistry of Synthetic High Polymers 36. Physical Properties of Synthetic High Polymers 37. Plastics Manufacture and Processing 38. Plastics Fabrication and Uses 39. Synthetic Elastomers and Natural Rubber 40. Textiles and Fibers

Other journals (N = 88)

1 1 1 5 1

2

2 2

3 33 7 16 16 5 7 13 130 2 4 2 9 5

13 5 2 6 3 4 7 17 1 2 1 5

1 1

41. Dyes, Organic Pigments, Fluorescent Brighteners, and Photographic Sensitizers 42. Coatings, Inks, and Related Products 43. Cellulose, Lignin, Paper, and Other Wood Products 44. Industrial Carbohydrates 45. Industrial Organic Chemicals, Leather, Fats, and Waxes 46. Surface-Active Agents and Detergents

Applied Chemistry and Chemical Engineering

47. Apparatus and Plant Equipment 48. Unit Operations and Processes 49. Industrial Inorganic Chemicals 50. Propellants and Explosives 51. Fossil Fuels, Derivatives, and Related Products 52. Electrochemical, Radiational, and Thermal Energy Technology 53. Mineralogical and Geological Chemistry 54. Extractive Metallurgy 55. Ferrous Metals and Alloys 56. Nonferrous Metals and Alloys 57. Ceramics 58. Cement, Concrete, and Related Building Materials 59. Air Pollution and Industrial Hygiene 60. Waste Treatment and Disposal 61. Water 62. Essential Oils and Cosmetics 63. Pharmaceuticals 64. Pharmaceutical Analysis

1 1

1

Physical, Inorganic, and Analytical Chemistry

65. General Physical Chemistry 66. Surface Chemistry and Colloids 67. Catalysis, Reaction Kinetics, and Inorganic Reaction Mechanisms 68. Phase Equilibriums, Chemical Equilibriums, and Solutions 69. Thermodynamics, Thermochemistry, and Thermal Properties 70. Nuclear Phenomena 71. Nuclear Technology 72. Electrochemistry 73. Optical, Electron, and Mass Spectroscopy and Other Related Properties 74. Radiation Chemistry, Photochemistry, and Photographic and Other Reprographic Processes 75. Crystallography and Liquid Crystals 76. Electric Phenomena 77. Magnetic Phenomena 78. Inorganic Chemicals and Reactions 79. Inorganic Analytical Chemistry 80. Organic Analytical Chemistry

2

1

1

4

2

1 1 11 32 1 3

1 1 12

[1] The assignment of communications to these sections was established by reference to Chemical Abstracts (volumes 101 to 105)

Angewandte Chemie were distributed among the 80 sections of Chemical Abstracts in about the same proportions as in the group of manuscripts rejected by Angewandte Chemie but nonetheless published elsewhere (Chi-square "goodness of fit" test statistic = 48.995, with 35 degrees of freedom, n.s.).47

Based on the findings presented in Table 13, the unique publication profile of Angewandte Chemie surely arises only in part from selectivity practiced by the editor. A much more significant factor is the available supply of submitted manuscripts. The same holds true for other professional journals: "The (subject matter) distribution of published papers reflects fairly accurately the distribution of papers submitted" (Anonymous, 1989, p. 406). Given the high degree of thematic concentration of manuscripts submitted for publication, the analysis that follows of the relationship between subject area, reviewer judgment, and editorial decisions is necessarily limited to Sections 22, 23-28, 29, and 78 of Chemical Abstracts. Manuscripts from poorly represented sections have simply been combined into an "Other" category. Table 14 shows that organometallic communications (Section 29)48 received comparatively the highest ratings from both first and second reviewers. Nevertheless, the observed differences in mean ratings are statistically not significant. The most that can be said is that second reviewers rated organometallic communications more highly (mean rating 2.27) than communications from the field of physical-organic chemistry, which received an average rating of 2.71 on a four-point response scale (t-value = 2.48, p < .10).

Table 14. First and second referees' mean recommendations by sections of Chemical Abstracts

                                                   "Do you recommend acceptance of the Communication?" [1]
Chemical Abstracts Section                         First referee            Second referee
                                                   mean rating      N       mean rating      N
Organometallic and Organometalloidal Compounds     2.20            136      2.27 [2]        137
Inorganic Chemicals and Reactions                  2.31             42      2.39             41
Physical Organic Chemistry                         2.47             45      2.71 [2]         42
Organic Chemistry (Sections 23-28)                 2.51             88      2.50             82
Other sections                                     2.33             79      2.36             76

[1] Four-category rating scale: (1) "Yes, without alterations" (2) "Yes, after minor alterations" (3) "Yes, but only after major alterations" (4) "No"

[2] Two sample tests for equal means and variances were carried out for all pairs of CA sections (Bartlett's test statistic and the single tests for equal variances across groups are statistically not significant). Communications on Organometallic and Organometalloidal Compounds were judged, on average, slightly more favorably by second referees than communications on Physical Organic Chemistry (second referees: t-value = 2.48; first referees: t-value = 1.60). All differences in mean ratings are statistically not significant.

Not only were communications from the field of organometallic chemistry evaluated on average most favorably, they also had the (statistically significant) highest publication rate: 88% of all organometallic communications submitted to Angewandte Chemie in 1984 were accepted for publication (cf. Tables 15 and 15a). In the case of communications from the fields of organic chemistry (Sections 23-28), physical organic chemistry (Section 22), and inorganic chemicals and reactions (Section 78) the corresponding publication rates for submitted communications were only 70%, 72%, and 73%, respectively (average for all sections: 74%).

Table 15. Publication outcome of communications submitted to Angewandte Chemie in 1984 by sections of Chemical Abstracts (in %)

Chemical Abstracts Section                                        Communication published in ...
                                                                  Angewandte Chemie [A]    Other journals [B]
[1] Organometallic and Organometalloidal Compounds (N = 147)             88                      12
[2] Inorganic Chemicals and Reactions (N = 44)                           73                      27
[3] Physical Organic Chemistry (N = 46)                                  72                      28
[4] Organic Chemistry (Sections 23-28) (N = 91)                          70                      30
[5] Other sections (N = 83)                                              77                      23

Chi² value = 14.448, 4 degrees of freedom, p < .01

Table 15a. Partition of chi-square value from Table 15 into specific components with one degree of freedom (df) each according to Kimball (1954)

Model of independence              df     Chi² value     p
3 x 4 - A x B                      1      0.036          n. s.
[3 + 4] x 2 - A x B                1      0.073          n. s.
[2 + 3 + 4] x 5 - A x B            1      1.152          n. s.
[2 + 3 + 4 + 5] x 1 - A x B        1      13.187         < .01
Total                              4      14.448         < .01
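The dominant component in Table 15a contrasts organometallic communications with all other sections combined. A rough way to see where it comes from is to collapse Table 15 into a 2 x 2 table and apply the shortcut chi-square formula. This is a sketch only, not Kimball's partition formulas: the cell counts are reconstructed from the rounded percentages and section totals in Table 15 (an assumption), and the helper name is hypothetical.

```python
# (published in Angewandte Chemie, published in other journals) per section,
# reconstructed (assumption) from the percentages and N values in Table 15.
COUNTS_TABLE_15 = {
    "Organometallic (Section 29)": (130, 17),
    "Inorganic (Section 78)": (32, 12),
    "Physical organic (Section 22)": (33, 13),
    "Organic (Sections 23-28)": (64, 27),
    "Other sections": (64, 19),
}

# Standard critical value of chi-square for 1 degree of freedom at the 1% level.
CRITICAL_VALUE_DF1_P01 = 6.635


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


focus = "Organometallic (Section 29)"
a, b = COUNTS_TABLE_15[focus]
# Pool every other section into a single comparison row.
c = sum(counts[0] for name, counts in COUNTS_TABLE_15.items() if name != focus)
d = sum(counts[1] for name, counts in COUNTS_TABLE_15.items() if name != focus)

component = chi_square_2x2(a, b, c, d)
print(f"organometallic vs. all other sections: chi-square = {component:.3f} (1 df)")
print("significant at 1%" if component > CRITICAL_VALUE_DF1_P01 else "not significant at 1%")
```

Under these assumptions the collapsed comparison agrees closely with the last component listed in Table 15a, which suggests that nearly all of the overall chi-square value stems from the organometallic section.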

In a letter to the editor of the journal, H. Werner (Institut für Anorganische Chemie, Universität Würzburg) called attention to a perceived publication bias in Angewandte Chemie to the benefit of organic chemistry: "During my tenure as guest professor in Toulouse during 1990 we of course had discussions regarding the level of chemical journals. Generally speaking, the Angewandte was singled out for a gold star, but with one blemish: whereas the selection of review articles was praised highly with respect to both quality and breadth, there was a perception of imbalance in the case of the communications; they are too 'organic'. Since then I have reexamined the issues for 1990 at leisure, and I would now not dispute this French critique."49

However, the remarkably strong emphasis on organometallic chemistry is not limited to communications published in Angewandte Chemie. Indeed, organometallic chemistry is a subject pursued very successfully and intensively throughout the Federal Republic of Germany. Thus, Chemical Abstracts records a total of 18 426 organometallic publications in the time period 1988-1990, 2 975 of which originated in the Federal Republic of Germany, corresponding to a 16.1% share of world output (cf. Daniel, 1991).50 Figure 5 shows that based on the total number of entries in the Chemical Abstracts database, the overall (West) German share of world research in chemistry as a whole amounted to only 6.1%. The 16.1% statistic for papers in organometallic chemistry therefore far exceeds the proportional contribution of the Federal Republic of Germany to worldwide research in chemistry generally. If one arranges the 80 sections of Chemical Abstracts in descending order according to number of publications, organometallic chemistry is found to occupy 5th place in Germany (cf. Daniel, 1991, p. 978), whereas worldwide it ranks 35th in number of publications.

One indication of the scientific importance of this particular branch of chemistry is the fact that 8 of the Nobel prize winners in chemistry in the last 20 years have engaged in organometallic research (cf. Falbe & Regitz, 1991, p. 2716). Three German scientists have been honored with chemistry Nobel prizes for their achievements in organometallic chemistry (cf. Elschenbroich & Salzer, 1990, p. 16). Thus, Karl Waldemar Ziegler, director from 1943 of the Kaiser-Wilhelm-Institut (now the Max-Planck-Institut) für Kohlenforschung in Mülheim a. d. Ruhr, developed starting in 1953 a process for polymerizing ethylene at atmospheric pressure in the presence of mixed organometallic catalysts (cf. Ziegler, 1964). For his discoveries in the field of high polymers, Ziegler, together with G. Natta, was awarded the Nobel prize in 1963. Ernst Otto Fischer, since 1964 professor at the Technische Hochschule (now Technische Universität) München, was awarded the 1973 Nobel prize jointly with G. Wilkinson for his discovery that certain combinations of metals with organic substances possess a "sandwich-like" structure (cf. Wilkinson, 1974). Georg Wittig, professor since 1956 at the Universität Heidelberg, received the 1979 Nobel prize jointly with H. C. Brown for his discovery of the Wittig olefin synthesis. Through his work on the applications of organoboranes and methylenephosphoranes in organic synthesis, Wittig made important contributions to the development of organometallic chemistry (cf. Brown, 1980).

Organometallic chemistry laid the groundwork in the 1950s for homogeneous catalysis, which in turn opened the way to industrial production of polymers and organic intermediates (e.g., alcohols and acetic acids) with high specificity and at low temperature. Organometallic compounds such as silicones and lead alkyls are today prepared in great quantity by the chemical industry (cf. Elschenbroich & Salzer, 1990, p. 18). Parshall (1987) suspects that the industrial significance of organometallic chemistry will continue to grow because of the promise of additional applications in chemotherapy51 (e.g., an organometallic catalyst can be used to prepare a derivative of the chiral amino acid L-DOPA, which has therapeutic activity in the treatment of Parkinson's disease), agricultural chemistry (e.g., development of a new generation of environmentally sound pesticides), and electronics (e.g., the synthesis of new types of semiconductors, sensors, and ceramic materials).

[Figure 5. Axis label: Share of world output (%)]

Germany's leading position in the scientifically and technically very significant field of organometallic chemistry has doubtless played a role in the fact that Angewandte Chemie receives for publication a large number of excellent organometallic manuscripts, especially since the most important specialized journals in this field (Organometallics and the Journal of Organometallic Chemistry) are associated with markedly lower ISI Journal Impact Factors than Angewandte Chemie (cf. Fig. 1, p. 10). Given this background, it is inappropriate to subject Angewandte Chemie to a general charge of publication bias in the organometallic direction. For example, this journal could hardly expect to be successful in acquiring the best manuscripts in the field of biochemistry, because the potential for achieving wide readership is greater in the leading biochemistry journals (with ISI Journal Impact Factors as great as 48) than in Angewandte Chemie (cf., e.g., Marton, 1983).

9.2.3 Nationality of the Corresponding Author: Reviewer Judgments and Editorial Decisions

Many scientists probably share the opinion of Ernst & Kienbacher (1991) that most professional journals display a national publication bias; that is, "journals favour the printing of papers coming from the same country" (p. 560). The American Journal of Cardiology, for example, published during 1984 contributions from 23 countries—but 70.5% of all the papers were derived from American authors (cf. Roberts, 1985). Braun & Nagy (1982, p. 449) report that in the Journal of the American Chemical Society during 1977 American authors were responsible for 78.8% of the contributions, and contributions from Switzerland accounted for 76.8% of the articles appearing during the same year in Helvetica Chimica Acta. In a similar way it is possible to characterize the publication pattern of Angewandte Chemie: 84% of the published communications covered in our study were submitted by corresponding authors from the Federal Republic of Germany. Table 16 provides a list of the number of manuscripts submitted and accepted as a function of the country of origin of the corresponding author. Based on this information, Angewandte Chemie received manuscripts during 1984 from scientists in 21 different countries. Papers were accepted for publication from 13 countries other than the Federal Republic of Germany, including 10 communications from British scientists, 8 each from scientists in Japan and the United States, and seven from scientists working at research institutions in Switzerland. Since the numbers associated with these and the remaining nine countries are relatively small, communications from foreign corresponding authors were grouped together for the purposes of the following statistical analysis. 
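The summary figures quoted for Angewandte Chemie can be retraced directly from the counts in Table 16. The following is a minimal sketch, not part of the original study; the variable and function names are illustrative assumptions, and the zero entries are read from the table's "0" cells.

```python
# Counts transcribed from Table 16: country -> (submitted, accepted).
TABLE_16 = {
    "Federal Republic of Germany": (340, 272),
    "Japan": (15, 8),
    "Great Britain": (13, 10),
    "Italy": (13, 4),
    "Switzerland": (12, 7),
    "U.S.A.": (11, 8),
    "France": (6, 4),
    "Israel": (6, 3),
    "Spain": (5, 2),
    "Hungary": (5, 0),
    "India": (4, 1),
    "Canada": (4, 1),
    "Taiwan": (4, 1),
    "Poland": (3, 0),
    "Hong Kong": (2, 1),
    "Australia": (1, 0),
    "Belgium": (1, 0),
    "Brazil": (1, 0),
    "Nigeria": (1, 0),
    "South Korea": (1, 0),
    "Sweden": (1, 1),
}

HOME = "Federal Republic of Germany"  # illustrative name for the domestic group


def acceptance_rate(rows):
    """Return accepted / submitted for an iterable of (submitted, accepted) pairs."""
    rows = list(rows)
    return sum(acc for _, acc in rows) / sum(sub for sub, _ in rows)


# Domestic vs. pooled foreign acceptance rates.
domestic_rate = acceptance_rate([TABLE_16[HOME]])
foreign_rate = acceptance_rate(v for k, v in TABLE_16.items() if k != HOME)

# Share of all accepted communications that came from the FRG.
home_share_of_accepted = TABLE_16[HOME][1] / sum(acc for _, acc in TABLE_16.values())

# Number of foreign countries with at least one accepted communication.
foreign_countries_accepted = sum(
    1 for k, (_, acc) in TABLE_16.items() if k != HOME and acc > 0
)

print(f"countries submitting: {len(TABLE_16)}")
print(f"domestic acceptance rate: {domestic_rate:.0%}")
print(f"foreign acceptance rate: {foreign_rate:.0%}")
print(f"FRG share of accepted communications: {home_share_of_accepted:.0%}")
print(f"foreign countries with accepted papers: {foreign_countries_accepted}")
```

Pooling the foreign countries, as the text does, avoids drawing conclusions from per-country rates that rest on a handful of manuscripts.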
Table 16. Communications submitted and communications accepted for publication as a function of country [1] (in descending order of number of communications submitted)

Country | No. of communications submitted (N = 449) | No. of communications accepted (N = 323)
Federal Republic of Germany | 340 | 272
Japan | 15 | 8
Great Britain | 13 | 10
Italy | 13 | 4
Switzerland | 12 | 7
U.S.A. | 11 | 8
France | 6 | 4
Israel | 6 | 3
Spain | 5 | 2
Hungary | 5 | 0
India | 4 | 1
Canada | 4 | 1
Taiwan | 4 | 1
Poland | 3 | 0
Hong Kong | 2 | 1
Australia | 1 | 0
Belgium | 1 | 0
Brazil | 1 | 0
Nigeria | 1 | 0
South Korea | 1 | 0
Sweden | 1 | 1

[1] National affiliation of the author to whom correspondence is to be addressed

Table 17 shows that communications from foreign corresponding authors were judged, on average, (statistically significantly) more harshly than those from German corresponding authors, irrespective of whether they were sent to German or foreign reviewers (the observed t-values are considerably higher than the critical values at the 95% confidence level for multiple pairwise comparisons of means). The reviewers for Angewandte Chemie recommended acceptance of communications from the Federal Republic of Germany significantly more frequently than in the case of foreign corresponding authors (cf. Tables 18 and 18a). Whereas over 80% of the domestic communications were recommended for acceptance, by German and foreign reviewers alike, communications from foreign corresponding authors were recommended for acceptance in only 53% (German first reviewers) to 68% (foreign second reviewers) of the cases. The nationality of the reviewer apparently had no influence on this recommendation: the interaction effect between nationality of the corresponding author and nationality of the reviewer is statistically not significant.

Table 17. First and second referees' mean recommendations as a function of nationality of the corresponding author and nationality of referee

"Do you recommend acceptance of the Communication?" [1]

Nationality of corresponding author and nationality of referee | First referee: mean rating | First referee: N | Second referee: mean rating | Second referee: N
Corresponding author from the FRG and referee from abroad | 2.23 [2] | 48 | 2.11 [2] | 44
Corresponding author from the FRG and referee from the FRG | 2.28 [2] | 270 | 2.37 [2] | 265
Corresponding author from abroad and referee from abroad | 2.58 | 24 | 2.64 | 22
Corresponding author from abroad and referee from the FRG | 2.95 [2] | 76 | 2.88 [2] | 74

[1] Four-category rating scale: (1) "Yes, without alterations" (2) "Yes, after minor alterations" (3) "Yes, but only after major alterations" (4) "No"

[2] Two-sample tests for equal means and variances were carried out for all group pairs. Communications from foreign corresponding authors were judged, on average, statistically significantly more harshly than those from German corresponding authors, irrespective of whether they were sent to German referees (first referees: unequal variance t test, t-value = 4.72, p < .001; second referees: equal variance t test, t-value = 3.88, p < .01) or foreign referees (first referees: equal variance t test, t-value = 3.61, p < .01; second referees: equal variance t test, t-value = 3.76, p < .01). Other differences in mean ratings are statistically not significant.

Table 18. Percentage of communications recommended for publication as a function of nationality of the corresponding author and nationality of first referee

 | Corresponding author from the FRG | Corresponding author from abroad
First referee from the FRG | 84 (N = 228) | 53 (N = 40)
First referee from abroad | 85 (N = 41) | 63 (N = 15)

Note: χ² value = 38.217, 1 degree of freedom, p < .0001

Table 18a. Percentage of communications recommended for publication as a function of nationality of the corresponding author and nationality of second referee

 | Corresponding author from the FRG | Corresponding author from abroad
Second referee from the FRG | 82 (N = 216) | 61 (N = 45)
Second referee from abroad | 82 (N = 36) | 68 (N = 15)

Note: χ² value = 15.031, 1 degree of freedom, p < .001

Foreign corresponding authors had a similarly (statistically significant) lower success rate in terms of publication (cf. Table 19). Whereas Angewandte Chemie accepted for publication 80% of all domestic communications, the acceptance rate for foreign communications was only 47% (χ² = 47.596, 2 degrees of freedom, p < .0001). Most of the foreign communications rejected by Angewandte Chemie were nevertheless published in other professional journals, although a partitioning of the χ² value in Table 19 shows that the difference in proportions for manuscripts that were never published for German (7%) and foreign (13%) corresponding authors is statistically not significant (cf. Table 19a).

Table 19. Publication outcome of communications submitted to Angewandte Chemie in 1984 by German vs. foreign corresponding authors (%)

Nationality of corresponding author | Communication published in Angewandte Chemie (N = 323) [A] | Communication published in other journal (N = 88) [B] | Communication not published (N = 38) [C]
[1] Corresponding author from the FRG (N = 340) | 80 | 13 | 7
[2] Corresponding author from abroad (N = 109) | 47 | 40 | 13

Note: χ² value = 47.596, 2 degrees of freedom, p < .0001

Table 19a. Partition of the chi-square value from Table 19 into specific components with one degree of freedom (df) each according to Kimball (1954)

Model of independence | df | χ² value | p
[1] vs. [2], [A] vs. [B] | 1 | 44.031 | < .0001
[1] vs. [2], [A + B] vs. [C] | 1 | 3.565 | n. s.
Total | 2 | 47.596 | < .0001

Our results are in agreement with the few existing empirical findings relevant to national publication bias (cf. Gordon, 1978, 1979a; T. M. Daniel, 1991). In one study of the reviewing practices of two British journals in physics, Gordon (1978, p. 67) determined that "UK authors are, on average, less critically evaluated by UK referees than they are by North American referees (χ² = 3.365), while North American authors are less critically evaluated by North American referees (χ² = 0.8603)." Nevertheless, as shown by the χ² values, these findings by Gordon (1978) are statistically not significant. The author attributes the results to the small number of American authors and reviewers associated with the journals examined. In the case of the Journal of Laboratory and Clinical Medicine, according to information supplied by the editor, T. M. Daniel (1991), only 16% of the accepted manuscripts, but 36% of the submitted manuscripts, were derived from foreign authors (χ² = 50.3, 1 degree of freedom, p < .001).

However, not all foreign authors seem to be affected by national publication bias to the same extent. According to the findings of Gordon (1979b), authors from developing and newly industrializing countries are subject to a particularly low acceptance rate in British journals. The two physics journals covered by his study rejected 57% of the manuscripts from authors in less developed countries. For purposes of comparison, the rejection rate for British authors was 13.9%, for Americans 20.1%, and for other Western Europeans 35.5%. Similarly large nationality differences are reflected in the rejection rates for Angewandte Chemie: Federal Republic of Germany, 20% (N = 340); Great Britain, 23% (N = 13); United States, 27% (N = 11); Switzerland, 41% (N = 12); Japan, 47% (N = 15); Italy, 69% (N = 13); Taiwan, 75% (N = 4); Hungary, 100% (N = 5).
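The test statistics reported in Tables 18, 18a, 19 and 19a can be retraced with a few lines of Python. This is only a sketch under stated assumptions: the tables print percentages, so the cell counts used below are reconstructed from those percentages, from the group sizes in Table 17 and from the column totals in Table 19. The function names are illustrative, and the partition uses the 2 x c formulas attributed to Kimball (1954), without a continuity correction.

```python
def pearson_chi2(table):
    """Pearson chi-square (no continuity correction) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def kimball_partition(row_a, row_b):
    """Split the chi-square of a 2 x c table into c - 1 components with 1 df each.

    Component t compares the first t columns (pooled) with column t + 1.
    """
    total_a, total_b = sum(row_a), sum(row_b)
    n = total_a + total_b
    components = []
    cum_a = cum_b = 0
    for t in range(len(row_a) - 1):
        cum_a += row_a[t]
        cum_b += row_b[t]
        next_a, next_b = row_a[t + 1], row_b[t + 1]
        cum_n, next_n = cum_a + cum_b, next_a + next_b
        numerator = n ** 2 * (next_b * cum_a - next_a * cum_b) ** 2
        denominator = total_a * total_b * next_n * cum_n * (cum_n + next_n)
        components.append(numerator / denominator)
    return components


# Reconstructed counts (assumption): rows = FRG authors, foreign authors.
# Tables 18/18a: columns = recommended for acceptance, not recommended,
# pooled over German and foreign referees.
table_18 = [[228 + 41, (270 + 48) - (228 + 41)], [40 + 15, (76 + 24) - (40 + 15)]]
table_18a = [[216 + 36, (265 + 44) - (216 + 36)], [45 + 15, (74 + 22) - (45 + 15)]]
# Table 19: columns = published in Angewandte Chemie [A], elsewhere [B], never [C].
table_19 = [[272, 44, 24], [51, 44, 14]]

print(f"Table 18:  chi2 = {pearson_chi2(table_18):.3f}  (reported: 38.217)")
print(f"Table 18a: chi2 = {pearson_chi2(table_18a):.3f}  (reported: 15.031)")
print(f"Table 19:  chi2 = {pearson_chi2(table_19):.3f}  (reported: 47.596)")
parts = kimball_partition(*table_19)
print("Table 19a components:", [round(p, 3) for p in parts], "(reported: 44.031, 3.565)")
```

Because the two components add up to the overall statistic, the partition shows how much of the difference between German and foreign authors stems from where a paper appeared ([A] vs. [B]) and how much from whether it appeared at all ([A + B] vs. [C]).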

10 The Validity of Manuscript Review

In contrast to Cicchetti (1991), whose analysis focuses on questions of reliability and fairness in the peer-review process, Kraemer (1991, p. 154) suggests that "the review process be judged more by the results it produces (valid findings) than by the procedures it uses to produce those results, such as the 'reliability' of reviewers." We now turn our attention to the so far seldom-examined issue of the validity of the peer-review process. Two related approaches have been taken in studies to date seeking estimates of the validity of manuscript evaluation. Wilson (1978) compared average citation rates for papers published in the Journal of Clinical Investigation (JCI) with the citation rates for papers rejected by JCI but nevertheless published in other professional journals. For reasons of time and cost, Lock (1985) attempted to determine the validity of manuscript review on the basis of ISI Journal Impact Factors52 (cf. Garfield, 1976 ff.) for those journals that published manuscripts rejected by the British Medical Journal (BMJ). Finally, as a check on the use of the impact factor as a criterion, Lock (1985, p. 58) compared the citations to a small but statistically valid random sample of 39 papers accepted by the BMJ with those of 39 articles rejected by the BMJ and published elsewhere, as well as those of 39 articles that had reached the hanging committee stage but had still been rejected and published elsewhere. Based on the earlier work of Wilson (1978) and Lock (1985) we have also attempted to conduct a study of the validity of the peer-review process as a part of the current project. In the sections that follow we begin by identifying professional journals in which manuscripts rejected by Angewandte Chemie were published and noting the magnitudes of their ISI Journal Impact Factors. 
We then present the results of citation analyses for the individual papers themselves, in which we compare citation rates for accepted communications with those for communications that were rejected but nevertheless published elsewhere. Finally, citation counts are used not only to assess the validity of editorial decisions, but also to investigate the predictive validity of initial appraisals of the manuscripts and subsequent reviewer recommendations.


10.1 The Fate of the Rejected Manuscripts

Table 20. List of those journals that published communications rejected by Angewandte Chemie in 1984 (in descending order of number of publications)

Journal | Circulation [1] | Impact Factor 1986 | No. of publications
Tetrahedron Letters | 3380 | 2.158 | 10
Inorganica Chimica Acta | - | 2.439 [2] | 8
Journal of Organometallic Chemistry | - | 1.792 | 8
Journal of the Chemical Society, Chemical Communications | 4000 | 2.385 | 8
Zeitschrift für Naturforschung, Sek. B (Chemie) | 3200 | 1.405 | 7
Chemische Berichte | 2140 | 1.680 | 5
Chemistry Letters | 4650 | 1.612 | 4
Synthesis (Communications) | 3300 | 1.290 | 4
Journal of Fluorine Chemistry | - | 0.905 | 3
Zeitschrift für Anorg. und Allg. Chemie | 2450 | 1.326 | 2
Advances in Hydrogen Energy | 800 | - | 1
Archiv der Pharmazie | 940 | 0.562 | 1
Biopolymers | 900 | 2.259 | 1
Bulletin of the Korean Chemical Society | 3000 | 0.607 | 1
Canadian Journal of Chemistry | 2300 | 1.239 | 1
Carbohydrate Research | - | 1.462 | 1
Chemical Physics | - | 2.079 | 1
Chemiker-Zeitung | 4300 | 0.609 | 1
Gazzetta Chimica Italiana | - | 0.619 | 1
Helvetica Chimica Acta | 2400 | 1.605 | 1
Heterocycles | 700 | 1.021 | 1
Indian Academy of Sciences. Proceedings, Chemical Sciences | 1000 | 0.399 | 1
Inorganic Chemistry | 3979 | 2.628 | 1
Journal of Catalysis | - | 2.527 | 1
Journal of Heterocyclic Chemistry | 1600 | 0.685 | 1
Journal of Molecular Structure | - | 1.026 | 1
Journal of Organic Chemistry | 9907 | 2.079 | 1
Journal of the American Chemical Society | 13019 | 4.435 | 1
Journal of the Chinese Chemical Society | 2000 | 0.183 [2] | 1
Journal of the Electrochemical Society | 8600 | 1.588 | 1
Kenkyu Hokoku-Asahi Garasu Kogyo Gijutsu Shoreikai | - | - | 1
Macromolecules | 2639 | 2.227 | 1
National Academy of Sciences, India. Science Letters (India) | - | 0.029 | 1
Organic Preparations and Procedures International | 600 | 0.456 | 1
Organometallics | 2905 | 3.588 | 1
Polymer Bulletin | - | 0.918 | 1
studia biophysica | - | 0.494 | 1
Surface Science | - | 3.176 | 1
Tetrahedron | 3000 | 2.031 | 1

[1] Source: Ulrich's International Periodicals Directory 1992-93. New Providence, NJ: Bowker, 1992. The circulations of journals published by Elsevier are not provided by this source.
[2] ISI Journal Impact Factor, 1987

With the help of the bibliographic databases Chemical Abstracts (Chemical Abstracts Service, 1984 ff.) and Science Citation Index (Institute for Scientific Information, 1984 ff.) it was possible to ascertain whether, and also where, manuscripts rejected by Angewandte Chemie were later published elsewhere. Of the 115 communications for which publication
in Angewandte Chemie was denied, as well as 9 communications that were withdrawn by the authors, a total of 88 (= 71%) later appeared in other journals.53 Table 20 provides a list of the journals in which the manuscripts rejected by Angewandte Chemie were published. Communications rejected by Angewandte Chemie appeared in a total of 39 different professional journals. The following ten journals published two-thirds of the rejected papers: Tetrahedron Letters (10 papers); Inorganica Chimica Acta (8); Journal of Organometallic Chemistry (8); Journal of the Chemical Society, Chemical Communications (8); Zeitschrift für Naturforschung (7); Chemische Berichte (5); Chemistry Letters (4); Synthesis (4); Journal of Fluorine Chemistry (3); and Zeitschrift für Anorganische und Allgemeine Chemie (2). Only three of the communications appeared in journals with wider circulations than Angewandte Chemie [Ulrich's (1992) reports the circulation for the German edition to be 5705, and 3000 additional copies were ascribed to the English edition]: one publication each in the Journal of the American Chemical Society (with a circulation of 13 019), the Journal of Organic Chemistry (with a circulation of 9907), and the Journal of the Electrochemical Society (with a circulation of 8600).

The ISI Journal Impact Factors54 for the journals listed in Table 20 fall between 0.029 (National Academy of Sciences, India. Science Letters) and 4.435 (Journal of the American Chemical Society). The weighted mean ISI Journal Impact Factor for all 39 journals is 1.747. It is worth noting that 83% of the manuscripts that were rejected but nevertheless published elsewhere appeared ultimately in their original form, i.e., as short communications (letters, notes). Only 15 of the manuscripts (= 17%) were published as full papers. Table 21 provides the reviewers' responses to the question "If you are of the opinion that the contribution is not suitable for publication in Angewandte Chemie please indicate which other journal you consider more appropriate?". As a general rule, reviewers qualified their recommendations by suggesting that the work be reported in the form of a full paper in the journal designated. Nevertheless, most of the authors declined to follow this advice from the reviewers.

Based on ISI Journal Impact Factors, the Angewandte Chemie editorial decisions could be described as highly valid: none of the rejected manuscripts appeared in a journal with an Impact Factor higher than that of Angewandte Chemie itself (in one of the multidisciplinary "high-impact" journals, for example). The mean Impact Factor for journals publishing these manuscripts was 1.747, significantly lower than that of Angewandte Chemie (1986: 5.335).

Table 21. "If you are of the opinion that the contribution is not suitable for publication in Angewandte Chemie please indicate which other journal you consider more appropriate?"

Journal | No. of times mentioned
Chemische Berichte | 16
  Aufsatz (full paper) | 12
  Notiz (note) | 4
Synthesis (full paper) | 14
Journal of Organometallic Chemistry (regular paper) | 12
Tetrahedron Letters | 9
Justus Liebigs Annalen der Chemie | 7
Organometallics | 5
Journal of Organic Chemistry (full paper) | 4
Synthetic Communications | 4
Zeitschrift für Naturforschung (Section C: Biowissenschaften) | 4
Journal of the American Chemical Society (full paper) | 3
Journal of Molecular Structure (full paper) | 3
Acta Crystallographica (Section C: Crystal Structure Communications) | 2
Biopolymers | 2
Helvetica Chimica Acta (Vollmitteilung / full paper) | 2
Inorganica Chimica Acta | 2
Journal of Fluorine Chemistry | 2

Note: 22 additional journals were mentioned only once
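The weighted mean Impact Factor of 1.747 can be retraced from Table 20 by weighting each journal's Impact Factor with the number of rejected communications it published. The sketch below is illustrative only: the pairing of values with journals follows a reconstruction of the flattened table (an assumption where the extraction was garbled), the two journals without a listed Impact Factor are left out of the average, and the names `TABLE_20` and `weighted_mean_impact` are hypothetical.

```python
# (journal, Impact Factor or None if not listed, no. of publications), from Table 20.
TABLE_20 = [
    ("Tetrahedron Letters", 2.158, 10),
    ("Inorganica Chimica Acta", 2.439, 8),
    ("Journal of Organometallic Chemistry", 1.792, 8),
    ("J. Chem. Soc., Chemical Communications", 2.385, 8),
    ("Zeitschrift fuer Naturforschung B", 1.405, 7),
    ("Chemische Berichte", 1.680, 5),
    ("Chemistry Letters", 1.612, 4),
    ("Synthesis (Communications)", 1.290, 4),
    ("Journal of Fluorine Chemistry", 0.905, 3),
    ("Z. Anorg. Allg. Chemie", 1.326, 2),
    ("Advances in Hydrogen Energy", None, 1),
    ("Archiv der Pharmazie", 0.562, 1),
    ("Biopolymers", 2.259, 1),
    ("Bulletin of the Korean Chemical Society", 0.607, 1),
    ("Canadian Journal of Chemistry", 1.239, 1),
    ("Carbohydrate Research", 1.462, 1),
    ("Chemical Physics", 2.079, 1),
    ("Chemiker-Zeitung", 0.609, 1),
    ("Gazzetta Chimica Italiana", 0.619, 1),
    ("Helvetica Chimica Acta", 1.605, 1),
    ("Heterocycles", 1.021, 1),
    ("Indian Acad. Sci. Proc., Chemical Sciences", 0.399, 1),
    ("Inorganic Chemistry", 2.628, 1),
    ("Journal of Catalysis", 2.527, 1),
    ("Journal of Heterocyclic Chemistry", 0.685, 1),
    ("Journal of Molecular Structure", 1.026, 1),
    ("Journal of Organic Chemistry", 2.079, 1),
    ("Journal of the American Chemical Society", 4.435, 1),
    ("Journal of the Chinese Chemical Society", 0.183, 1),
    ("Journal of the Electrochemical Society", 1.588, 1),
    ("Kenkyu Hokoku-Asahi Garasu Kogyo Gijutsu Shoreikai", None, 1),
    ("Macromolecules", 2.227, 1),
    ("Natl. Acad. Sci., India. Science Letters", 0.029, 1),
    ("Organic Preparations and Procedures International", 0.456, 1),
    ("Organometallics", 3.588, 1),
    ("Polymer Bulletin", 0.918, 1),
    ("studia biophysica", 0.494, 1),
    ("Surface Science", 3.176, 1),
    ("Tetrahedron", 2.031, 1),
]


def weighted_mean_impact(rows):
    """Mean Impact Factor weighted by publication count, skipping unrated journals."""
    rated = [(factor, count) for _, factor, count in rows if factor is not None]
    total_papers = sum(count for _, count in rated)
    return sum(factor * count for factor, count in rated) / total_papers


mean_if = weighted_mean_impact(TABLE_20)
print(f"journals: {len(TABLE_20)}, papers: {sum(c for _, _, c in TABLE_20)}")
print(f"weighted mean Impact Factor: {mean_if:.3f}")
```

Weighting by publication count matters here: an unweighted mean over journals would give the many one-paper journals the same influence as Tetrahedron Letters with its ten papers.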

10.2 Comparison of Mean Citation Rates for Accepted Manuscripts and Rejected Manuscripts Published Elsewhere: The Predictive Validity of Editorial Decisions

ISI Journal Impact Factors represent only a very crude measure of validity, because all papers appearing in a given journal are characterized in terms of a single average value. Very frequently cited papers are thus undervalued, whereas papers that are rarely or never cited are overvalued (cf. also Teevan, 1980).55 For this reason we have gone beyond Journal Impact Factors and determined the frequencies with which individual papers were cited subsequent to their publication.56

There exists only a single source for citation data, the Science Citation Index, published by the Institute for Scientific Information (ISI) in Philadelphia. Apart from ISI itself, the only place one can access the complete magnetic tapes that actually constitute this database is the Information Science and Scientometrics Research Unit (ISSRU) of the library of the Hungarian Academy of Sciences in Budapest. An on-line version of the database (called SCISEARCH) is accessible through the host information service DIMDI (Deutsches Institut für Medizinische Dokumentation und Information) with headquarters in Cologne.

Citation frequencies can be established in principle on the basis of the printed library version of Science Citation Index, but the process is time-consuming and subject to error, particularly in the case of publications from Angewandte Chemie. This is true because Angewandte Chemie is available in both German and English editions. Whereas foreign authors generally acknowledge only the English version of an article from Angewandte Chemie, German authors frequently cite both editions. In order to prevent duplication in the counting process it is therefore necessary that one work with a data set reflecting all those papers that have cited a communication from Angewandte Chemie in either its German or English version; i.e., it is necessary that one eliminate the overlap. The corresponding effort starting from the printed edition of Science Citation Index is virtually prohibitive in the case of a frequently cited communication. For this reason, citation frequencies in the present study were established on the basis of the ISSRU magnetic tapes and the on-line version of Science Citation Index.57

Figure 6. Citation analysis: search strategy

    find reference=herrmann wa,1984,v96,p364
    1.00 NUMBER OF HITS IS 43
    find reference=herrmann wa,1984,v23,p383
    2.00 NUMBER OF HITS IS 58
    find 1 or 2
    3.00 NUMBER OF HITS IS 67
    find 3 and ed=84 to 89
    4.00 NUMBER OF HITS IS 60

Question 1: How many times was the German version cited? Answer: 43 times
Question 2: How many times was the English version cited? Answer: 58 times
Question 3: How many times is either the German or the English version cited? Answer: 67 times
Question 4: How many citations occurred within the time period 1984 to 1989? Answer: 60 citations

ISSRU ascertained in December 1990 citation frequencies to the end of 1989 for the 323 communications selected for publication in Angewandte Chemie during 1984 (at that time the data for 1990 were not yet available at ISSRU). Citation frequencies for the 88 manuscripts

rejected by Angewandte Chemie but nevertheless published elsewhere were determined in August 1991 through the on-line host DIMDI. The search strategy employed is illustrated by the following example based on a communication submitted to Angewandte Chemie by W. A. Herrmann (cf. Fig. 6; the results obtained at various stages in the process are indicated in italic type). First it was necessary to establish how frequently the German version of the communication (Angewandte Chemie, Vol. 96, p. 364) was cited in other publications subsequent to the date of its appearance (result: 43 citations). A second step led to the number of publications that made reference to the English version of the communication (Angewandte Chemie International Edition in English, Vol. 23, p. 383; 58 citations). A third step was required to create a combined set of all citing publications, thereby eliminating duplicate counting (67 citations). Finally, the citation analysis was narrowed to the time period 1984-1989 (60 citations). The complete results of the citation analyses are presented in Figure 7. Based on the data acquired, communications accepted by Angewandte Chemie during the year 1984 were cited twice as frequently on average as communications that were rejected by Angewandte Chemie but published elsewhere. Whereas only 2% of the communications published in Angewandte Chemie had never been cited by the end of 1989, 10% of the papers appearing in other professional journals fell into this category.58 Communications appearing in Angewandte Chemie were subject to as many as 60 different citations, whereas the maximum number of citations for communications published elsewhere was 35. 
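The four search steps described for Figure 6 amount to simple set operations on the citing papers. The following is a minimal sketch of that logic; the paper identifiers and years are made up purely for illustration (they are not the data behind Figure 6), and the function name is hypothetical.

```python
def count_citations(german_citers, english_citers, year_of, first_year, last_year):
    """Count distinct citing papers for a communication printed in two editions.

    Returns (all distinct citing papers, those published within the time window).
    """
    # Step 3 of the search ("find 1 or 2"): the union removes double counting
    # of papers that cite both the German and the English edition.
    combined = set(german_citers) | set(english_citers)
    # Step 4 ("find 3 and ed=84 to 89"): restrict to the citation window.
    in_window = {p for p in combined if first_year <= year_of[p] <= last_year}
    return len(combined), len(in_window)


# Hypothetical citing papers (illustrative identifiers only).
german = {"p1", "p2", "p3"}            # cite the German edition
english = {"p2", "p3", "p4", "p5"}     # cite the English edition
years = {"p1": 1985, "p2": 1986, "p3": 1990, "p4": 1988, "p5": 1984}

total, windowed = count_citations(german, english, years, 1984, 1989)
print(total, windowed)  # 5 distinct citing papers, 4 of them within 1984-1989
```

Applied to the figures in the Herrmann example, the same reasoning implies that 43 + 58 - 67 = 34 papers cited both editions, which is exactly the overlap that a naive sum of the two counts would have counted twice.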
The two observed distributions of citation frequencies can be approximated very well by the negative binomial distribution (distribution of citation frequencies for communications in Angewandte Chemie: Chi-square "goodness of fit" test statistic = 34.98, with 32 degrees of freedom, n.s.; distribution of citation frequencies for contributions in other journals: Chi-square "goodness of fit" test statistic = 12.05, with 10 degrees of freedom, n.s.).59 Thus, citation frequencies for both accepted manuscripts and those rejected but published elsewhere display distributions with a steep left flank but a gradual slope on the right; in other words, most of the publications were cited only rarely, but a very few were cited relatively frequently (cf. also Daniel & Fisch, 1986, and Daniel, 1988, pp. 226-238).

Figure 7. Comparison of mean citation rates for communications accepted by Angewandte Chemie with those rejected by Angewandte Chemie but published elsewhere [Plot not reproduced. Axes: no. of citations against no. of communications or papers (in %). Angewandte Chemie: on average, 12 citations (323 communications); other journals: on average, 6 citations (88 papers).]

Table 22 illustrates (as does Fig. 7) that communications published by Angewandte Chemie were cited roughly twice as frequently as those that were rejected by Angewandte Chemie but later published in other professional journals as either notes or full papers. The difference in mean citation rates is statistically highly significant (F1/408 = 11.17, p < .001). The analysis of covariance that was conducted takes into account the fact that communications rejected by Angewandte Chemie appeared in other journals after a delay of roughly six months, which means that the period of time available for citation of these works was somewhat shorter than that for communications published in Angewandte Chemie.60 The statistical significance of our findings becomes even more apparent if one follows the suggestion of Andrews (1961) and transforms the raw citation frequencies logarithmically prior to carrying out an analysis of covariance to obtain scores with a normal distribution and homogeneous variances across groups of papers (F1/408 = 20.41, p < .001).61

Figure 8 reproduces results from a citation analysis undertaken by Wilson (1978), which compares mean citation rates for articles accepted for publication during 1970 by the Journal of Clinical Investigation with mean citation rates for manuscripts that were rejected by this journal but published elsewhere during 1971. Articles appearing in the Journal of Clinical Investigation were cited during the first four years after their publication nearly twice as frequently as rejected manuscripts that were subsequently published elsewhere.

Figure 8. Comparison of the citation rates for papers accepted and rejected by The Journal of Clinical Investigation but published elsewhere. The mean citation rates for manuscripts rejected by The Journal in 1970 and published elsewhere in 1971 are compared with those for the papers published by The Journal during the same year (Source: Wilson, 1978, p. 1699) [Plot not reproduced. Series: JCI (N = 306) and non-JCI (N = 149); horizontal axis: year, 1971 to 1974.]

Table 22. Validity of the editor's decision. Comparison of mean citation rate for communications accepted for publication by Angewandte Chemie with the mean citation rate for communications rejected by Angewandte Chemie but published elsewhere, after adjustment of the time window for citation (one-way analysis of covariance)

Communications published in ... | Mean citation rate, 1984/85 to 1989 | Mean time window for citation (months) | Mean citation rate (with no. of months since publication of the manuscript held constant)
Angewandte Chemie (N = 323) | 11.53 | 60.54 | +0.87
Other journals (N = 88) | 5.98 | 55.08 | -3.72

Note: F1/408 = 11.17, p < .001 (dependent variable = raw citation counts)
F1/408 = 20.41, p < .001 (dependent variable = logarithmically transformed raw citation counts)

[...] committee of the journal (the "Hanging Committee"). Lock (1985) selected a random sample of 39 manuscripts accepted by the British Medical Journal for publication during 1979 and compared their mean citation rates during the subsequent five years with manuscripts that had been rejected either on the basis of reviewer recommendations or on the advice of the Publication Committee. Table 23 shows that, once again, articles accepted for publication by the British Medical Journal were cited twice as frequently as manuscripts that were rejected on the basis of reviewers' comments but were later published elsewhere (1.82 vs. 0.9 citations per year), a result fully consistent with the findings described above for Angewandte Chemie and the Journal of Clinical Investigation.

Table 23. Citations for three groups of papers up to 1984 (Source: Lock, 1985, p. 64)

 | Accepted by BMJ (N = 39) | Rejected by BMJ: rejected by hanging committee (N = 39) | Rejected by BMJ: rejected at any stage (N = 39)
Citations per year | 1.82 | 1.64 | 0.9

(χ² = 66.5; 2 degrees of freedom; p < 0.001)

Based on mean citation rates for accepted manuscripts and rejected manuscripts that were nevertheless published elsewhere, editorial decisions in all the existing studies reflect a high degree of predictive validity.

10.3 The Predictive Validity of Initial Judgments and Reviewer Recommendations

Ross (1980) summarized the status of research into the predictive validity of manuscript review as follows: "Manuscript refereeing ... has been shown to be ... without validity in forecasting the subsequent usefulness of a work to scientists as reflected in citations of the work in other scientific papers" (p. ii). However, this conclusion is based upon only a single empirical investigation: "Gottfredson's work (1978) is the only work we could find which correlates readers' (referees') judgments with an indicator of the usefulness of the scientific work" (p. 42). Gottfredson (1978) subjected the quality62 and impact63 of articles published during 1968 in leading psychological journals to the judgment ten years later of experts nominated by the authors of the papers. Experts' judgments of both the quality and impact of the papers were then correlated with the total numbers of times these pieces of work were cited64 during the first eight years after their publication. The study led to the following conclusion: "While largely statistically significant, these relations are very weak. The highest observed (between experts' judgments of impact and the log of the total citations made of the articles) was .37 (p < .001)" (Gottfredson, 1978, p. 931).
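To make the size of such a relation concrete, the following sketch shows how a correlation between expert ratings and log-transformed citation totals might be computed, and what a coefficient of .37 implies in terms of shared variance. The ratings and citation counts are invented for illustration, and the helper `pearson_r` is not taken from Gottfredson (1978); the use of the natural logarithm on strictly positive counts is likewise an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: expert impact ratings and eight-year citation totals.
impact_ratings = [2, 5, 3, 4, 1, 4, 3, 5]
total_citations = [4, 30, 2, 9, 6, 3, 15, 12]

# Citation counts are skewed, so the correlation is taken with their logs.
log_citations = [math.log(c) for c in total_citations]
r = pearson_r(impact_ratings, log_citations)

# A coefficient of .37 corresponds to roughly 14% shared variance.
shared_variance = 0.37 ** 2
print(round(shared_variance, 3))  # 0.137
```

Read this way, the highest correlation reported by Gottfredson leaves about 86% of the variance in log citations unaccounted for by the experts' judgments, which is the sense in which the relation is "very weak".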

Table 24. Validity of initial evaluations by the editor-in-chief for communications submitted for publication in Angewandte Chemie. Comparison of mean citation rates for communications the editor-in-chief thought should be accepted or rejected, as well as communications with respect to which the editor-in-chief was uncertain about the appropriate course of action, after adjustment of the time window for citation (one-way analysis of covariance).

The communication is ... | Mean citation rate, 1984/85 to 1989 | Mean time window for citation (months) | Mean citation rate (with no. of months since publication of the manuscript held constant)
acceptable (N = 148) | 12.08 | 60.57 | +1.42
questionable (N = 221) | 9.44 | 58.92 | -0.93
not acceptable (N = 24) | 7.92 | 57.13 | -2.13

Note: F(2, 389) = 2.42, p < .09 (dependent variable = raw citation counts); F(2, 389) = 5.61, p < .01 (dependent variable = logarithmically transformed raw citation counts)


of raw data or as logarithmically transformed scores) is taken to be the dependent variable; the number of months since publication of a communication represents the covariate. Table 24 shows that communications the editor thought should be accepted were, on average, cited more frequently than either questionable communications or communications the editor thought should be rejected. The differences in the mean evaluations are sta-


Table 26. Validity of second referees' recommendations. Comparison of mean citation rates for communications the second referees thought should be accepted without alterations, accepted after minor alterations, accepted only after major alterations, or rejected, after adjustment of the time window for citation (one-way analysis of covariance).

"Do you recommend acceptance of the Communication?" | Mean citation rate, 1984/85 to 1989 | Mean time window for citation (months) | Mean citation rate (with no. of months since publication of the manuscript held constant)
"Yes, without alterations" (N = 77) | 13.14 | 60.79 | +2.41
"Yes, after minor alterations" (N = 154) | 10.34 | 60.21 | -0.29
"Yes, but only after major alterations" (N = 66) | 9.65 | 58.89 | -0.75
"No" (N = 81) | 8.49 | 57.38 | -1.65

Note: F(3, 373) = 2.05, p < .11 (dependent variable = raw citation counts); F(3, 373) = 2.13, p < .10 (dependent variable = logarithmically transformed raw citation counts)

Table 27. Validity of first and second referees' recommendations combined. Comparison of mean citation rates for communications both referees thought should be accepted or rejected and for communications that received mixed evaluations, after adjustment of the time window for citation (one-way analysis of covariance).

Configuration of first and second referees' recommendations | Mean citation rate, 1984/85 to 1989 | Mean time window for citation (months) | Mean citation rate (with no. of months since publication of the manuscript held constant)
Both referees recommended acceptance of the communication [1] (N = 247) | 11.07 | 60.38 | +0.50
The communication had mixed evaluations (N = 88) | 9.76 | 58.21 | -0.43
Both referees recommended rejection of the communication (N = 31) | 5.74 | 55.61 | -4.00

Note: [1] The response categories "Yes, without alterations", "Yes, after minor alterations", and "Yes, but only after major alterations" were combined. F(2, 362) = 1.91, p < .15 (dependent variable = raw citation counts); F(2, 362) = 2.21, p < .11 (dependent variable = logarithmically transformed raw citation counts)


The differences in mean citation rates become more apparent if the recommendations of first and second reviewers are combined (cf. Table 27). Communications that were recommended for acceptance in Angewandte Chemie by both reviewers (with or without reservations) received on average 4.5 more citations than communications that both reviewers recommended rejecting. These findings with respect to the predictive validity of manuscript review at Angewandte Chemie suggest that Ross (1980) has greatly overstated the case by claiming that "the data about validity against the single criterion of citation of a work urge that current editing-refereeing processes cannot distinguish a 'good' paper from a 'bad' paper" (p. 46). On the contrary: all the existing studies indicate that manuscripts judged favorably by reviewers are cited more frequently subsequent to their publication than those judged negatively.
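The figure of 4.5 citations can be traced to the adjusted means of Table 27. The short sketch below records the table values as read from the source (the dictionary layout and key names are illustrative only) and also checks that the group sizes agree with the error degrees of freedom of the reported F statistics, assuming the usual N - k - 1 for a one-way analysis of covariance with one covariate.

```python
# Values as read from Table 27; the key names are hypothetical labels.
table_27 = {
    "both accept": {"n": 247, "raw": 11.07, "months": 60.38, "adjusted": 0.50},
    "mixed":       {"n": 88,  "raw": 9.76,  "months": 58.21, "adjusted": -0.43},
    "both reject": {"n": 31,  "raw": 5.74,  "months": 55.61, "adjusted": -4.00},
}

# Difference between the adjusted means of the two extreme groups.
gap = table_27["both accept"]["adjusted"] - table_27["both reject"]["adjusted"]

# Error degrees of freedom: N minus number of groups minus one covariate.
n_total = sum(row["n"] for row in table_27.values())
df_error = n_total - len(table_27) - 1

print(gap, n_total, df_error)  # 4.5 366 362
```

The unadjusted means differ by somewhat more (11.07 versus 5.74), because communications recommended for acceptance had on average a longer time window in which to be cited.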

11 Suggestions for Reform of the Peer-Review Process

In a survey conducted in 1986 by the American Council of Learned Societies, three-fourths of all those questioned criticized the peer-review process: "Sizable majorities of scholars in seven broad disciplines think the peer-review system for deciding what gets published in scholarly journals is biased in favor of 'established' researchers, scholars from prestigious institutions, and those who use 'currently fashionable approaches' to their subjects" (Jacobson, 1986, p. 1). Among those scholars questioned, 42% concurred with the statement "The peer-review system in my discipline needs reform" (Morton & Price, 1989, p. 29). Nevertheless, only very few scholars wish to dispense with manuscript review altogether. This assertion is based on a member survey carried out by the American Psychological Association in 1967. To the question, "If the journal's editor has independently decided to publish your manuscript, would you prefer that the manuscript not be reviewed by anyone else in the field, reviewed by one more person, or reviewed by two more persons?" only 13% of those questioned said no one else should review the manuscript (emphasis in the original). An additional 20% selected the second response, while 67% selected the third option; thus, two-thirds of those questioned declared that two reviewers as well as the editor should evaluate the content of manuscripts (Brackbill & Korten, 1970).

Numerous suggestions have appeared in recent years in letters and essays to the editors of scholarly journals arguing for reform in the peer-review process. These have been directed particularly toward improvements with respect to the reliability and fairness of such reviews. More specifically, the following suggestions have been advanced in the interest of reform:

• Formalizing the reviewing instrument
• Increasing the number of reviewers
• Involving authors in the selection of reviewers
• Eliminating reviewer anonymity
• Reviewing according to a double-blind procedure
• Establishing a right of appeal for authors
• Developing uniform guidelines for manuscript review

To some extent suggestions such as these have already been implemented by various journals. In the discussion that follows we propose to investigate, to the extent possible on the basis of existing information, the degree to which the proposed measures have proven useful.

Formalization of the reviewing instrument. Many journals fail to make use of a structured form that specifies unambiguously the dimensions a review should address. In order to increase the extent of reviewer agreement, journals in this category might consider adopting a more formal questionnaire, thereby introducing uniformity into the criteria and limiting the scope of reviewer subjectivity. However, as Cicchetti & Eron (1979) have demonstrated with manuscript reviews for the Journal of Abnormal Psychology, explicit definition of the reviewing criteria does not necessarily result in an increase in the reliability of reviewer judgments: "The reviewer agreement levels for 1973-1975 [the years when the manuscript attribute rating form (MARF) was not available] were, in fact, slightly higher than agreement levels for 1976-1977 [when the MARF was first applied by reviewers]" (p. 596).

Increasing the number of reviewers. Glenn (1976, pp. 184 f.) has suggested that professional journals should make use generally of at least three reviewers for their manuscript evaluations. One advantage of increasing the number of reviewers would be a reduction in the number of defective manuscripts approved for publication.65 Moreover, an increase in fairness could be anticipated, since the probability of a common bias is lower with three reviewers than with two. The Journal of Politics acted upon this suggestion and began subjecting all submitted manuscripts to evaluation by three reviewers. Giles, Patterson & Mizell (1989) report that, as a result, "virtually all (manuscripts), even those eventually accepted, receive at least one negative review" (p. 62). Thus, the final decisions of the editor are in fact made more difficult. On the other hand, increasing the number of reviewers might have the effect of encouraging an editor simply to accept the majority opinion in the event of disagreement, no longer devoting careful critical attention to opposing arguments raised by the various reviewers. Moreover, financial considerations are likely to prevent many journals from increasing the size of their reviewing panels.
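Both effects of a third reviewer mentioned above can be illustrated with elementary probability. The sketch assumes, purely for the sake of the example, that reviewers judge independently of one another; the probabilities 0.3 and 0.4 are hypothetical and are not taken from any of the studies cited.

```python
def prob_all_share_bias(p_bias, n_reviewers):
    """Probability that every one of n independent reviewers shares a bias."""
    return p_bias ** n_reviewers


def prob_at_least_one_negative(p_negative, n_reviewers):
    """Probability that at least one of n independent reviews is negative."""
    return 1 - (1 - p_negative) ** n_reviewers


# Hypothetical: each reviewer holds a given bias with probability 0.3.
common_bias_two = prob_all_share_bias(0.3, 2)
common_bias_three = prob_all_share_bias(0.3, 3)

# Hypothetical: each review of a manuscript is negative with probability 0.4.
negative_two = prob_at_least_one_negative(0.4, 2)
negative_three = prob_at_least_one_negative(0.4, 3)

print(round(common_bias_two, 3), round(common_bias_three, 3))
print(round(negative_two, 3), round(negative_three, 3))
```

Under these assumptions the chance of a shared bias falls from about 9% to about 3% when a third reviewer is added, while the chance of at least one negative review rises from 64% to about 78%. The second effect is in line with the experience of the Journal of Politics that nearly every manuscript attracts at least one negative review.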
Apart from economics, time factors must also be taken into account. Increasing the number of reviewers is undoubtedly more practical with respect to full papers and review articles than for short communications. One of the most important arguments in favor of retaining the latter type of publication is the opportunity it provides for establishing researcher priority, but this in turn requires that publication be as rapid as possible. Increasing the number of reviewers would also further complicate an editor's problems with effective time management. Editors of professional journals occasionally report difficulty in recruiting a sufficient number of competent reviewers. Since 1979 the journal Biochemistry has selected its reviewers with the aid of a computer-based information system that includes editors' evaluations of the quality of previously furnished reviews (Neurath & Garson, 1979; Garson, 1980).

Author involvement in the selection of reviewers. Certain journals reserve for their authors the right to nominate at least one of the reviewers for a proposed article as a way of countering accusations that reviewers selected by the editors are incompetent or biased. The right to choose one reviewer is also generally accorded candidates for degree examinations, and such a policy is consistent with the way reviewers are selected by the Deutsche Forschungsgemeinschaft (cf. Neidhardt, 1988, pp. 50-64). To the best of our knowledge, no reports have yet been circulated regarding the effect author choice of one of the reviewers has on the reliability, fairness, or validity of manuscript review.

Elimination of reviewer anonymity. By far the most frequent suggestion in the literature is that the anonymity of reviewers should be lifted. "The least we should accept with the return of our rejected papers and mutilated thoughts is the signature of the judge who has passed judgement on one's hard-earned but unacceptable piece of work, so that one can sleep peacefully at night, without the recurring nightmare that soon one will see appear on the horizon a familiar idea dressed up in different words but very much one's own" (Altura, 1990, pp. 117 f.).66 Several (unsuccessful) attempts were made in the United States during the 1980s to use legal means to force journals to reveal the identities of reviewers suspected of plagiarism (for an overview of the subject cf. Chubin, 1982; a pending case is described by DeBakey, 1990).67 Nevertheless, in a representative survey of 318 American communications scientists, only 25% agreed with the suggestion that "authors should be given the names of referees who have reviewed their manuscripts" (Ryan, 1982, p. 279). Ziman (1976) regards the anonymity of reviewers as advantageous to all parties involved, and feels that it should under no circumstances be abandoned: "Anonymity is better for all concerned: for the referee, who does not have to mix emotional factors into his intellectual judgements; for the editor, who gets a more honest opinion to guide his decisions; for the reader, who gets more reliable and better expressed papers that have been subjected to a higher standard of criticism; and, strangely enough, for the author who, when his mistakes are pointed out, can vent his chagrin harmlessly in the direction of an impersonal critic without falling into the mortal sin of acquiring a supposed enemy" (p. 264). A very few journals (e.g., Current Anthropology, Behavioral & Brain Sciences) operate under the concept of "open peer review"; that is, reviews are published alongside the corresponding articles, with authors retaining the right to express themselves briefly with respect to the accompanying reviews.
The Journal of Molecular and Cellular Immunology (JMCI), founded in 1983, also practiced an "open peer review" policy, but it was forced to cease publication in 1990. The editor of the journal justified the suspension on the following grounds: "The idea of open reviewing was central to the conception and operation of JMCI. I believe it was also a principal reason for the failure of this journal.... We have encountered three problems with open reviewing: First, authors are reluctant to see critiques of their papers printed, even if the critique is accompanied by their rebuttal. This contributed to a low submission rate.... Second, many reviewers were reluctant to be critical in public, especially in the preferred form of a review that might be printed for attribution. This made it difficult to obtain reviewers or suitable reviews and contributed to delays in the processing of articles.... Third, the time consumed in reviewing, replying and adjusting the critique to revisions and rebuttals further delayed publication of articles, an unacceptable price in the eyes of most authors" (Janeway, 1990, p. 293). A number of journals offer reviewers the option of revealing their identities to the authors on a case-by-case basis. Relman (1980, p. 56), editor of the New England Journal of Medicine, reports that only 13% of all reviewers take advantage of this opportunity, even though they are expressly encouraged to do so in the instructions for manuscript review. The reviewing instrument of the Journal of General Internal Medicine contains the statement "We encourage you to sign your review." This advice is heeded by 43% of the reviewers (cf. McNutt, Evans, Fletcher & Fletcher, 1990). Since 1977 the Journal of Laboratory and Clinical Medicine has also taken into account in the selection of reviewers whether or not they are willing to sign comment sheets that are sent to the authors. Knox (1981) has summarized the experience of a journal following this type of policy in the observation: "Signed reviews outnumbered unsigned reviews in both periods. The percentage of signed reviews was higher for manuscripts that were ultimately accepted than those that were rejected. The trend in both categories was for a decrease in the percentage of signed reviews in 1980 as compared with 1977. Thus the arguments for signed reviews did not seem to win converts" (p. 1).

Review by a double-blind procedure. The proponents of two-sided anonymity believe that this regimen provides authors with the best assurance of fair manuscript appraisal. Three-fourths of the scholars questioned by Ryan (1982) supported the suggestion that "an author's name and institutional affiliation should be deleted from a manuscript before it is sent to a referee" (p. 278). A number of journals have adopted as a general policy manuscript review according to a double-blind procedure (e.g., American Sociological Review, Psychologische Rundschau), whereas others operate in this fashion only at the express request of an author (e.g., Ceramic Bulletin, Journal of Personality and Social Psychology). Nevertheless, it remains an open question whether the double-blind procedure actually functions as intended. Adair (1982) estimates that reviewers for Physical Review Letters are able to identify the authors of unattributed short communications in 80% of all cases. In a study by Ceci & Peters (1984), three-quarters of the scholars questioned were of the opinion that they would be able to identify the author of an anonymous manuscript. In fact, however, only 35.6% of the reviewers involved in this study were actually able to do so. This is consistent with the findings of Rosenblatt & Kirk (1980) and McNutt et al. (1990), who established respectively that two-thirds and three-fourths of the reviewers in their studies were unable to provide names for the authors of anonymous manuscripts. Reviewers from the study by McNutt et al.
(1990) who were in fact able to identify various authors cited two primary grounds for their success: clues from publications by the authors themselves in the corresponding reference lists, and personal knowledge of the authors' work. In the study by McNutt et al. (1990), one reviewer in each case was asked to evaluate manuscripts on the basis of one-sided anonymity, whereas the other operated under the principle of two-sided anonymity. The quality of the two sets of comments was subsequently evaluated by both the editor and the authors.68 In the opinion of the editor of the journal, reviews of significantly higher quality resulted when the manuscripts were judged on the basis of a double-blind procedure, but author responses suggested that the nature of the reviewing process had no effect on the quality of the reviews (p. 1374). The mean reaction by authors on the criterion "fairness" was 3.9 on a five-point rating scale for both types of review (where the value "1" represented an unfavorable reaction and "5" a favorable reaction).

The American Economic Review has tried and evaluated reviewer-blinded refereeing (cf. Blank, 1991; Deaton, Guesnerie, Hansen & Kreps, 1987). The results from a randomized experiment on the effects of double-blind versus single-blind peer reviewing on acceptance rates and referee ratings indicate that acceptance rates are lower and referees are more critical when the reviewer is unaware of the author's identity. These patterns are not significantly different between female and male authors. Authors at top-ranked universities and at colleges and low-ranked universities are largely unaffected by the different reviewing practices, but authors at near-top-ranked universities and at nonacademic institutions have lower acceptance rates under double-blind reviewing (Blank, 1991).


Ziman (1968) has issued an express warning against invoking a double-blind procedure for evaluating manuscripts of the short-communication type. From his perspective it is essentially impossible to pass a valid judgment on such a paper without some personal knowledge of the author: "Sometimes a letter is just a very short paper, to which there is no objection; all too often it is a claim to discovery substantiated only by the scientific standing of the author" (Ziman, 1968, p. 110).

The right of appeal by authors. Based on the experience of Horrobin (1974), one review in five contains false statements of fact. For this reason it would seem to be important that authors be made aware of reviewer comments and provided with an opportunity to react to them. In contrast to Angewandte Chemie, many journals withhold the content of the reviewer forms from their authors, primarily because they fear that providing this information might provoke lengthy disputes and excessive correspondence, both of which are difficult for an overworked editorial team to master.

The development of guidelines for manuscript review. Ingelfinger (1974, p. 688) and Forscher (1980) suspect that a large part of the variation in reviewer judgments is attributable to the fact that editors' expectations are not made clear to the reviewers. In an attempt to express these expectations in more concrete form, several professional journals have in recent years published explicit guidelines for manuscript review; e.g.:

• Ceramic Bulletin (Stull, 1989)
• Geophysics (Schoenberger, 1989)
• British Journal of Cancer (Twentyman & Selby, 1991)
• British Journal of Surgery (Anonymous, 1983)
• Archives of Surgery (Baue, 1985)
• Canadian Medical Association Journal (Squires, 1989)
• Human Pathology (Anderson, 1990)
• Journal of the American Dietetic Association (Monsen, 1983)
• Environmental Science & Technology (Glaze, 1988)
• Journal of Range Management (Hobbs, 1988)
• American Economic Review (Anonymous, 1989)
• Journal of the American Society for Information Science (Kraft, 1987)
• Information Processing & Management (Saracevic, 1986)
• Journal of Forecasting (Armstrong, 1982, pp. 99-102)
• Journal of Personality and Social Psychology (Greenwald, 1976)
• Personality and Individual Differences (Eysenck & Eysenck, 1992)

The American Chemical Society (1985) has issued a set of guidelines for publication of papers in all the Society's journals in which the tasks of both editors and reviewers are very carefully delineated (cf. Fig. 9). For example, reviewers are urged "(to) explain and support their judgments adequately" and "(to) be alert to failure of authors to cite relevant work by other scientists." The editors of the journals are expressly obligated "(to) give unbiased consideration to all manuscripts offered for publication, judging each on its merits without regard to race, religion, nationality, sex, seniority, or institutional affiliation of the author(s)."


Ethical Guidelines to Publication of Chemical Research The guidelines embodied In this document were adopted by the editors of the Books and Journals Division of the American Chemical Society In January 1985 and endorsed by the Society Committee on Publications.

PREFACE The American Chemical Society serves the chemistry profession and society at large in many ways, among them by publishing journals which present the results of scientific and engineering research. Every editor of a Society journal has the responsibility to establish and maintain guidelines for selecting and accepting papers submitted to that journal. In the main, these guidelines derive from the Society's definition of the scope of the journal and from the editor's perception of standards of quality for scientific work and its presentation. An essential feature of a profession is the acceptance by its members of a code that outlines desirable behavior and specifies obligations of members to each other and to the public. Such a code derives from a desire to maximize perceived benefits to society and to the profession as a whole and to limit actions that might serve the narrow self-interests of individuals. The advancement of science requires the sharing of knowledge between individuals, even though doing so may sometimes entail foregoing some immediate personal advantage. With these thoughts in mind, the editors of journals published by the American Chemical Society now present a set of ethical guidelines for persons engaged in the publication of chemical research, specifically, for editors, authors, and manuscript reviewers. These guidelines are offered not in the sense that there is any immediate crisis in ethical behavior, but rather from a conviction that the observance of high ethical standards is so vital to the whole scientific enterprise that a definition of those standards should be brought to the attention of all concerned. We believe that most of the guidelines now offered are already understood and subscribed to by the majority of experienced research chemists. They may, however, be of substantial help to those who are relatively new to research. 
Even well-established scientists may appreciate an opportunity to review matters so significant to the practice of science. Formulation of these guidelines has made us think deeply about these matters. We intend to abide by these guidelines, strictly, in our own work as editors, authors, and manuscript reviewers. GUIDELINES A. ETHICAL OBLIGATIONS OF EDITORS OF SCIENTIFIC JOURNALS 1. An editor should give unbiased consideration to all manuscripts offered for publication, judging each on its merits without regard to race, religion, nationality, sex, seniority, or institutional affiliation of the author(s). An editor may, however, take into account relationships of a manuscript immediately under consideration to others previously or concurrently offered by the same author(s). 2. An editor should consider manuscripts submitted for publication with all reasonable speed. 3. The sole responsibility for acceptance or rejection of a manuscript rests with the editor. Responsible and prudent ex-

ercise of this duty normally requires that the editor seek advice from reviewers, chosen for their expertise and good judgment, as to the quality and reliability of manuscripts submitted for publication. In reaching a final decision, the editor should also consider additional factors of editorial policy. 4. The editor and members of the editor's staff should not disclose any information about a manuscript under consideration to anyone other than those from whom professional advice is sought. (However, an editor who solicits, or otherwise arranges beforehand, the submission of manuscripts may need to disclose to a prospective author the fact that a relevant manuscript by another author has been received or is in preparation.) After manuscripts have been accepted for publication, the editor and members of the editor's staff may disclose or publish manuscript titles and authors' names, but no more than that unless the author's permission has been obtained. 5. An editor should respect the intellectual independence of authors. 6. Editorial responsibility and authority for any manuscript authored by an editor and submitted to the editor's journal should be delegated to some other qualified person, such as another editor of that journal or a member of its Editorial Advisory Board. Editorial consideration of the manuscript in any way or form by the author-editor would constitute a conflict of interest, and is therefore improper. 7. Unpublished information, arguments, or interpretations disclosed in a submitted manuscript should not be used in an editor's own research except with the consent of the author. However, if such information indicates that some of the editor's own research is unlikely to be profitable, the editor could ethically discontinue the work. 
When a manuscript is so closely related to the current or past research of an editor as to create a conflict of interest, the editor should arrange for some other qualified person to take editorial responsibility for that manuscript. In some cases, it may be appropriate to tell an author about the editor's research and plans in that area. 8. If an editor is presented with convincing evidence that the main substance or conclusions of a report published in an editor's journal are erroneous, the editor should facilitate publication of an appropriate report pointing out the error and, if possible, correcting it. The report may be written by the person who discovered the error or by an original author. B. ETHICAL OBLIGATIONS OF AUTHORS 1. An author's central obligation is to present an accurate account of the research performed as well as an objective discussion of its significance. 2. An author should recognize that journal space is a precious resource created at considerable cost. An author therefore has an obligation to use it wisely and economically. 3. A primary research report should contain sufficient detail and reference to public sources of information to permit the author's peers to repeat the work. 4. An author should cite those publications that have been influential in determining the nature of the reported work and that will guide the reader quickly to the earlier work that is essential for understanding the present investigation. Except in a review, citation of work that will not be referred to in the reported research should be minimized. 5. Any unusual hazards inherent in the chemicals, equipment, or procedures used in an investigation should be clearly identified in a manuscript reporting the work. 6. Fragmentation of research reports should be avoided. A scientist who has done extensive work on a system or group of related systems should organize publication so that each report gives a well-rounded account of a particular aspect of the general study. 
Fragmentation consumes journal space excessively and unduly complicates literature searches. The convenience of readers is served if reports on related studies are published in the same journal, or in a small number of journals.

7. In submitting a manuscript for publication, an author should inform the editor of related manuscripts that the author has under editorial consideration or in press. The relationships of such manuscripts to the one submitted should be indicated. 8. It is in general inappropriate for an author to submit manuscripts describing essentially the same research to more than one journal of primary publication. However, there are exceptions as follows: (a) resubmission of a manuscript rejected by or withdrawn from publication in one journal; (b) submission of overlapping work to a second journal in another field, if workers in the other field are unlikely to see the article published in the first journal, providing that both editors are informed; and (c) submission of a manuscript for a full paper expanding on a previously published brief preliminary account (a "communication" or "letter") of the same work. 9. An author should identify the source of all information quoted or offered, except that which is common knowledge. Information obtained privately, as in conversation, correspondence, or discussion with third parties, should not be used or reported in the author's work without explicit permission from the investigator with whom the information originated. Information obtained in the course of confidential services, such as refereeing manuscripts or grant applications, should be treated similarly. 10. An experimental or theoretical study may sometimes justify criticism, even severe criticism, of the work of another scientist. When appropriate, such criticism may be offered in published papers. However, in no case is personal criticism considered to be appropriate. 11. The co-authors of a paper should be all those persons who have made significant scientific contributions to the work reported and who share responsibility and accountability for the results. 
Other contributions should be indicated in a footnote or an "Acknowledgments" section. An administrative relationship to the investigation does not of itself qualify a person for co-authorship (but occasionally it may be appropriate to acknowledge major administrative assistance). Deceased persons who meet the criterion for inclusion as co-authors should be so included, with a footnote reporting date of death. No fictitious name should be listed as an author or co-author. The author who submits a manuscript for publication accepts the responsibility of having included as co-authors all persons appropriate and none inappropriate. The submitting author should have sent each living co-author a draft copy of the manuscript and have obtained the co-author's assent to co-authorship of it. C. ETHICAL OBLIGATIONS OF REVIEWERS OF MANUSCRIPTS 1. Inasmuch as the reviewing of manuscripts is an essential step in the publication process, and therefore in the operation of the scientific method, every scientist has an obligation to do a fair share of reviewing. 2. A chosen reviewer who feels inadequately qualified to judge the research reported in a manuscript should return it promptly to the editor. 3. A reviewer (or referee) of a manuscript should judge objectively the quality of the manuscript, of its experimental and theoretical work, of its interpretations and its exposition, with due regard to the maintenance of high scientific and literary standards. A reviewer should respect the intellectual independence of the authors. 4. A reviewer should be sensitive to the appearance of a conflict of interest when the manuscript under review is closely related to the reviewer's work in progress or published. If in doubt, the reviewer should return the manuscript promptly without review,


advising the editor of the conflict of interest or bias. Alternatively, the reviewer may wish to furnish a signed review stating the reviewer's interest in the work, with the understanding that it may, at the editor's discretion, be transmitted to the author. 5. A reviewer should not evaluate a manuscript authored or co-authored by a person with whom the reviewer has a personal or professional connection if the relationship would bias judgment of the manuscript. 6. A reviewer should treat a manuscript sent for review as a confidential document. It should neither be shown to nor discussed with others except, in special cases, to persons from whom specific advice may be sought; in that event, the identities of those consulted should be disclosed to the editor. 7. Reviewers should explain and support their judgments adequately so that editors and authors may understand the basis of their comments. Any statement that an observation, derivation, or argument had been previously reported should be accompanied by the relevant citation. Unsupported assertions by reviewers (or by authors in rebuttal) are of little value and should be avoided. 8. A reviewer should be alert to failure of authors to cite relevant work by other scientists, bearing in mind that complaints that the reviewer's own research was insufficiently cited may seem self-serving. A reviewer should call to the editor's attention any substantial similarity between the manuscript under consideration and any published paper or any manuscript submitted concurrently to another journal. 9. A reviewer should act promptly, submitting a report in a timely manner. Should a reviewer receive a manuscript at a time when circumstances preclude prompt attention to it, the unreviewed manuscript should be returned immediately to the editor. Alternatively, the reviewer might notify the editor of probable delays and propose a revised review date. 10. 
Reviewers should not use or disclose unpublished information, arguments, or interpretations contained in a manuscript under consideration, except with the consent of the author. If this information indicates that some of the reviewer's work is unlikely to be profitable, the reviewer, however, could ethically discontinue the work. In some cases, it may be appropriate for the reviewer to write the author, with copy to the editor, about the reviewer's research and plans in that area. D. ETHICAL OBLIGATIONS OF SCIENTISTS PUBLISHING OUTSIDE THE SCIENTIFIC LITERATURE 1. A scientist publishing in the popular literature has the same basic obligation to be accurate in reporting observations and unbiased in interpreting them as when publishing in a scientific journal. 2. Inasmuch as laymen may not understand scientific terminology, the scientist may find it necessary to use common words of lesser precision to increase public comprehension. In view of the importance of scientists' communicating with the general public, some loss of accuracy in that sense can be condoned. The scientist should, however, strive to keep public writing, remarks, and interviews as accurate as possible consistent with effective communication. 3. A scientist should not proclaim a discovery to the public unless the experimental, statistical, or theoretical support for it is of strength sufficient to warrant publication in the scientific literature. An account of the experimental work and results that support a public pronouncement should be submitted as quickly as possible for publication in a scientific journal. Scientists should, however, be aware that extensive disclosure of research in the public press might be considered by a journal editor as equivalent to a preliminary communication in the scientific literature.

Figure 9. Ethical Guidelines to Publication of Chemical Research (Reprinted with permission. © 1985, 1993 American Chemical Society)

12 Summary

It has become common practice in the sciences for peers (experienced colleagues) to be requested to assess the qualifications of applicants for academic positions, the quality of research proposals, and the suitability of manuscripts submitted for publication in professional journals. For a number of years these "gatekeepers of science", as Crane (1967) and Luck (1981, 1982) describe the reviewers or referees, have themselves been the subject of sharp criticism. Indeed, the publishing world has been asked to dispense with manuscript review altogether, substituting in its place a system of "uncensored" publication (Kornhuber, 1988). Critics of the peer-review system contend that the process is unreliable, invalid, and most damaging to the best research—that which is innovative (cf. Ross, 1980, 1993; Kornhuber, 1988; Bornstein, 1991). In view of the significance that has been attached to the peer review process as an instrument of self-discipline within science, it must come as some surprise that the process itself has rarely been the subject of empirical investigation (Bailar & Patterson, 1985, p. 654). Empirical research has concentrated primarily on questions of reliability and fairness in the peer-review process. By contrast, there has been little study of the validity of reviewer judgments and editorial decisions. The present study has taken the manuscript review process of the journal Angewandte Chemie as the basis for an inquiry into the question of the extent to which criticism of the peer-review process is justified. At the heart of the study are the three criteria for professional judgment: reviewer agreement (reliability), fairness, and predictive validity. The present evaluation is based on the 449 communications received from throughout the world for possible publication in Angewandte Chemie during the year 1984. 
Each communication was evaluated by two external referees working with fully-structured reviewing forms and operating under the principle of one-sided anonymity (i.e., the referees knew the names of the authors, but the authors did not know those of the referees). All reviewers received from the editor-in-chief both a fully structured reviewing form and a comment sheet. An analysis of the reliability of the manuscript review process led to the following results: In agreement with the findings of others, the kappa and intraclass-correlation coefficients observed in this study fell in the range 0.12-0.25. Low concordance coefficients were established not only for the ultimate reviewer recommendations, but also with respect to the other questions on the reviewing form, dealing with the content, data, length, and form of the manuscript. From a statistical standpoint, the observed extent of referee agreement must be regarded as rather unsatisfying.
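As an illustration of the kind of agreement coefficient reported above, the following is a minimal sketch of an unweighted Cohen's kappa for two referees, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name, the category labels, and the sample ratings are invented for this example; the text does not reproduce the study's data or say which software was used.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters judging the same items.

    ratings_a and ratings_b are equal-length sequences of category labels.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(ratings_a)
    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical recommendations from two referees on four manuscripts.
referee_1 = ["A", "A", "A", "B"]
referee_2 = ["A", "A", "B", "B"]
print(cohen_kappa(referee_1, referee_2))  # prints 0.5
```

In this toy case the referees agree on three of four manuscripts (75%), but half of that agreement would be expected by chance alone, so kappa is only 0.5. This is why kappa values can look low even when raw agreement appears reasonable.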


Nevertheless, no basis exists for dramatizing these low concordance coefficients, for the following reasons: • Major discrepancies between referee votes are relatively rare: in only about seven percent of all cases did one referee recommend acceptance without alterations while the other recommended rejection. • The true level of existing agreement is systematically underestimated, because diverging opinions generally reflect not only the discordance component, the factor of interest, but also dislocational components (Lienert, 1987, p. 320). Differences in the frames of reference for reviewer ratings (e.g., referee A may be fundamentally inclined to rate all manuscripts one level lower than referee B) must not be interpreted as truly discordant judgments. Data available for the present study unfortunately do not permit a clear separation of these two components through a one-way analysis of variance with repeated measures, because in most cases a given pair of reviewers received only a single manuscript to evaluate. For this reason, the observed reliability coefficients suggest a greater degree of discordant judgment than exists in reality. • If one follows the recommendation of Crandall (1978) and treats slightly differing votes as fully concordant (i.e., cases in which the reviewer recommendations differ by only a single category), then the resulting concordance coefficients are much higher. In our study such a "kappa with scores computed as agreement if within one point" (Tolman, Farrier & Farrier, 1988, pp. 3-4) produces a value of 0.67. A concordance coefficient of this magnitude is generally interpreted as reflecting a substantial degree of reviewer agreement. • Editors of professional journals usually solicit as their reviewers one specialist and one generalist. 
A content analysis of reviewer comments indicates that the arguments provided by two reviewers examining the same manuscript rarely touch on the same points—a result entirely consistent with the wishes of the editors (cf. Bakanic, McPhail & Simon, 1989; Fiske & Fogg, 1990). A high level of agreement in the recommendations cannot be anticipated if editors select their reviewers according to the principle of complementarity as a way of avoiding redundant comments (Hargens, 1991). • As illustrated by the example of Yalow (cf. note 9), a high level of agreement between two reviewers actually proves relatively little, because the reviewers might be equally mistaken in their judgments. In other words, high reliability provides no assurance whatsoever of valid conclusions (cf. Kraemer, 1991).69 • For these reasons, Mahoney (1985) is actually inclined toward skepticism in the face of a high level of reviewer agreement: "Many of the attacks and defenses of peer review and editorial policies have focused on the issue of reliability... and have overlooked the frailty of consensus as a form of epistemic warrant. Enforced reliability is not a likely solution; indeed, it might well exacerbate the problem" (p. 32, footnote 2; emphasis in the original). In addition to questions of reliability, this study also explored the frequently expressed concern of authors that there exist both lenient and harsh reviewers. If this were indeed the case, then it could pose a threat to the fairness of a review in a particular case. Consistent with the findings of Siegelman (1991), we were able to show that very few scholars belong


in one or the other of these reviewer categories. Most reviewers show no general tendency to issue either favorable or unfavorable judgments. From among the numerous publication biases discussed in the literature, we have chosen three for further investigation. Specifically, analyses were conducted into the influence on reviewer recommendations and editorial decisions of: • the academic status of the corresponding author, • the subject area applicable to the communication, and • the nationality of the corresponding author. Consistent with Merton's interpretation of the Matthew effect (cf. Merton, 1968, 1988), we have found evidence to suggest that communications from professors do tend to be rated more highly than those from corresponding authors possessing only a doctorate. The trend toward a more favorable treatment of manuscripts from professors is also apparent in editorial decisions: 77% of the communications submitted by professors were accepted by Angewandte Chemie for publication, whereas only 66% were accepted from corresponding authors whose highest academic degree was a doctorate. Angewandte Chemie presents a rather distinctive publication profile: 40% of all the published communications involved work that could be described as organometallic in nature. Even though communications concerned with organometallic chemistry also received the most favorable evaluations in relative terms, only a portion of the unique publication profile of this journal can be attributed to recommendations of reviewers and subsequent decisions by the editors. Indeed, the research reported in Angewandte Chemie is more nearly a reflection of the subject-matter distribution of those manuscripts that were submitted for publication. Like most professional journals, Angewandte Chemie displays some measure of national publication bias. 
That is to say, communications from corresponding authors within the Federal Republic of Germany were recommended for publication more frequently, and to a statistically significant extent, than communications from foreign authors. Consistent with this finding, the journal accepted for publication 80% of the domestic communications, but only 47% of the communications from foreign countries. However, this national publication bias did not affect all foreign authors to an equal extent. Authors from developing and threshold countries are significantly less successful at introducing their publications into the leading journals of the world than authors from the highly developed industrial nations (cf. Gordon, 1979b). It is often impossible for scholars in less developed nations to pursue internationally competitive research due to the lack of an adequate infrastructure (e.g., laboratories, computers, and libraries). The example of national publication bias underscores a problem that affects bias research generally: a lack of experimentally derived findings makes it impossible to establish unambiguously whether work from a particular group of scientists is the recipient of better reviews and thus a higher acceptance rate due to preferential biases affecting the review and decision-making process, or whether favorable review and greater success in publication is a simple consequence of the high scientific quality of the corresponding manuscripts. It will presumably never be possible to eliminate all doubts regarding the fairness of the reviewing process. For this reason there would surely be merit in presenting authors with the option of choosing manuscript review according to a double-blind regimen.
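The comparison of acceptance rates between two author groups can be sketched as a two-proportion z-test. This is only an illustration: the summary gives percentages (80% versus 47%) but not the underlying counts, so the group sizes below are hypothetical, and the text does not state which significance test the study actually applied. The function name is likewise an assumption.

```python
import math


def two_proportion_z_test(accepted_1, total_1, accepted_2, total_2):
    """Two-sided z-test for the difference between two acceptance rates.

    Returns (z, p_value) using the pooled-proportion standard error.
    """
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("group sizes must be positive")

    rate_1 = accepted_1 / total_1
    rate_2 = accepted_2 / total_2
    pooled = (accepted_1 + accepted_2) / (total_1 + total_2)

    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_1 + 1.0 / total_2))
    if std_err == 0.0:
        raise ValueError("test undefined when every or no manuscript is accepted")

    z = (rate_1 - rate_2) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# HYPOTHETICAL counts chosen only to reproduce the quoted percentages:
# 80 of 100 domestic and 47 of 100 foreign communications accepted.
z, p = two_proportion_z_test(80, 100, 47, 100)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A significant result of this kind establishes only that the two rates differ. As the text stresses, it cannot show whether the difference stems from bias in the review process or from differences in manuscript quality.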


The validity of the peer-review process was investigated in the present study in analogy with the work of Wilson (1978) and Lock (1985). We began by identifying journals that accepted for publication manuscripts previously rejected by Angewandte Chemie, and noting the corresponding ISI Journal Impact Factors. We then carried out a complete citation analysis, comparing the citation frequencies of manuscripts accepted by Angewandte Chemie with those of rejected manuscripts that were nevertheless published elsewhere. This portion of the investigation led to the conclusion that decisions made by the editors of Angewandte Chemie can be characterized as highly valid: none of the rejected manuscripts appeared in a journal with an ISI Journal Impact Factor higher than that of Angewandte Chemie itself. The mean ISI Journal Impact Factor for journals accepting the rejected manuscripts was 1.747, a value significantly lower than that assigned to Angewandte Chemie (1986: 5.335). Moreover, communications published in Angewandte Chemie were cited in the five succeeding years roughly twice as often as rejected communications that were published elsewhere. The difference in mean citation rates is statistically highly significant. Based on the mean citation rates of accepted manuscripts and rejected manuscripts published elsewhere, Angewandte Chemie editorial decisions can be said to have shown a high predictive validity—just as in the case of the studies by Wilson (1978) and Lock (1985). One might quite properly raise as an argument against this form of validity test the possibility that papers accepted by Angewandte Chemie were cited on average more frequently than those published elsewhere simply because they appeared in a journal with a high Impact Factor. It would thus be useful to establish as part of the validity test how frequently the papers published elsewhere would have been cited had they appeared originally in Angewandte Chemie. 
Unfortunately, such a control must be consigned to the realm of "thought experiments" unlikely ever to be conducted in practice, because duplicate publication is not permitted—at least not in the physical and life sciences—and violators are subject to strict punishment at the hands of editors (cf., e.g., Wenzel, Maki, Crow, Schaffner & McGowan, 1990). Finally, we have explored the question of how much predictive validity is associated with reviewer judgments and the initial reactions of the editor in the case of manuscripts submitted to Angewandte Chemie. Communications that the editor believed should be accepted were cited on average (statistically significantly) more frequently than communications the editor thought should be rejected, or ones about which the editor was in doubt. On the other hand, it was not possible to establish a relationship beyond the limits of chance between mean citation frequencies and judgments by the reviewers, although a trend was observed in the direction of more frequent citations for papers receiving more favorable reviews. These findings with respect to editorial predictive validity as it relates to Angewandte Chemie suggest there is considerable exaggeration in the claims by Ross (1980, 1993) that editors of professional journals and their reviewers are not in any position to distinguish a good manuscript from a poor one. After presenting our results concerning reliability, fairness, and validity in the manuscript review process at Angewandte Chemie we then discussed various suggestions for reform of the peer-review process. Our discussion makes reference to the following specific proposals: formalization of the reviewing instrument, increasing the number of reviewers, author involvement in the selection of reviewers, dispensing with reviewer anonymity,


review by a double-blind procedure, a right to appeal for authors, and the development of uniform guidelines for manuscript review. All these suggestions have as their primary goal increasing the reliability and fairness of the manuscript-review process. Some of the suggestions have already been implemented by various journals. Nevertheless, the experience of editors suggests that not all the proposals for change have in fact proven useful. The literature on reliability, fairness, and predictive validity of the peer-review process has so far completely neglected one important function of manuscript evaluation, namely the improvement of a piece of work on the basis of reviewer comments. In a contribution by Abelson (1980) on the occasion of the 100th anniversary of the journal Science, the author poses the question "But why peer review? Why not an objective, all-knowing, all-wise genius to serve as editor?" His response reads: "Such mortals do not exist. It is essential to divide the task of evaluation and to bring expertise to bear on the various papers that are submitted. In many instances, the volume of material and wide scope make it impossible for one person to handle the job. The general experience of many editors is that peer review leads to improvement of nearly every manuscript" (p. 62). The conclusion by Abelson (1980) that nearly every manuscript is improved as a result of reviewer comments certainly applies to communications appearing in Angewandte Chemie. The editors routinely record whether a given communication has been published without change or only after it has been supplemented or revised by the original authors. Nearly two-thirds (63%) of all communications published in Angewandte Chemie are revised by their authors. 
Comparisons based on mean citation rates reveal that revised manuscripts are cited with precisely the same frequency as manuscripts that have not been subjected to revision (unequal variance t-test, t-value = 0.45, n.s.).70 This may be taken as evidence that reviewers' suggestions for improvement of both the content and the presentation have had a significant impact on a large fraction of the published manuscripts. A linguistic analysis carried out by Kretzenbacher & Thurmair (1992) on a selected set of reviewers' comments prepared for Angewandte Chemie revealed that the reviews consisted of the following components:

• a recommendation for acceptance or rejection of the manuscript,
• correction of details,
• suggestions for elaborating, shortening, or recasting the manuscript,
• general remarks regarding the publication type "Communication",
• general remarks regarding the experimental basis of the manuscript,
• comments related to the character of the specific publication medium Angewandte Chemie, and (relatively seldom)
• personal observations relevant to the corresponding author.

Based on this analysis, utterly negative evaluations are quite rare. On the other hand, reviewers do seem to perceive a need for supplying positive comments if they recommend a manuscript for publication. According to Kretzenbacher & Thurmair (1992), metaphors characteristic of a uniformly positive review tend to be taken from the aesthetic realm. Typical epithets (here highlighted by italics) include: attractive manuscript, charming compound, appealing presentation.
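The unequal-variance t-test used above to compare the mean citation rates of revised and unrevised communications is commonly known as Welch's test. A minimal sketch of the test statistic and its approximate degrees of freedom follows. The function name and the citation counts are hypothetical, chosen only for illustration; the study's own data are not given in the text.

```python
import math
from statistics import mean, variance


def welch_t(sample_x, sample_y):
    """Welch's unequal-variance t statistic for two independent samples.

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    n_x, n_y = len(sample_x), len(sample_y)
    if n_x < 2 or n_y < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error of each sample mean (sample variance / n).
    se2_x = variance(sample_x) / n_x
    se2_y = variance(sample_y) / n_y
    if se2_x + se2_y == 0.0:
        raise ValueError("both samples are constant; t is undefined")

    t = (mean(sample_x) - mean(sample_y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (n_x - 1) + se2_y ** 2 / (n_y - 1)
    )
    return t, df


# HYPOTHETICAL citation counts for two small groups of communications.
revised = [1, 2, 3, 4, 5]
unrevised = [2, 4, 6, 8, 10]
t, df = welch_t(revised, unrevised)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The sketch stops at the statistic and degrees of freedom. Turning them into a p-value requires the cumulative distribution of Student's t, which a statistics package would normally supply.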


In the event that a communication contains nothing that is truly novel, reviewers generally recommend rejection. Assertions to this effect are generally supported in a very precise way with pertinent literature citations. Kretzenbacher & Thurmair (1992) encountered in their study only a very few of the irrational and malicious undertones that have been ascribed to reviewers' comments in the literature (cf. Spencer, Hartnett & Mahoney, 1986): "A professional tone generally prevailed even in those reviews recommending rejection, and it was apparent that efforts had been made to improve the final publications— thereby serving the interests of the entire scientific community—through often very detailed suggestions for changes in both content and style" (p. 144). We conclude with a synoptic presentation of four sets of (anonymous) reviewer comments and recommendations, together with the corresponding observations of the editor. These synopses are intended to provide the reader with more insight into the nature of the data upon which our investigation was based, leading to a clearer picture of the content and form of typical reactions of editors and reviewers. The four sets of data are derived from manuscripts differentiated as follows: • accepted communications that were later subject to frequent citation (cf. Synopsis 1); • accepted communications that did not lead to citations (cf. Synopsis 2); • rejected communications published elsewhere and then cited relatively frequently (cf. Synopsis 3); • rejected communications published elsewhere but not subject to citation (cf. Synopsis 4).

A good piece of work!

This sounds like a further extension of familiar techniques.

Good, as always.

It's all there. The hunt is on!

Simple (and short), perhaps useful.

2.

3.

4.

5.

Editor's comment

1.

No.

Since the synthesis follows an essentially familiar approach ... a publication in Angewandte Chemie ... is not justified....

(The comment sheet was not filled out)

? ?

Despite the attraction of the starting compound I cannot convince myself completely to recommend acceptance of the manuscript. I personally would publish these results as a full paper. In principle, none of the reactions described is novel....

(Accepted by telephone)

A very nice paper.

Referee's report #1

+

?

+

Rating1

D

A

O

B

A

Rating2

I recommend the publication of this paper as a communication in Angewandte Chemie. The procedure described here will certainly be welcomed by many research groups. The very lack of a detailed procedure has meant that the cited works of J. S. in this area are almost useless.

(Three specific suggestions for revision)

(Accepted by telephone)

This can be important for structure determination in certain compound classes and is, therefore, of general interest. However, parts of the manuscript in its present form are either ambiguous or obscure.

The communication submitted is very interesting from the standpoint of the results, and the arguments for the proposed structure are exhaustive and convincing.

Referee's report #2

A

A

O

C

A/B

Rating2

42

42

49

54

60

No. of citations3

Synopsis 1. Editor's and referees' comments together with recommendations on communications cited most frequently after their publication (communications ranked by number of citations)


Attractive compound, synthesized and characterized.

(No Editor's Comment)

That is too little! Will be combined with No. 10.

9.

10.

11.

1

Rat-

Acceptance of the manuscript in its present form is recommended, since it is of general interest.

+

(The paper was not submitted to referees.)

This paper is a borderline case for me.

The paper is an important contribution to a very timely class of phosphorus compounds. The paper is objective and technically beyond reproach, and I recommend its acceptance for publication.

+

9

(The comment sheet was not filled out).

The X-ray structure analysis presented here is extremely up-to-date and of general interest. After the following inconsistencies have been removed, publication of the results in Angewandte Chemie can be warmly recommended.

Referee's report #1

+

O

ing

"Do you recommend acceptance of the Communication?" (A) Yes, without alterations (B) Yes, after minor alterations (C) Yes, but only after major alterations (D) No (O) No rating

A reaction scheme, compact text with experimental data, and a normal length for a communication.

8.

Acceptable Questionable Not acceptable No rating

If the structure is correct, then it is good!

7.

= = = =

The author of the Communication should make reference to X, and X for his part should also cite this communication.

6.

Notes:1 + ? O

Editor's comment

No.

Synopsis 1: continued Rat-

O

B

B

A

C

3

ing^

Number of citations 1984/85-1989

(The comment sheet was not filled out)

The paper represents a good contribution to the chemistry of arsenic containing metal complexes. The authors have to recall and to quote the previous metal complexes containing the As3 unit (4 references).

An interesting paper which, because of the novel construction of a P., is of general importance, and whose publication I recommend without qualification.

For the silicon chemist the findings are very interesting, for the heterogeneous readership of Angewandte Chemie less so. Therefore, I recommend suggesting that the author publish the paper in a journal for experts....

Fig. 2 should be improved if possible.

Referee's report #2 Rat-

33

34

O

A

34

37

39

No. of citations3

A

D

B

ing2

Editor's comment

Better than No. X (rejected communication), but still no masterpiece.

This rearrangement type is described in (ref. 7); the paper is boring if anything.

Very nice!

Rather specialized!

Rather specialized!

No

1.

2.

3.

4.

5.

?

9

(The comment sheet was not filled out)

The manuscript is recommended for acceptance. (Three specific suggestions for alterations)

A nice paper describing the characterization of the yet unknown homothiopyrylium ion.

The authors describe an unusual rearrangement of the structural framework into carbanions, which is worth publication in Angewandte Chemie.

9

+

I consider the observation remarkable.... The resulting products should be of pharmacological interest. Recommend acceptance.

Referee's report #1

?

Rating1

B

B

B

B

A

Rating2

This short paper is nicely written. It is dedicated to a small, detailed problem that certainly only a few readers will find exciting.... In other words, there are certainly more important papers, but standards of quality are always relative.... The only doubts I have apply to the exaggerated tone in the title and the introduction, which creates the impression that the problem of

(Four specific comments and suggestions for revision)

The synthesis of ... is interesting, but in my opinion it does not justify prompt publication as a communication in Angewandte Chemie. The reaction products have been partially described already, and are of no further general importance. The authors should be advised to publish the results of this investigation, after its completion, as a full paper.

(One specific suggestion for revision)

Referee's report #2

Synopsis 2. Editor's and referees' comments together with recommendations on eight uncited communications published by Angewandte Chemie

B

C

D

B

Rating2

Acceptable Questionable Not acceptable No rating

+

+

9

Rating1

The paper is experimentally flawless and convincingly documented. On the other hand, novelty is lacking in the results, a characteristic of special interest for publication in Angewandte Chemie.

Certainly X is not a priority subject for Angewandte Chemie, and the submitted original paper can expect to attract only limited interest. It is, however, one of the few experimental chemical papers on the nature of X containing a new methodological approach. It should also be appreciated that this original contribution from (...) is dedicated to its director. I would still like to recommend publishing this paper as a communication.

Deserves attention for its report of the first successful isolation of (...). Acceptance of the communication, which I recommend, would in my opinion require the following changes ... (6 specific suggestions).

Referee's report #1

"Do you recommend acceptance of the Communication?" (A) Yes, without alterations (B) Yes, after minor alterations (C) Yes, but only after major alterations (D) No (O) No rating

(No editor's comment)

8.

= = = =

Interesting

7.

+ ? O

Useable, since interesting, but somewhat sloppy.

6.

Notes:

Editor's comment

No

Synopsis 2: continued

D

B

C

Rating2

This is a very nice work in an area of current interest and it should be published. Of relevance to this work is the report of ... (one reference).

This paper is certainly interesting, and one must surely make allowances for the fact that results with such a complex starting material can in fact only be interpreted phenomenologically. Despite this, the authors should interpret more carefully.... The authors should explain (experimentally) why the oxygen content increases so dramatically.

The paper is a nice contribution to the mechanism of P. formation. Acceptable after revision and adaptation to the Angewandte Chemie style.

the century has been solved here. This is not the case, and the paper should be "moderated". Interestingly, no relevant bibliography is offered.

Referee's report #2

A

C

B

Rating2

1

OO O

Nice paper in the tradition E.,

2.

o.k.

One page is enough for these meagre results!

Editor's comment

1.

No.

+

7

Rating1

In my opinion the results reported in this manuscript are not original enough to be published as a communication in Angewandte Chemie. The observation ... is already well established in the literature (6 references). In addition, the new X complexes ... do not represent a really new type of organometallic complex.

The Japanese authors cited in ref. 4 are the first to describe the ... hydrolysis of ... in Tet 32, 1893 (1976). But since these authors started with a ... mixture ... and obtained only very modest chemical and optical results, this carefully written paper should be published in Angewandte Chemie. The references for this paper should be checked.

Referee's report #1 Referee's report #2

The authors show that.... This is very nice, and definitely should be published, but not as a communication in Angewandte Chemie; the X trick is well known, high Y values have already been obtained with this reaction, and even the further use of the products in the Z synthesis has been published (6 literature references). The authors should be congratulated for their useful results, but should also be advised to publish this with full experimental details in Annalen or Chemische Berichte.

This paper will attract general interest (at least among the organometallic experts). The underlying difficulty, which should show the meaning of the results in their true light, is expressed much more clearly, in my opinion, in the accompanying letter than in the manuscript.

Rating2

B

D

B

D

Rating2

The lability of ... is already well documented, above all in papers by the author of this manuscript. The question is therefore raised whether or not the findings ... are sufficiently novel to justify publication in Angewandte Chemie.... While the paper at hand is convincing with respect to quality, I would like to express my reservations regarding the novelty of its contents. The jour-

This manuscript is not suitable for publication. Much of this work is obvious, and in any event a substantial fraction of it has appeared in print in the latest Communication by C. S. The general strategy is thoroughly understood, and several industrial laboratories have already examined these problems before C. S. A full paper on this work, with complete experimental description, might be worth the effort. It is not, however, communication material. I would also point out that a number of important details are omitted from the paper.

Special referee's report

O

O

Rating2

25

35

No. of citations3

Synopsis 3. Frequently cited communications not accepted for publication by Angewandte Chemie, but published elsewhere (communications ranked by numbers of citations)

!

Sloppy, but useful.

This is also not a great step forward!

4.

Editor's comment

3.

No

Synopsis 3: continued

?

9

Rating1

The manuscript ... constitutes an interesting contribution to the chemistry of the X reaction. To avoid looking like a purely intellectual exercise over a piece of unsurprising (cf. ref. 7) chemistry, the authors must clarify the following (five) points: ...

Should be handled within the context of a broader summary.

Referee's report #1

C

D

Rating2

This communication should be understood as the investigation of a mechanistic detail from an earlier publication by the authors.... Although some interesting deuteration experiments are presented, there can be no question of publication in Angewandte Chemie because the principal conclusions are not unusual.... This paper should be ... published in a journal that deals mainly with organometallic chemistry or catalytic processes.

Borderline case!

Referee's report #2

D

O

Rating2

... the existence is ... not unlikely. Because of this the novelty of the communication at hand and its interest for a wide circle of readers is limited.

This result is interesting, but neither unexpected nor new (cf. ref. 5).

nals J. Organomet Chem., Chem. Ber. (note), or HeIv. would not be unsuitable as publication vehicles.

Special referee's report

D

D

Rating2

19

24

No. of citations3


Editor's comment

(No editor's comment)

(No editor's comment)

No,

5.

6.

Synopsis 3: continued Referee's report #1

This is a nice piece of work in which interesting results are presented. However, previous work (one literature reference) has shown that ... As a result of (...), the observations of X et al. are not exceptional.... Although the spectroscopic results are of interest, rapid publication in Angewandte Chemie does not seem to be justified, and a full paper would probably be more appropriate. From the standpoint of their constitution the complexes are in no sense new, and therefore publication in Angewandte Chemie is not justified.... As with every paper by X, this one contains sound chemistry, but according to the Angewandte Chemie guidelines it is not suitable for this journal. Suggestion: note in Z. für Naturforsch.

Rating1

?

O

B

... it would be nice if 13C NMR spectra could be measured and the results presented.

C

B

Rating2

(Two notes about revising the communication)

Referee's report #2

D

Rating2

Special referee's report Rating2

16

16

No. of citations3

+ = Acceptable, ? = Questionable, − = Not acceptable, O = No rating

(No editor's comment)

7.

Notes:

Editor's comment

No

Synopsis 3: continued

?

Rating1

D

Rating2

B

Rating2

Special referee's report

Number of citations 1984/85-1989

The communication can, in principle, be recommended for acceptance. I suspect the circle of interested readers is rather large, since compounds of this type are in fashion just now. Normally such papers are preceded by more rousing introductions, however, in order to secure the hoped-for attention for the product. The author should try to correct this. Regarding the experimental details, I would like to say that the information given is worse than deficient for the supplemental compounds. Lastly, I would like to point out that the laboratories in X have done excellent work in recent years, the results of which usually appear in Chem. Comm. This should be taken into consideration, since we should take a healthy interest in keeping our German journals "in the marketplace".

Referee's report #2

"Do you recommend acceptance of the Communication?" (A) Yes, without alterations (B) Yes, after minor alterations (C) Yes, but only after major alterations (D) No (O) No rating

(The comment sheet was not filled out.)

Referee's report #1

Rating2

15

No. of citations3

A lot of speculation, nothing much substantially new!

Interesting reactions ... very badly written!

(No editor's comment)

2.

3.

Editor's comment

1.

No

?

?

?

Rating1

The paper ... is, in my opinion, of interest to people working in the area of photolysis of water. The manuscript requires a change. The last section on pg. 4 should be dropped. The information as presented simply does not make sense.

Undoubtedly the authors have made some interesting observations. However, the presented experimental material does not allow for more than some interesting speculation with regard to vital points of the addressed problem. There are no experimental data.... Therefore, I would recommend publication of this work in Angewandte Chemie only after secure confirmation of the proposed structure by, e.g., X-ray crystallography.

The paper at hand does offer an interesting approach to peptide synthesis, but this method is not of preparative significance since ... the competing processes for protective groups and peptide couplings are far superior to this method in terms of yield and expected freedom from racemization. There is, therefore, no reason for a communication.

Referee's report #1

B

D

O

Rating2

The title of this paper implies that X has been discovered as a new photocatalyst for generating hydrogen. This is not the case, however, a fact which only comes to light in the last paragraph of page 2.... There is no experimental proof that X is superior to Y, and that the hydrogen gen-

This is quite an interesting paper on an unusual reaction in organometallic chemistry. (One question to the authors about the data received.) ... The manuscript has been prepared with less than the necessary care ...

I would like to draw particular attention to the fact that what the authors call the "acyl displacement reaction" is referred to as unknown. This is not the case: (one literature reference).... The modest yields, to which the authors themselves admit, should not exactly raise the entire synthetic sequence to the level of a successful peptide synthesis.... It would make more sense if this paper were withheld until the authors are able to demonstrate peptide preparation ... with a clean experimental example.

Referee's report #2

Synopsis 4. Communications not accepted for publication by Angewandte Chemie but published elsewhere, and which had not been cited by the end of 1989

D

C

C

Rating2

Editor's comment

Too specialized. Follow-up paper (Chem. Commun.)

Nothing new.

Interesting and preparatively sound.

No

4.

5.

6.

Synopsis 4: continued

(Assessment unnecessary)

(Assessment unnecessary)

This manuscript is not suitable for publication in Angew. Chem. for several reasons: 1. This manuscript describes mostly the synthesis of X. However, the synthesis itself is quite mundane, particularly after similar types of compounds are being published in Inorg. Chim. Acta. 2. The characterization of the compounds was treated too briefly. 3. The open dimer conformation as well as the failure to achieve the 4 e⁻ reduction of oxygen defeats the goals of this work. 4. The lack of experimental detail on the electrochemical studies renders this aspect of the work impossible to evaluate. 5. The Y, which is part of the title of this manuscript, was completely ignored in the characterization and subsequent electrochemical study. If it is not discussed, why mention it?

?

Referee's report #1

_

Rating1

D

Rating2

This is a competently executed piece of work in a field of actual interest.... Except for the dimers .... which are described in the accompanying preprint of the authors, the ... dimers in this manuscript represent the only Y dimers reported to date. The manuscript is well organized and the documentation is good. It is suggested that the manuscript is accepted for publication after consideration of the following (five) comments: ....

eration is actually catalytic with respect to X.... (T)he author should be able to prove in particular—and in contrast to the referee's opinion—why the X system presented here should be particulary worth citing.

Referee's report #2

B

Rating2


Acceptable Questionable Not acceptable No rating

Useable chemistry

9.

= = = =

Too specialized

8.

Notes:1 + ? O

(No editor's comment)

Editor's comment

7.

No

Synopsis 4: continued

Communication is very poorly written. Paper should be presented within a larger framework. (Transcript of a statement over the telephone).

I have no basic misgivings about the paper. The form and content are flawless and very succinct, but in my opinion it does not fulfill the criteria for a communication in Angewandte Chemie since 1. The important subsidiary step ... has already been published in Angewandte Chemie (cf. ref. 3) 2. ... has already been reported on as well (..., cf. ref. 3) 3. X reactions as such are not new 4. Y (ref. 3) and Z (submitted manuscript) are simply too similar.

?

?

"Do you recommend acceptance of the Communication?" (A) Yes, without alterations (B) Yes, after minor alterations (C) Yes, but only after major alterations (D) No (O) No rating

The authors describe the interesting reaction.... In this way the investigations proceed far beyond the hitherto known reactions dealing with Diels-Alder reactivity of phosphaalkenes. Therefore, I recommend acceptance of the paper for publication in Angewandte Chemie.

Referee's report #1

?

Rating1

D

O

B

Rating2

The paper describes a piece of sound chemistry.... On the other hand I do not wish to hold back my reservations: - The transformation ... is, in principle, known (one literature reference). - The fact that the ... sulfur can be ... methylated appears trivial, and therefore not very characteristic.

As is, few can make much sense out of the tables ... The paper is not written in an especially suitable manner for Angew. Chem. Dr. X, whose work in the area has not been cited, should be given an opportunity to review the paper.

Acceptance of the paper as a communication in Angewandte Chemie is justified. NB: The first ... with phosphaalkenes were published as early as 1982 (one literature reference)....

Referee's report #2

A

C

B

Rating2

Notes

1. This study was originally prepared with support from the Deutsche Forschungsgemeinschaft under the title "Peer-Review als Selbststeuerungsinstrument der Wissenschaft, untersucht am Beispiel von Manuskriptbegutachtungen für die Zeitschrift Angewandte Chemie" ("Peer Review as an Instrument for the Self-Regulation of Science, Examined Using the Journal Angewandte Chemie as an Example") in the context of the priority program "Science of Science". I wish to express my appreciation to Ruth Schlaf, Claudia Stumpp, Andreas Hohenstein, Frank Münzinger, and Sebastian Zieger for their assistance in the project.

2. For a comprehensive review of the most important models for the development of science, together with factors controlling that development, see Daniel & Fisch (1987, in press).

3. The meaning of the word "peer" is characterized by the Encyclopedia Americana (Price, 1975, p. 469) as follows: "The general sense of the word peer is that of equal and in England the claim of a man to 'trial by his peers' established by Magna Carta was, in effect, the acknowledgment of his right to be judged by his social equals".

4. In addition to shortcomings with respect to reliability, fairness, and validity, the peer-review process is also charged with not being able to recognize scientific fraud. For overviews of the subject see Broad & Wade (1983) and Chalk (1988).

5. According to Hargens (1990, pp. 1348 f.), journals in physics are exceptions to this rule. In place of the usual "two initial referees" system, these journals in most cases employ a "single initial referee" system.

6. In many cases a product-moment correlation is computed instead of an intraclass correlation. When presented with a product-moment correlation coefficient r = 0.54, Hendrick (1976), editor of the journal Personality and Social Psychology Bulletin, initially indicated that he was "quite pleased at the relatively high level of reviewer agreement on PSPB papers" (p. 207). In a later editorial, however, Hendrick (1977) declared: "It is with embarrassment and disappointment that I must revise those reported reliabilities, downward. I am indebted to Lawrence J. Stricker for questioning the appropriateness of the product moment correlation that was used to compute the reliabilities. Without belaboring the details the appropriate index was the intraclass correlation. The value of the intraclass coefficient was +.21 .... Clearly the level of consensus between reviewers on acceptability of manuscripts is low" (p. 1). However, in the event that ratings by first and second reviewers have identical means and variances, then the two coefficients are also identical (cf. Ebel, 1951, p. 422).

7. Cf. the critical commentaries on this study in the journal Behavioral and Brain Sciences (1982, pp. 196-246, and 1985, pp. 743-747).

8. Unfortunately, this part of the study is based only on a very small sample: 35 manuscripts and 75 reviewer judgments.
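The contrast drawn in note 6 between the product-moment and the intraclass correlation can be made concrete with a small sketch. The ratings below are hypothetical, and the one-way analysis-of-variance form of the intraclass coefficient used here is an assumption; the cited editorials do not state which variant was computed. The example only shows that a constant difference in severity between two reviewers leaves the product-moment correlation at 1 while lowering the intraclass coefficient.

```python
from statistics import mean


def pearson_r(first, second):
    """Product-moment correlation between two reviewers' ratings."""
    m1, m2 = mean(first), mean(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cov / (ss1 * ss2) ** 0.5


def icc_oneway(first, second):
    """One-way intraclass correlation for two ratings per manuscript."""
    n = len(first)
    grand = mean(list(first) + list(second))
    pair_means = [(a + b) / 2 for a, b in zip(first, second)]
    # Between-manuscript mean square (k = 2 ratings per manuscript).
    ms_between = 2 * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    # Within-manuscript mean square; with k = 2 it has n degrees of freedom.
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(first, second, pair_means)) / n
    return (ms_between - ms_within) / (ms_between + ms_within)


# Hypothetical ratings: reviewer 2 is always two scale points stricter.
reviewer_1 = [1, 2, 3, 4, 5]
reviewer_2 = [3, 4, 5, 6, 7]
print(round(pearson_r(reviewer_1, reviewer_2), 3))   # 1.0
print(round(icc_oneway(reviewer_1, reviewer_2), 3))  # 0.429
```

The product-moment coefficient ignores the systematic two-point shift, whereas the intraclass coefficient counts it as disagreement.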


9. In 1955 the Journal of Clinical Investigation initially rejected a paper from S. A. Berson, R. S. Yalow, A. Bauman, M. A. Rothschild, and K. Newerly with the title "Insulin-I131 Metabolism in Human Subjects: Demonstration of Insulin Transporting Antibody in the Circulation of Insulin Treated Subjects". This paper, which describes a new method for quantitative hormone analysis with radioactively labeled hormones on the basis of antigen-antibody reactions, was also rejected by Science. In 1977 Ms. Yalow was awarded the Nobel Prize in Physiology or Medicine for development of the radioimmunoassay method. Ms. Yalow published the letter of rejection from the Journal of Clinical Investigation in the written version of her Nobel Prize acceptance lecture (cf. Yalow, 1978, p. 1238).

10. The "ISI Journal Impact Factor" is defined as follows: "A measure of the frequency with which the 'average cited article' in a journal has been cited in a particular year. The impact factor is basically a ratio between citations and citable items published. Thus, the 1979 impact factor of journal X would be calculated by dividing the number of all the SCI source journals' 1979 citations of articles journal X published in 1977 and 1978 by the total number of source items it published in 1977 and 1978" (cf. Garfield, 1976 ff.; emphasis in the original). The ISI Journal Impact Factor for Angewandte Chemie for the year 1986 is computed as follows:

$$\mathrm{IF}_{1986} = \frac{\text{Citation frequency during 1986 for publications from 1984 and 1985}}{\text{Number of papers published in the years 1984 and 1985}} = \frac{\ldots}{677} = 5.335$$
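The ratio in the quoted definition can be written as a short function. This is only a sketch: the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def impact_factor(citations_in_year, items_in_two_prior_years):
    """Citations received in one year by items from the two preceding years,
    divided by the number of citable items published in those two years."""
    if items_in_two_prior_years <= 0:
        raise ValueError("number of citable items must be positive")
    return citations_in_year / items_in_two_prior_years


# Hypothetical journal: 400 citable items over two years, cited 1200 times
# in the following year.
print(impact_factor(1200, 400))  # 3.0
```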

Apart from the ISI Journal Impact Factor there also exist other impact factors for professional journals (e.g., Schubert, Glänzel & Braun, 1989, pp. 8-10). Weisheit & Regoli (1984) and Todorov & Glänzel (1988) present summaries of the most important techniques for ranking journals. Bauin & Rothman (1991) discuss the journal impact factor as a measure for approximating the citation frequencies of individual publications.

11. I wish to thank Dr. Peter Gölitz, editor-in-chief of Angewandte Chemie, and the Gesellschaft Deutscher Chemiker for granting permission to examine the reviewing procedures of this journal. Thanks are also due to the editorial associates for their generous support during the course of the study.

12. In addition to manuscript review, which is the primary "quality filter" for publications, correspondence with the editor acts as a secondary quality filter. For example, during the time period 1947-1979 the journal American Sociological Review received correspondence related to 12% of the published contributions, virtually all of which contained critical comments (cf. Snizek, Dudley & Hughes, 1982).

13. Since the earliest days of professional journals in the 17th century, this type of publication has served researchers above all as a vehicle for establishing priority (cf. Kronick, 1976, p. 56; Roediger, 1987, p. 225). The editor of the Journal of Chemical Physics (JCP) characterizes such "communications" as follows: "Communications in the JCP are reports of preliminary results of 'current and extreme interest to relatively large numbers of workers in the field.' It is expected that a fuller description of the work described in a communication will later be published as a regular article. The only justification for such preliminary publication is that the rapid dissemination of the new results would be of great importance to workers in the field and that the delay in waiting for a complete regular article would substantially impede progress" (Stout, 1986, p. 9). According to Kean & Ronayne (1972), however, over 70% of the short communications appearing in Chemical Communications and Tetrahedron Letters are never followed by full papers presenting all the experimental details of the work. For this reason, journals would be well advised to ensure that even short communications contain adequate experimental information.

14. In a survey conducted by Juhasz, Calvert, Jackson, Kronick & Shipman (1975, p. 179), the editors of journals in 43 disciplines reported that they rejected without external review an average of 10% of all manuscripts submitted. The editors of the journal American Sociological Review reject 5% of the manuscripts received for publication without external review (cf. Bakanic, McPhail & Simon, 1987, p. 634, footnote 10). According to Gordon (1978), the corresponding proportion for both Biometrika (p. 26) and the British Medical Journal (p. 20) is 10%, while that for Nature (p. 34) is 25%.

15. The comments were analyzed linguistically as representatives of the text type "academic language" by Prof. Harald Weinrich, Dr. Maria Thurmair, and Dr. Heinz L. Kretzenbacher of the Institut für Deutsch als Fremdsprache (Institute for German as a Foreign Language) of the University of Munich. This linguistic analysis, conducted within the context of the Working Group for Academic Language of the Berlin Academy of Sciences, concentrated on the following aspects of the reviewers' comments: textual construction and specificity with respect to the subject, stylistic considerations related to peer review, terminology used in expressing judgments, and objective and metalinguistic elements (cf. Thurmair & Kretzenbacher, 1991, as well as Kretzenbacher & Thurmair, 1992).

16. The questionnaire used initially was replaced by a new one during the course of 1983.

17. The 43 third and fourth reviews have been ignored in the sections that follow because not all of them represent special evaluations. It was sometimes the practice during holiday periods to contact three reviewers simultaneously, all of whom, contrary to expectations, may have submitted responses.

18. Just as in the case of Angewandte Chemie, roughly half (49%) of the 1085 reviewers for the journal Analytical Chemistry received only a single manuscript for review in 1984 (cf. Petruzzi, 1985, p. 868 A).

19. The editors declined to seek reviews for ten communications, and only one review was available for eleven communications.

20. Due to an oversight during the design of the questionnaire, reviewers were not requested to explain on the accompanying sheet their reasons for recommending rejection of a communication submitted to Angewandte Chemie (cf. p. 12, question 5). Although not specifically requested to do so, most of the reviewers did nevertheless justify their negative votes.

21. Cohen (1968) has pointed out that the weighted kappa coefficient and the product-moment correlation are identical if the k × k table of joint nominal scale assignments displays identical marginal distributions and the weights (e.g., on a four-level response scale) are selected according to the following pattern: Cells on the principal diagonal (concordant judgments) are assigned the weight 0, cells in the adjacent diagonals the weight 1, cells in the next two diagonals 4, and the cells in the upper-right and lower-left corners 9.

22. For the distinction between reliability of a single "average" reviewer and the reliability of a group of reviewers, cf. Asendorpf & Wallbott (1979, p. 244).

23. The mean square error (MSr) in this model of variance analysis incorporates both MSreviewer and MSresidual.
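The equivalence described in note 21 can be checked numerically. The sketch below uses a hypothetical 3 × 3 table of joint ratings with identical marginal distributions and the squared-distance weights 0, 1, 4 that follow the pattern given in the note; the function names are illustrative only.

```python
def weighted_kappa(table):
    """Weighted kappa with disagreement weights (i - j) ** 2."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum((i - j) ** 2 * table[i][j]
                   for i in range(k) for j in range(k))
    expected = sum((i - j) ** 2 * row_tot[i] * col_tot[j] / n
                   for i in range(k) for j in range(k))
    return 1 - observed / expected


def pearson_from_table(table):
    """Product-moment correlation of the category scores 0 .. k-1."""
    k = len(table)
    n = sum(sum(row) for row in table)
    cells = [(i, j, table[i][j]) for i in range(k) for j in range(k)]
    mean_i = sum(i * c for i, _, c in cells) / n
    mean_j = sum(j * c for _, j, c in cells) / n
    cov = sum((i - mean_i) * (j - mean_j) * c for i, j, c in cells)
    ss_i = sum((i - mean_i) ** 2 * c for i, _, c in cells)
    ss_j = sum((j - mean_j) ** 2 * c for _, j, c in cells)
    return cov / (ss_i * ss_j) ** 0.5


# Hypothetical joint ratings; row and column totals are both (5, 5, 5).
ratings = [[4, 1, 0],
           [1, 3, 1],
           [0, 1, 4]]
print(round(weighted_kappa(ratings), 3))      # 0.8
print(round(pearson_from_table(ratings), 3))  # 0.8
```

With unequal marginal distributions the two values generally differ, which is why the condition on the marginals matters.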

24. Kappa coefficients were computed with the program Kappa from DynaStat (Tolman, Farrier & Farrier, 1988). All other calculations—unless otherwise noted—were carried out with the statistical program package KOSTAS (Nagl, Walter & Staud, 1986). 25. Z-values modified according to Fleiss, Cohen & Everitt (1969). 26. Unfortunately, Walling (n.d.) does not report coefficients of reviewer agreement for the Journal of the American Chemical Society. Walling (n.d.) analyzed reviewer agreement on the basis of a random sample of 121 communications and 85 papers submitted to the journal for publication during 1977. Both reviewers recommended acceptance of 43 of the communications (36%). Rejection was recommended by both reviewers for 31 communications (26%), and 47 communications (39%) received mixed reviews. In the case of the papers, 45 (53%) were unanimously recommended for acceptance, 12 (14%) were unanimously recommended for rejection, and 28 (33%) received mixed votes. 27. According to Cicchetti (1976, p. 455), the minimal sample size (N) for calculating reviewer agreement can be established from the equation N>2k2, where k is the number of response categories. In other words, with 2 response categories it is necessary to consider at least 8 reviewer pairs, and with 4 response categories the number climbs to 32. 28. For the remaining items on the questionnaire (l^l·) the agreement between the reviewers ranges from 64% to 82% (cf. Table 7, p. 24). 29. Cf. also Whitehurst (1984): "The method of determining and correcting for chance agreement stands as the crux of the issue" (p. 22). "A highly skewed distribution of ratings will yield high estimates of chance agreement in just those cells in which obtained agreement is also high. This makes for low estimates of true agreement in the face of high levels of obtained agreement" (p. 25). 30. Cf. 
in this context Hargens (1991): "Referees' evaluations are often not parallel measures of a latent unidimensional trait, and the low observed associations do not necessarily imply that peerreview evaluations are unreliable" (p. 150). 3 1 . Evaluation criteria utilized by reviewers in establishing their recommendations for professional journals have been examined in the following studies: Armstrong (1982), Coe & Weinstock (1967), Daft (1985), Maher (1978), Patterson, Bailey, Martinez & Angel (1987), Shadish (1989), Smigel & Ross (1970), Spencer, Hartnett & Mahoney (1986), Strauss (1969), and Wolff (1970). Criteria for judging research proposals have been extracted from reviews by Cole, Rubin & Cole (1978), Hartmann (1988, 1990), Neidhardt (1988), and McTavish, Cleary, Brent, Perman & Knudsen (1977). Criteria for evaluating interdisciplinary research projects are provided by Blaschke (1986). A reviewer's questionnaire for evaluating research projects in the social sciences after their completion has been developed and tested by Hartenstein, Boos & Bertl (1988). Various dimensions of the criticism of book reviews are discussed in Champion & Morris (1973), Hartmann & Dubbers (1984), Hartmann (1991), and Weymann (1991). Lewis (1975) has conducted a thematic analysis of letters of recommendation for applicants seeking tenured university positions. All these studies concentrated primarily on identifying criteria for academic quality and academic qualification. The extent to which reviewers and others preparing recommendations maintain consistent standards in evaluating one and the same object has not been studied. 32. Fiske & Fogg (1990, p. 596) report an intraclass correlation coefficient of 0.20. 33. In contrast to Cole (1983), Hargens (1988a, b) supports the premise that cognitive consensus is more prevalent in the natural sciences than in the social sciences, because the rejection rates for

Notes

93

journals in the natural sciences are significantly lower than in the social sciences (cf. also Cole, Simon & Cole, 1988).
34. Unfortunately, the present data cannot be used to test Binet's observation (1912) that "in general, specialists tend to be rather strict within their own field. An anatomist or a surgeon sets higher standards in anatomy than a chemist or a clinician. A professor of Roman law is less forgiving than a political scientist" (p. 23). Smigel & Ross (1970) were not able to confirm the premise within sociology that specialists are stricter than non-specialists: "There is no significant difference in the ratings made by referees in the role of expert and in the role of nonexpert" (p. 20).
35. Smigel & Ross (1970), in their study of reviewers for the journal Social Problems, report that "No support was found for the hypothesis that editorial consultants are differentially lenient and severe in their over-all judgments" (p. 21).
36. The statistical analyses and figures were prepared with the program JMP from SAS Institute, Inc. (cf. Lehman, Sall & Cole, 1989).
37. Several of the biases discussed in the literature could not be investigated more closely in the context of this study, for various reasons. For example, the role of the prestige level of the institution with which the author is associated could not be analyzed because no ranking yet exists for departments of chemistry in the Federal Republic of Germany (cf. Daniel & Fisch, 1990). The number of female corresponding authors is much too small for a statistical analysis (N = 3), results of replication studies are excluded by definition from publication as short communications (cf. also note 13), and the statistical significance of a set of findings carries considerably less weight in chemical research than in research in medicine or the social sciences.
38. Authors often appear to withdraw their manuscripts as a way of avoiding rejection by the editors. For this reason, withdrawn manuscripts are treated as rejected manuscripts in the analysis that follows.
39. Merton (1968) bases his terminology on the passage "For whosoever hath, to him shall be given, and he shall have more abundance; but whosoever hath not, from him shall be taken away even that he hath" taken from the Gospel According to Matthew (New Testament, Matthew 13:12).
40. The use of t-tests for the multiple pairwise comparison of a set of means is somewhat problematic, because multiple comparisons increase the probability that a significant result will occur by chance. We have decided not to use the extremely conservative test proposed by Scheffé (1953) because, although it maintains a small probability of making a type one error (false rejection of the null hypothesis), the probability of a type two error (false retention of the null hypothesis) is increased (cf. Eimer, 1978, pp. 76-81). A Bonferroni alpha adjustment was employed instead. In order to keep below .05 the probability that one or more type one errors would occur in the following pairwise comparisons of means, alphas for the individual decisions were set at $\alpha = 1 - 0.95^{1/k}$, where k is the number of pairwise comparisons (cf. Stelzl, 1982, p. 117). Critical t-values for each of k < 10 pairwise comparisons at the 5% alpha level for a one-tailed test according to Bonferroni were taken from Sachs (1990, p. 138).
41. Since the χ² test was applied more than once to the same set of data, critical values were taken from the Bonferroni χ² table (cf. Sachs, 1990, pp. 149-151, as well as Table 141 in Sachs, 1978, p. 369).
42. Peters & Ceci (1982) were accused of having violated the ethical principles for research with human subjects, in that the reviewers and editors of journals were deceived, abused, and unwitting participants in a (quasi) experiment (Fleiss, 1982). With respect to the ethical problems associated with bias research in the context of manuscript review, see also Coughlin (1988) and Mahoney (1990).
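The alpha adjustment described in note 40 can be illustrated with a short calculation. The sketch below is not taken from the study: the function names are hypothetical, and it assumes that the adjustment has the form 1 - (1 - .05)^(1/k) for k independent comparisons. The simpler division-by-k form of the Bonferroni adjustment is shown alongside for comparison.

```python
def per_comparison_alpha(k: int, familywise: float = 0.05) -> float:
    """Alpha for each of k comparisons so that the chance of one or more
    type one errors stays at `familywise` (assumes independent comparisons)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - familywise) ** (1.0 / k)


def bonferroni_alpha(k: int, familywise: float = 0.05) -> float:
    """Simpler and slightly stricter form: divide the overall alpha by k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return familywise / k


if __name__ == "__main__":
    # Note 40 refers to k < 10 pairwise comparisons.
    for k in range(1, 10):
        print(k, round(per_comparison_alpha(k), 5), round(bonferroni_alpha(k), 5))
```

For small k the two forms differ only in the fourth decimal place, so the choice between them rarely changes a decision.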

43. Assignments of the communications to subject groups were based on the printed version of Chemical Abstracts because, even though the online version includes the search field "Category Codes" (numbers of the abstract sections), it is not possible to limit a search to the single publication type "short communication" (as could be done in the ISI data bases). The search field "Document Type" provides distinctions only between the following types of documents: journals, patents, technical reports, dissertations, books, and conference proceedings. Contributions to professional journals cannot be further differentiated into articles, reviews, or short communications (cf. Munzinger & Daniel, 1992).
44. Data in each case are based on the corresponding principal entry.
45. In order to ensure consistency in assignments to the subject areas in Chemical Abstracts, the analysis that follows is limited to the 411 communications from the year 1984 that were ultimately published in some form. The 38 communications that could not be located in either Chemical Abstracts or Science Citation Index were ignored.
46. An evaluation from the year 1990 revealed that 80% of all published communications again fell in these four subject areas. Whereas the proportion in Section 29 (Organometallic and Organometalloidal Compounds) dropped from 40% in 1984 to 32% in 1990, the proportions in Sections 22 (Physical Organic Chemistry) and 78 (Inorganic Chemicals and Reactions) rose from their 10% levels in 1984 to 12% and 16%, respectively, in 1990. The proportion of typical organic contributions (Sections 23-28) remained constant at exactly 20%.
47. Cf. Sachs (1978, pp. 251 f.).
48. Section 29 (Organometallic and Organometalloidal Compounds) is circumscribed in Chemical Abstracts (1991, 114, p. 792) in the following way: "This section includes the synthesis, stabilization, purification, physical organic studies, reactions, and determination of molecular structure of compounds that contain one or more carbon-metal or carbon-metalloid covalent (sigma and pi) bonds. Metal heterocyclics containing no carbon-metal bond in the ring, e.g., borazine, phosphazene, and cyclotrisiloxane, or homocycles, e.g., cyclopentagermane, are also included here. Simple metal carbonyls, cyanides, carbides, and cyanates are included in Sections 49 or 78".
49. Cited from "Zuviel Organik - zuviel Anorganik" ("Too Much Organic - Too Much Inorganic", Anonymous, 1991). Cf. also note 46.
50. The data were kindly supplied by Judith Watson of the Chemical Abstracts Service (Columbus, Ohio).
51. Chemotherapy was founded as a discipline by Paul Ehrlich, who in 1909 introduced the use of the organometallic compound Salvarsan to combat Treponema pallidum, the causative agent of syphilis (cf. Elschenbroich & Salzer, 1990, p. 197). Ehrlich was a joint recipient with I. Metschnikow of the Nobel Prize for Physiology or Medicine in 1908.
52. The definition of the ISI Journal Impact Factor is provided in note 10.
53. Wilson (1978, p. 1700) reports that 85% of the manuscripts rejected in 1970 by the Journal of Clinical Investigation appeared later in other professional journals. This was also the case for manuscripts rejected in 1975 by the New England Journal of Medicine. Of the manuscripts published in other journals, 80% are not revised by their authors (Relman, 1980, p. 58). According to Lock (1985, p. 59), 68% of the manuscripts rejected by the British Medical Journal in 1979 appeared subsequently in other journals. By contrast, only 30% of the astronomical manuscripts rejected in 1984 by one of the three journals (Publications of the Astronomical Society of the Pacific, Astronomical Journal, Astrophysical Journal) investigated by Abt (1988) were published elsewhere. Abt (1988) attributes this to the fact that "the field is small enough that authors have few alternatives of other journals and feel that they must opt for revision rather than 'journal hopping'" (p. 508). As a consequence, the rejection rate for these journals is a relatively low 9-12%.
54. The communications submitted for publication to Angewandte Chemie in 1984 actually appeared there or elsewhere between 1984 and 1985. Table 20 gives 1986 ISI Journal Impact Factors for the appropriate journals, because these indicate average citation frequencies in 1986 for publications from the years 1984 and 1985.
55. Teevan (1980) solicited from 35 reviewers appraisals of the relative prestige of six different sociology journals together with estimates of the quality of five (anonymous) articles from each of the journals. The results showed that "highly regarded articles appear in less highly regarded journals ..., while less highly regarded articles appear in highly ranked journals" (p. 112).
56. Citation frequency has been a controversial measure of both quality and scientific progress (cf. Collins, 1991, pp. 19-28; Porter, Chubin & Jin, 1988; MacRoberts & MacRoberts, 1988). Nevertheless, Lawani & Bayer (1983) succeeded in demonstrating for cancer research that publications regarded shortly after their appearance as highly significant by experts in the appropriate research fields were cited much more frequently in subsequent years than publications that were less highly regarded. The Chemistry Division of the United States National Science Foundation carried out a citation analysis in 1973 with the goal "to explore the use of this relatively new tool for what it might tell about the discipline, and its practitioners". The results of the study "generally support the idea that citations are meaningful" (Dewitt, Nicholson & Wilson, 1980, p. 265).
57. I wish to thank Dr. Wolfgang Glänzel (ISSRU, Budapest) and Dipl. Biol. Frank Munzinger (Universität Konstanz) for their support in the citation analysis.
58. Ghosh & Neufeld (1974) report that of the 83 communications appearing during January and February, 1965, in the Journal of the American Chemical Society (JACS), only a single paper (= 1.2%) failed to receive any citations in the subsequent six years. Of the 252 "letters" published in January, 1965 in Nature, 8.3% received no citations in the subsequent eight years (Ghosh, 1975). According to a study by the Institute for Scientific Information (ISI) in Philadelphia, conducted on behalf of the journal Science, 55% of all articles published between 1981 and 1985 were never cited during the five years after their appearance (cf. Hamilton, 1990). In chemistry, 38.8% of all publications from 1984 were not cited in the subsequent four years (cf. Hamilton, 1991). Within chemistry, applied chemistry is the subdiscipline with by far the highest proportion of uncited publications (78%), followed by biochemistry and molecular biology (19.4%), organic chemistry (18.6%), and inorganic and nuclear chemistry (17.0%).
59. The chi-square "goodness of fit" test statistic was computed with the program IRWIN (cf. Glänzel, n.d.).
60. The time period within which a communication might be cited after its publication in Angewandte Chemie varied between 49 and 67 months (average: 61 months). In the case of rejected communications subsequently published elsewhere the corresponding time period varied between 37 and 69 months (average: 55 months).
61. Rejected manuscripts that appeared elsewhere as short communications were cited on average with the same frequency as rejected communications published elsewhere as full papers. The differences in average citation rates (logarithmically transformed raw scores) are not statistically significant (F1^5 = 0.11, n.s.) if the number of months following publication is held constant for the two types of publication (short communications, average 56 months; full papers, average 49 months).
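Notes 61 and 64 refer to logarithmically transformed citation counts. The sketch below shows why such a transformation is commonly applied to skewed counts. The citation counts are invented for illustration and are not data from the study, and the ln(1 + c) form is an assumption: the notes do not say how uncited papers (c = 0) were handled.

```python
import math
from statistics import mean, median


def log_transform(counts):
    """Return ln(1 + c) for each citation count c.

    The +1 offset is an assumption that keeps uncited papers (c = 0) defined.
    """
    if any(c < 0 for c in counts):
        raise ValueError("citation counts cannot be negative")
    return [math.log1p(c) for c in counts]


# Hypothetical, highly skewed citation counts (not data from the study).
counts = [0, 1, 2, 3, 5, 8, 120]
logged = log_transform(counts)

if __name__ == "__main__":
    # On the raw scale one heavily cited paper pulls the mean far above the
    # median; on the log scale the two lie much closer together.
    print(round(mean(counts), 2), median(counts))
    print(round(mean(logged), 2), round(median(logged), 2))
```

Means computed on the transformed scale are less dominated by a single heavily cited paper, which is the usual motivation for comparing groups on that scale.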

62. Summary scale, established on the basis of three items each with a seven-point response scale: "Evaluation relative to other works same time/topic", "Evaluation relative to other works any time/same topic", "Overall quality".
63. Summary scale, established on the basis of two items each with a seven-point response scale: "Impact on subject matter", "Impact on psychological knowledge".
64. Since distributions of citations are highly skewed, citation counts were transformed logarithmically. Nevertheless, transformation of the raw data had only a minor influence on the magnitudes of the coefficients: "Interestingly, the transformation had little effect; the largest difference between coefficients based on raw scores and those based on the transformed data was .06" (Gottfredson, 1978, p. 931, footnote 8). Moreover, it appeared that "controlling for self-citation is not necessary" (p. 932).
65. Sabine (1985) reports that the average error rate for ten leading journals in biology and chemistry was 2.3%. According to this study, "corrections" appeared for 2.6% of all the papers published in the Journal of the American Chemical Society in 1983.
66. See also Bowen, Perloff & Jacoby (1972), Patterson (1969), Prathap (1989), and Wright (1970).
67. Koshland (1987, 1989), the editor of Science, points out that cases of plagiarism in the sciences are very rare. Koshland told the New York Times (12 July 1989, p. 7) that of the roughly three million papers published worldwide in the 1980s, only ten to twelve represented significant cases of fraud, although these were subject to an extraordinarily high level of publicity (cited according to DeBakey, 1990, p. 348).
68. The editor of the journal examined the reviewers' comments from the perspective of the following questions: "Did the reviewer give appropriate attention to the importance of the question?", "Did the reviewer target key issues?", "Did the reviewer clearly identify strengths and weaknesses in the study's methods?", "Did the reviewer make constructive comments about the quality of writing and presentation of the data?". Authors were requested to judge the extent to which a review was "thorough", "constructive", "fair", "courteous", and "knowledgeable". Moreover, both the editor and the authors rated the overall quality of the reviews ("Summary grade").
69. Horrobin (1990, pp. 1440 f.) provides additional examples of groundbreaking papers that were rejected for publication in leading journals on the basis of reviewer recommendations. Nature greatly reduced its dependence on external review of manuscripts during the period 1940-1960 as a result of the fact that reviews had led to embarrassing errors in judgment by the previous editor, R. A. Gregory: "Gale and Brimble used the refereeing system only sparingly. Possibly they had been influenced by the failure of the system during Gregory's reign on three notable occasions. Hans Krebs's work on the citric acid cycle, Urey's work on heavy hydrogen, and Fermi's research on beta-decay, had each been 'rejected' by Nature on the advice of an authority" (Fifield, 1969, p. 232). More recent examples include a paper by H. Michel on a crystallographic investigation of the three-dimensional structure of the photosynthetic reaction center, which was rejected by Nature but appeared in 1982 in the Journal of Molecular Biology. This paper constituted the basis for the first structure determination of a membrane protein, for which H. Michel together with J. Deisenhofer and R. Huber received the Nobel Prize for Chemistry in 1988 (cf. Tainer, 1991). H. Rohrer and G. Binnig, who received the 1986 Nobel Prize for Physics, also report that their first groundbreaking piece of work in scanning tunneling microscopy was initially rejected by a journal: "In 1981 the IBM team obtained its first real success ... But the first attempt to publish failed when a journal referee found the paper 'not interesting enough'" (cited according to Fisher, 1989, p. 106).
70. The reworked manuscripts were cited an average of 11.35 times, those published without change 11.86 times.

References

Abelson, P. H. (1980). Scientific Communication. Science, 209 (4452), 60-62.
Abt, H. A. (1988). What Happens to Rejected Astronomical Papers? Publications of the Astronomical Society of the Pacific, 100, 506-508.
Adair, R. K. (1982). A Physics Editor Comments on Peters and Ceci's Peer-Review Study. Behavioral and Brain Sciences, 5 (2), 196.
Adair, R. K. & Trigg, G. L. (1979). Should the Character of "Physical Review Letters" be Changed? (Editorial). Physical Review Letters, 43 (27), 1969-1974.
Altura, B. T. (1990). Is Anonymous Peer Review the Best Way to Review and Accept Manuscripts? Magnesium and Trace Elements, 9 (3), 117-118.
American Chemical Society (1985). Ethical Guidelines to Publication of Chemical Research. Columbus, Ohio.
Anderson, R. E. (1990). Guidelines for Review of a Manuscript. Human Pathology, 21 (4), 359-360.
Andrews, F. M. (1961). Logarithmic Transformation of Output of Scientific Products. Analysis Memo No. 11. Study of Scientific Personnel. Ann Arbor, MI: Survey Research Center, Institute for Social Research, The University of Michigan.
Anonymous (1983). Guidelines to Reviewers. British Journal of Surgery, 70, 236.
Anonymous (1989). Report of the Editor. American Economic Review, 79 (2), 405-408.
Anonymous (1991). Zuviel Organik - zuviel Anorganik. Angewandte Chemie, 103 (4), A-118.
Armstrong, J. S. (1982). Research on Scientific Journals: Implications for Editors and Authors. Journal of Forecasting, 1, 83-104.
Asendorpf, J. & Wallbott, H. G. (1979). Maße der Beobachterübereinstimmung: Ein systematischer Vergleich. Zeitschrift für Sozialpsychologie, 10, 243-252.
Atkinson, R. C. & Blanpied, W. A. (1985). Peer Review and the Public Interest. Issues in Science and Technology, 1 (4), 101-114.
Backes-Gellner, U. & Sadowski, D. (1988). Validität und Verhaltenswirksamkeit aggregierter Maße für Forschungsleistungen. In H.-D. Daniel & R. Fisch (Hrsg.), Evaluation von Forschung: Methoden - Ergebnisse - Stellungnahmen (S. 259-290). Konstanz: Universitätsverlag Konstanz.
Bailar, J. C. (1991). Reliability, Fairness, Objectivity and Other Inappropriate Goals in Peer Review. Behavioral and Brain Sciences, 14 (1), 137-138.
Bailar, J. C. III & Patterson, K. (1985). Journal Peer Review - The Need for a Research Agenda. New England Journal of Medicine, 312 (10), 654-657.
Bakanic, V., McPhail, C. & Simon, R. J. (1987). The Manuscript Review and Decision-Making Process. American Sociological Review, 52, 631-642.
Bakanic, V., McPhail, C. & Simon, R. J. (1989). Mixed Messages: Referees' Comments on the Manuscripts They Review. Sociological Quarterly, 30 (4), 639-654.
Baue, A. E. (1985). Peer and/or Peerless Review. Archives of Surgery, 120 (8), 885-888.
Bauin, S. & Rothman, H. (1991). Der 'Impact' von Zeitschriften als Annäherungsmaß für Zitationsraten. In P. Weingart, R. Sehringer & M. Winterhager (Hrsg.), Indikatoren der Wissenschaft und Technik: Theorie, Methoden, Anwendungen (S. 91-111). Frankfurt/Main: Campus.

Guardians of Science: Fairness and Reliability of Peer Review. H.-D. Daniel Copyright © 1993 VCH Verlagsgesellschaft mbH, Weinheim ISBN: 3-527-29041-9


Beck, U. & Hartmann, H. (1983). Wer ist der Schönste im ganzen Land? Überlegungen zur Auswahl eines preiswürdigen Zeitschriftenaufsatzes (Herausgebermitteilung). Soziale Welt, 34, 257-269.
Begg, C. B. & Berlin, J. A. (1989). Publication Bias and Dissemination of Clinical Research. Journal of the National Cancer Institute, 81 (2), 107-115.
Binet, A. (1912). Die neuen Gedanken über das Schulkind. Leipzig: Wunderlich.
Blank, R. M. (1991). The Effects of Double-Blind versus Single-Blind Reviewing: Experimental Evidence from The American Economic Review. American Economic Review, 81 (5), 1041-1067.
Blaschke, D. (1986). Zur Beurteilung interdisziplinärer sozialwissenschaftlicher Forschung. In R. Fisch & H.-D. Daniel (Hrsg.), Messung und Förderung von Forschungsleistung: Person - Team - Institution (S. 167-189). Konstanz: Universitätsverlag Konstanz.
Bornstein, R. F. (1991). The Predictive Validity of Peer Review: A Neglected Issue. Behavioral and Brain Sciences, 14 (1), 138-139.
Bortz, J., Lienert, G. A. & Boehnke, K. (1990). Verteilungsfreie Methoden in der Biostatistik (Kapitel 9: Urteilerübereinstimmung, S. 449-502). Berlin: Springer.
Bowen, D. D., Perloff, R. & Jacoby, J. (1972). Improving Manuscript Evaluation Procedures. American Psychologist, 27, 221-225.
Braam, R. R. & Bruil, J. (1992). Quality of indexing information: authors' views on indexing of their articles in Chemical Abstracts online CA-file. Journal of Information Science, 18, 399-408.
Brackbill, Y. & Korten, F. (1970). Journal Reviewing Practices: Authors' and APA Members' Suggestions for Revision. American Psychologist, 25 (7), 937-940.
Braun, T. & Nagy, J. I. (1982). A Comparative Evaluation of some Hungarian and other National Biology, Chemistry, Mathematics and Physics Journals. Scientometrics, 4 (6), 439-455.
Broad, W. & Wade, N. (1983). Betrayers of the Truth. New York, NY: Simon & Schuster.
Brown, H. C. (1980). Aus kleinen Eicheln wachsen große Eichen - von den Boranen zu den Organoboranen (Nobel-Vortrag). Angewandte Chemie, 92 (9), 675-683.
Campbell, D. T. (1974). Evolutionary Epistemology. In P. A. Schilpp (Ed.), The Philosophy of Karl Popper (pp. 413-463). La Salle, Illinois: Open Court.
Ceci, S. J. & Peters, D. (1984). How Blind Is Blind Review? American Psychologist, 39 (12), 1491-1494.
Chalk, R. (Ed.). (1988). Science, Technology, and Society (Chapter 7: Fraud and Misconduct in Science). Washington, DC: American Association for the Advancement of Science.
Champion, D. J. & Morris, M. F. (1973). A Content Analysis of Book Reviews in the AJS, ASR, and Social Forces. American Journal of Sociology, 78 (5), 1256-1265.
Chemical Abstracts Service (1984 ff.). Chemical Abstracts - Key to the World's Chemical Literature. Columbus, Ohio: Chemical Abstracts Service.
Chubin, D. E. (1982). Peer Review and the Courts: Notes of a Participant Scientist. Bulletin of Science, Technology and Society, 2, 423-432.
Chubin, D. E. & Hackett, E. J. (1990). Peerless Science: Peer Review and U.S. Science Policy. Albany, N.Y.: State University of New York Press.
Cicchetti, D. V. (1976). Assessing Inter-Rater Reliability for Rating Scales: Resolving some Basic Issues. British Journal of Psychiatry, 129, 452-456.
Cicchetti, D. V. (1985). A Critique of Whitehurst's "Interrater Agreement for Journal Manuscript Reviews": De Omnibus, Disputandem Est. American Psychologist, 40, 563-569.
Cicchetti, D. V. (1988). When Diagnostic Agreement is High, but Reliability is Low: Some Paradoxes Occurring in Joint Independent Neuropsychology Assessments. Journal of Clinical and Experimental Neuropsychology, 10 (5), 605-622.
Cicchetti, D. V. (1991). The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation. Behavioral and Brain Sciences, 14, 119-135.
Cicchetti, D. V. & Eron, L. D. (1979). The Reliability of Manuscript Reviewing for the Journal of Abnormal Psychology. Proceedings of the American Statistical Association (Social Statistics Section), 22, 596-600.
Cicchetti, D. V. & Feinstein, A. R. (1990). High Agreement but Low Kappa: II. Resolving the Paradoxes. Journal of Clinical Epidemiology, 43 (6), 551-558.
Coe, R. K. & Weinstock, I. (1967). Editorial Policies of Major Economic Journals. Quarterly Review of Economics & Business, 7 (1), 37-43.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20 (1), 37-46.
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70 (4), 213-220.
Cohen, R. (1973). Patterns of Personality Judgment. New York: Academic Press.
Cole, S. (1983). The Hierarchy of the Sciences. American Journal of Sociology, 89, 111-139.
Cole, S. (1991). Consensus and the Reliability of Peer-Review Evaluations. Behavioral and Brain Sciences, 14 (1), 140-141.
Cole, S., Cole, J. R. & Simon, G. A. (1981). Chance and Consensus in Peer Review. Science, 214, 881-886.
Cole, S., Rubin, L. & Cole, J. R. (1978). Peer Review in the National Science Foundation. Phase one of a Study. Washington, D.C.: National Academy Press.
Cole, S., Simon, G. & Cole, J. R. (1988). Do Journal Rejection Rates Index Consensus? American Sociological Review, 53, 152-156.
Collins, P. M. D. (1991). Quantitative Assessment of Departmental Research: A Survey of Academics' Views. London: The Science and Engineering Policy Studies Unit (SEPSU), The Royal Society and The Fellowship of Engineering.
Conger, A. J. & Ward, D. G. (1984). Agreement Among 2 x 2 Agreement Indices. Educational and Psychological Measurement, 44, 301-314.
Coughlin, E. K. (1988). Scholar who Submitted Bogus Article to Journals may be Disciplined. Chronicle of Higher Education, 35 (10), A1, A7.
Crandall, R. (1978). Interrater Agreement on Manuscripts Is Not So Bad! American Psychologist, 33, 623-624.
Crane, D. (1967). The Gatekeepers of Science: Some Factors Affecting the Selection of Articles for Scientific Journals. American Sociologist, 2 (4), 195-201.
Daft, R. L. (1985). Why I Recommended that Your Manuscript be Rejected and What You can Do about It. In L. L. Cummings & P. J. Frost (Eds.), Publishing in the Organizational Sciences (pp. 193-209). Homewood, Illinois: Irwin.
Daniel, H.-D. (1988). Methodische Probleme institutsvergleichender Analysen der Forschungsproduktivität - Untersucht am Beispiel des Faches Psychologie. In H.-D. Daniel & R. Fisch (Hrsg.), Evaluation von Forschung: Methoden - Ergebnisse - Stellungnahmen (S. 215-241). Konstanz: Universitätsverlag Konstanz.
Daniel, H.-D. (1991). Die chemische Forschung im Spiegel bibliometrischer Indikatoren. Nachrichten aus Chemie, Technik und Laboratorium, 39 (9), 978-980.
Daniel, H.-D. & Fisch, R. (1986). Forschungsproduktivität: Indikatoren, statistische Verteilung, Gesetzmäßigkeiten. In R. Fisch & H.-D. Daniel (Hrsg.), Messung und Förderung von Forschungsleistung: Person - Team - Institution (S. 151-166). Konstanz: Universitätsverlag Konstanz.
Daniel, H.-D. & Fisch, R. (1987). Beiträge der empirischen Wissenschaftsforschung zur hochschul- und forschungspolitischen Diskussion: Freiheit oder Bindung der Forschung? - Universitäts-Ranglisten - Frauen in der Wissenschaft. In C. Burrichter (Hrsg.), Theorie und Praxis der Wissenschaftsforschung (S. 49-87). Erlangen: Institut für Gesellschaft und Wissenschaft (Verlag Deutsche Gesellschaft für zeitgeschichtliche Fragen e. V.).


Daniel, H.-D. & Fisch, R. (1990). Research Performance Evaluation in the German University Sector. Scientometrics, 19 (5-6), 349-361.
Daniel, H.-D. & Fisch, R. (in Druck). Freiheit oder Bindung der Forschung? Konträre Modelle der Wissenschaftsentwicklung und -steuerung im Meinungsbild der Professoren. In H. Maier-Leibnitz (Hrsg.), Umfrage zur Lage der Forschung an den deutschen Hochschulen 1976/77 und 1983/84. Tübingen: C. H. Beck.
Daniel, T. M. (1991). Why Manuscripts are Rejected - With Thanks to our Reviewers (Editorial). Journal of Laboratory and Clinical Medicine, 117 (1), 1-2.
Deaton, A., Guesnerie, R., Hansen, L. P. & Kreps, D. (1987). Econometrica Operating Procedures. Econometrica, 55 (1), 204-206.
DeBakey, L. (1990). Journal Peer Review: Anonymity or Disclosure? Archives of Ophthalmology, 108 (3), 345-349.
Dewitt, T. W., Nicholson, R. S. & Wilson, M. K. (1980). Science Citation Index and Chemistry. Scientometrics, 2 (4), 265-275.
Ebel, R. L. (1951). Estimation of the Reliability of Ratings. Psychometrika, 16 (4), 407-424.
Eberley, S. (1986). Social and Cognitive Dimensions of Manuscript Review: An Integrated Model of Publication Outcome. (Unpublished Doctoral Dissertation). Provo, Utah: Brigham Young University.
Eberley, S. & Warner, W. K. (1990). Fields or Subfields of Knowledge: Rejection Rates and Agreement in Peer Review. American Sociologist, 21 (3), 217-231.
Eckmann, B. (1977). Qualitätskriterien wissenschaftlicher Publikationen. In F.-H. Philipp (Hrsg.), Information und Gesellschaft - Bedingungen wissenschaftlicher Publikation (S. 61-66). Stuttgart: Wissenschaftliche Verlagsgesellschaft.
Eimer, E. (1978). Varianzanalyse. Stuttgart: Kohlhammer.
Elschenbroich, C. & Salzer, A. (1990). Organometallchemie (3., durchgesehene Auflage). Stuttgart: Teubner.
Ernst, E. & Kienbacher, T. (1991). Chauvinism (Correspondence). Nature, 352, 560.
Eysenck, H. J. (1980). Editorial. Personality and Individual Differences, 1, 1-2.
Eysenck, H. J. & Eysenck, S. B. G. (1992). Peer Review: Advice to Referees and Contributors (Editorial). Personality and Individual Differences, 13 (4), 393-399.
Falbe, J. & Regitz, M. (Hrsg.). (1991). Römpp Chemie Lexikon (9., erweiterte und neubearbeitete Auflage). Stuttgart: Thieme.
Feinstein, A. R. & Cicchetti, D. V. (1990). High Agreement but Low Kappa: I. The Problems of Two Paradoxes. Journal of Clinical Epidemiology, 43 (6), 543-549.
Fifield, D. (1969). 'Nature', 1869-1969. New Scientist, 44 (673), 230-232.
Fisher, A. (1989). Seeing Atoms. Popular Science, 102-107.
Fiske, D. W. & Fogg, L. (1990). But the Reviewers Are Making Different Criticisms of My Paper! American Psychologist, 45 (5), 591-598.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions (Chapter 13: The Measurement of Interrater Agreement, pp. 212-236). New York: Wiley.
Fleiss, J. L. (1982). Deception in the Study of the Peer-Review Process. Behavioral and Brain Sciences, 5 (2), 210-211.
Fleiss, J. L., Cohen, J. & Everitt, B. S. (1969). Large Sample Standard Errors of Kappa and Weighted Kappa. Psychological Bulletin, 72 (5), 323-327.
Forscher, B. K. (1980). The Role of the Referee. Scholarly Publishing, 11 (2), 165-169.
Funder, D. C. (1987). Errors and Mistakes: Evaluating the Accuracy of Social Judgment. Psychological Bulletin, 101 (1), 75-90.
Garfield, E. (1976 ff.). SCI Journal Citation Reports: A Bibliometric Analysis of Science Journals in the ISI Data Base. Philadelphia, PA: Institute for Scientific Information.


Garson, L. R. (1980). Computer-aided Selection of Reviewers and Manuscript Control. Scholarly Publishing, October, 65-74.
Ghosh, J. S. (1975). Uncitedness of Articles in 'Nature', a Multidisciplinary Scientific Journal. Information Processing & Management, 11, 165-169.
Ghosh, J. S. & Neufeld, M. L. (1974). Uncitedness of Articles in the 'Journal of the American Chemical Society'. Information Storage and Retrieval, 10, 365-369.
Giles, M. W., Patterson, D. & Mizell, F. (1989). Discretion in Editorial Decision Making: The Case of the Journal of Politics. PS: Political Science & Politics, 22 (1), 58-62.
Glänzel, W. (n.d.). IRWIN - A Statistical Module for Fitting And Mapping Discrete Distributions. Budapest: Hungarian Academy of Sciences, The Library.
Glaze, W. H. (1988). Peer Review: A Foundation of Science. Environmental Science & Technology, 22 (3), 235.
Glenn, N. D. (1976). The Journal Article Review Process: Some Proposals for Change. American Sociologist, 11, 179-185.
Gölitz, P. (1990). This manuscript must either be drastically reduced or fully oxidized - Die „Angewandte Chemie" im Spannungsfeld der Interessen. Unveröffentlichtes Manuskript eines Vortrags, gehalten im November 1990 beim Ortsverband Frankfurt am Main der Gesellschaft Deutscher Chemiker.
Gordon, M. (1978). A study of the evaluation of research papers by primary journals in the UK. University of Leicester: Primary Communications Research Centre.
Gordon, M. (1979a). Peer Review in Physics. Physics Bulletin, 30, 112-113.
Gordon, M. (1979b). Deficiencies of Scientific Information Access and Output in Less Developed Countries. Journal of the American Society for Information Science, 30, 340-342.
Gottfredson, S. D. (1978). Evaluating Psychological Research Reports: Dimensions, Reliability, and Correlates of Quality Judgments. American Psychologist, 33 (10), 920-934.
Greenwald, A. G. (1976). An Editorial. Journal of Personality and Social Psychology, 33 (1), 1-7.
Grissom, A. (1991). The Highest-Impact, Highest-Influence Chemistry Journals. The Scientist, April 1, 14.
Hall, J. N. (1974). Inter-rater Reliability of Ward Rating Scales. British Journal of Psychiatry, 125, 248-255.
Hamilton, D. P. (1990). Publishing by - and for? - the Numbers. Science, 250 (4986), 1331-1332.
Hamilton, D. P. (1991). Research Papers: Who's Uncited Now? Science, 251 (4989), 25.
Hargens, L. L. (1988a). Scholarly Consensus and Journal Rejection Rates. American Sociological Review, 53, 139-151.
Hargens, L. L. (1988b). Further Evidence on Field Differences in Consensus from the NSF Peer Review Studies. American Sociological Review, 53, 157-160.
Hargens, L. L. (1990). Variation in Journal Peer Review Systems. Journal of the American Medical Association, 263 (10), 1348-1352.
Hargens, L. L. (1991). Referee Agreement in Context. Behavioral and Brain Sciences, 14 (1), 150-151.
Hargens, L. L. & Herting, J. R. (1990). A New Approach to Referees' Assessments of Manuscripts. Social Science Research, 19, 1-16.
Hartenstein, W., Boos, M. & Bertl, W. (1988). Entwicklung und Erprobung von Kriterien für die Bewertung der Ergebnisse sozialwissenschaftlicher Forschungsprojekte. In H.-D. Daniel & R. Fisch (Hrsg.), Evaluation von Forschung: Methoden - Ergebnisse - Stellungnahmen (S. 397-431). Konstanz: Universitätsverlag Konstanz.
Hartmann, H. (1991). Kritik als Spielraum: Plädoyer für neue Orientierungen (Editorial). Soziologische Revue, 14 (2), 142-151.


Hartmann, H. & Dübbers, E. (1984). Kritik in der Wissenschaftspraxis: Buchbesprechungen und ihr Echo. Frankfurt am Main: Campus.
Hartmann, I. (1988). Fachspezifische Beurteilungskriterien von Gutachtern in der Forschungsförderung - dargestellt am Beispiel des Normalverfahrens in der Deutschen Forschungsgemeinschaft. In H.-D. Daniel & R. Fisch (Hrsg.), Evaluation von Forschung: Methoden - Ergebnisse - Stellungnahmen (S. 383-396). Konstanz: Universitätsverlag Konstanz.
Hartmann, I. (1990). Begutachtung in der Forschungsförderung - Die Argumente der Gutachter in der Deutschen Forschungsgemeinschaft. Frankfurt/Main: Fischer.
Heller, C. & Kirstätter, R. (1989). Vom Autor zum Leser. Der VCH Bogen, 4 (6), 20-22.
Hendrick, C. (1976). Editorial Comment. Personality and Social Psychology Bulletin, 2 (1), 207-208.
Hendrick, C. (1977). Editorial Comment. Personality and Social Psychology Bulletin, 3 (1), 1-2.
Hobbs, N. T. (1988). Obligations and Expectations of your Peers: Manuscript Review at the 'Journal of Range Management'. Journal of Range Management, 41 (5), 368-369.
Hornbostel, S. & Neidhardt, F. (1991). Hochschulranking auf der Basis von Studentenbefragungen. Methodische Anmerkungen zum „Spiegel"-Projekt über Studienbedingungen an deutschen Hochschulen. Unveröffentlichtes Manuskript. Barcelona/Berlin.
Horrobin, D. F. (1974). Referees and Research Administrators: Barriers to Scientific Research? British Medical Journal, 2, 216-218.
Horrobin, D. F. (1990). The Philosophical Basis of Peer Review and the Suppression of Innovation. Journal of the American Medical Association, 263 (10), 1438-1441.
Ingelfinger, F. J. (1974). Peer Review in Biomedical Publication. American Journal of Medicine, 56, 686-692.
Institute for Scientific Information (1984 ff.). Science Citation Index. Philadelphia, PA: Institute for Scientific Information.
Jacobson, R. L. (1986). Scholars Fault Journals and College Libraries in Survey by Council of Learned Societies. Chronicle of Higher Education, 32 (23), 1, 21-22.
Janeway, C. A. Jr. (1990). JMCI: The Last Issue. Journal of Molecular and Cellular Immunology, 4, 293.
Juhasz, S., Calvert, E., Jackson, T., Kronick, D. A. & Shipman, J. (1975). Acceptance and Rejection of Manuscripts. IEEE Transactions on Professional Communication, 18 (3), 177-185.
Kean, P. & Ronayne, J. (1972). Preliminary Communications in Chemistry. Journal of Chemical Documentation, 12 (4), 218-220.
Kiesler, C. A. (1991). Confusion Between Reviewer Reliability and Wise Editorial and Funding Decisions. Behavioral and Brain Sciences, 14 (1), 151-152.
Kimball, A. W. (1954). Short-Cut Formulas for the Exact Partition of χ² in Contingency Tables. Biometrics, 10, 452-458.
Knox, F. G. (1981). No Unanimity about Anonymity. Journal of Laboratory and Clinical Medicine, 97 (1), 1-3.
Kornhuber, H. H. (1988). Mehr Forschungseffizienz durch objektivere Beurteilung von Forschungsleistungen. In H.-D. Daniel & R. Fisch (Hrsg.), Evaluation von Forschung: Methoden - Ergebnisse - Stellungnahmen (S. 361-382). Konstanz: Universitätsverlag Konstanz.
Koshland, D. E. Jr. (1987). Fraud in Science. Science, 235 (4785), 141.
Koshland, D. E. Jr. (1989). The Process of Publication. Science, 245 (4918), 573.
Kraemer, H. C. (1991). Do We Really Want More "Reliable" Reviewers? Behavioral and Brain Sciences, 14 (1), 152-154.
Kraft, D. H. (1987). The Peer Review Process for the 'Journal of the American Society for Information Science'. Journal of the American Society for Information Science, 38 (2), 81-82.
Kretzenbacher, H. L. & Thurmair, M. (1992). Methoden des Textvergleichs zur Beschreibung wissenschaftlicher Textsorten - das Peer Review. In K.-D. Baumann & H. Kalverkämper (Hrsg.), Kontrastive Fachsprachenforschung. Tübingen: Narr.
Kronick, D. A. (1976). A History of Scientific & Technical Periodicals: The Origins and Development of the Scientific and Technical Press, 1665-1790 (Second Edition). Metuchen, NJ: Scarecrow Press.
Laming, D. (1991). Why is the Reliability of Peer Review so Low? Behavioral and Brain Sciences, 14 (1), 154-156.
Landis, J. R. & Koch, G. G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 33, 159-174.
Lawani, S. M. & Bayer, A. E. (1983). Validity of Citation Criteria for Assessing the Influence of Scientific Publications: New Evidence with Peer Assessment. Journal of the American Society for Information Science, 34 (1), 59-66.
Lehman, A., Sall, J. & Cole, M. (1989). JMP™ User's Guide. Cary, NC: SAS Institute Inc.
Lempert, R. (1985). From the Editor. Law & Society Review, 19 (4), 529-536.
Lewis, L. S. (1975). Scaling the Ivory Tower: Merit and its Limits in Academic Careers (Chapter 3: Professional Evaluation and Letters of Recommendation, pp. 47-76). Baltimore: The Johns Hopkins University Press.
Lienert, G. A. (1978). Verteilungsfreie Methoden in der Biostatistik (Zweite, völlig neu bearbeitete Auflage; Band 2). Meisenheim am Glan: Anton Hain.
Lienert, G. A. (1987). Schulnoten-Evaluation. Frankfurt am Main: Athenäum.
Lindsey, D. (1989). Using Citation Counts as a Measure of Quality in Science - Measuring What's Measurable Rather Than What's Valid. Scientometrics, 15 (3-4), 189-203.
Lock, S. (1985). A Difficult Balance: Editorial Peer Review in Medicine. London: Nuffield Provincial Hospitals Trust (reprinted in 1986 by ISI Press).
Lock, S. & Smith, J. (1986). Peer Review at Work. Scholarly Publishing, 17 (4), 303-316.
Luck, W. A. P. (1981). Vom „Wächteramt" über die Wissenschaft. Physikalische Blätter, 37 (2), 37-38.
Luck, W. A. P. (1982). Wächter der Wissenschaft. Umschau, Heft 9, 300-302.
Luhmann, N. (1968). Selbststeuerung der Wissenschaft. Jahrbuch für Sozialwissenschaft, 19 (2), 147-170.
MacRoberts, M. H. & MacRoberts, B. R. (1988). Problems of Citation Analysis: A Critical Review. Journal of the American Society for Information Science, 40 (5), 342-349.
Maher, B. A. (1978). A Reader's, Writer's, and Reviewer's Guide to Assessing Research Reports in Clinical Psychology. Journal of Consulting and Clinical Psychology, 46, 835-838.
Mahoney, M. J. (1977). Publication Prejudices: An Experimental Study of Confirmatory Bias in the Peer Review System. Cognitive Therapy and Research, 1 (2), 161-175.
Mahoney, M. J. (1985). Open Exchange and Epistemic Progress. American Psychologist, 40 (1), 29-39.
Mahoney, M. J. (1990). Bias, Controversy, and Abuse in the Study of the Scientific Publication System. Science, Technology, & Human Values, 15 (1), 50-55.
Mahoney, M. J., Kazdin, A. E. & Kenigsberg, M. (1978). Getting Published. Cognitive Therapy and Research, 2 (1), 69-70.
Marsh, H. W. & Ball, S. (1989). The Peer Review Process Used to Evaluate Manuscripts Submitted to Academic Journals: Interjudgmental Reliability. Journal of Experimental Education, 57 (2), 151-169.
Marton, J. (1983). Causes of Low and High Citation Potentials in Science: Citation Analysis of Biochemistry and Plant Physiology Journals. Journal of the American Society for Information Science, 34 (4), 244-246.
McDowell, J. M. & Amacher, R. C. (1986). Economic Value of In-House-Editorship. Public Choice, 48, 101-112.


McNutt, R. A., Evans, A. T., Fletcher, R. H. & Fletcher, S. W. (1990). The Effects of Blinding on the Quality of Peer Review. Journal of the American Medical Association, 263 (10), 1371-1376.
McTavish, D. G., Cleary, J. D., Brent, E. E., Perman, L. & Knudsen, K. R. (1977). Assessing Research Methodology - The Structure of Professional Assessments of Methodology. Sociological Methods & Research, 6, 3-44.
Merton, R. K. (1968). The Matthew Effect in Science. Science, 159 (3810), 56-63.
Merton, R. K. (1988). The Matthew Effect in Science, II: Cumulative Advantage and the Symbolism of Intellectual Property. ISIS, 79, 606-623.
Monsen, E. R. (1983). Reviewing Manuscripts for the 'Journal': A Brief Guideline. Journal of the American Dietetic Association, 83 (2), 131.
Morton, H. C. & Price, A. J. (1989). The ACLS Survey of Scholars: Final Report of Views on Publications, Computers, and Libraries. Washington, D. C.: American Council of Learned Societies.
Munzinger, F. & Daniel, H.-D. (1992). Die Forschung in der ehemaligen DDR im Spiegel bibliometrischer Indikatoren: Möglichkeiten und Grenzen von Online-Datenbanken. In W. Neubauer & K.-H. Meier (Hrsg.), Deutscher Dokumentartag 1991: Information und Dokumentation in den 90er Jahren - Neue Herausforderungen, neue Technologien (S. 303-319). Frankfurt am Main: Deutsche Gesellschaft für Dokumentation.
Nagl, W., Walter, H.-G. & Staud, J. L. (Hrsg.). (1986). Statistische Verfahren der empirischen Sozialforschung in einem Programmpaket. Das Konstanzer Statistische Analysesystem KOSTAS. Konstanz: Universität Konstanz, Zentrum I (Bildungsforschung), Sonderforschungsbereich 23, Forschungsbericht 47/1.
National Academy of Sciences (1982). The Quality of Research in Science. Washington, D. C.: National Academy Press.
National Research Council (1987). Improving Research Through Peer Review. Washington, D. C.: National Academy Press.
Neidhardt, F. (1988). Selbststeuerung in der Forschungsförderung: Das Gutachterwesen der DFG. Opladen: Westdeutscher Verlag.
Neuliep, J. W. & Crandall, R. (1990). Editorial Bias Against Replication Research. Journal of Social Behavior and Personality, 5 (4), 85-90.
Neurath, H. & Garson, L. (1979). Computer System for 'Biochemistry'. Biochemistry, 18 (23), 5035-5037.
Parshall, G. W. (1987). Trends and Opportunities for Organometallic Chemistry in Industry. Organometallics, 6 (4), 687-692.
Patterson, C. H. (1969). Evaluation of Manuscripts Submitted for Publication. American Psychologist, 24, 73.
Patterson, K. & Bailar, J. C. III (1985). A Review of Journal Peer Review. In K. S. Warren (Ed.), Selectivity in Information Systems - Survival of the Fittest (pp. 64-82). New York: Praeger.
Patterson, S. C., Bailey, M. S., Martinez, V. J. & Angel, S. C. (1987). Report of the Managing Editor of the 'American Political Science Review', 1986-87. PS, 20, 1006-1016.
Pendlebury, D. (1988). Seven Chemistry Journals Carrying Lots of Clout. The Scientist, September 19, p. 19.
Peters, D. P. & Ceci, S. J. (1982). Peer-Review Practices of Psychological Journals: The Fate of Published Articles, Submitted Again. Behavioral and Brain Sciences, 5, 187-195.
Petruzzi, J. M. (1985). Peer Review in ANALYTICAL CHEMISTRY (Editor's Column). Analytical Chemistry, 57 (8), 868 A - 870 A.
Polanyi, M. (1966). The Tacit Dimension. New York: Doubleday.
Porter, A. L., Chubin, D. E. & Jin, X.-Y. (1988). Citations and Scientific Progress: Comparing Bibliometric Measures with Scientist Judgments. Scientometrics, 13 (3-4), 103-124.
Prathap, G. (1989). A Modest Proposal for Glasnost in the Peer Review Process. Current Science, 58 (20), 1114-1116.
Price, A. (1975). Peer and Peerage. In The Encyclopedia Americana (pp. 469-470). New York: Americana Corporation.
Rehm, D., Montforts, F.-P., Ockenfeld, M. & Wess, G. (1982). Online-Recherchen in Datenbanken des Chemical Abstracts Service. Weinheim: Verlag Chemie.
Relman, A. S. (1980). Are Journals Really Quality Filters? In W. Goffman, J. T. Bruer & K. S. Warren (Eds.), Research on Selective Information Systems (pp. 54-60). New York: Rockefeller Foundation.
Rennie, D. (Ed.). (1990). Guarding the Guardians - Research on Editorial Peer Review (Selected Proceedings From the First International Congress on Peer Review in Biomedical Publication). Journal of the American Medical Association, 263 (10), 1311-1441.
Roberts, W. C. (1985). Country of Origin of Articles in the AJC in 1984. American Journal of Cardiology, 56 (4), 380.
Roediger, H. L. III (1987). The Role of Journal Editors in the Scientific Process. In D. N. Jackson & J. P. Rushton (Eds.), Scientific Excellence: Origins and Assessment (pp. 222-252). Newbury Park: Sage.
Rosenblatt, A. & Kirk, S. A. (1980). Recognition of Authors in Blind Review of Manuscripts. Journal of Social Service Research, 3 (4), 383-394.
Rosenthal, R. (1991). Some Indices of the Reliability of Peer Review. Behavioral and Brain Sciences, 14 (1), 160-161.
Ross, P. F. (1980). The Sciences' Self-Management: Manuscript Refereeing, Peer Review, and Goals in Science. The Ross Company, Todd Pond, Lincoln, Massachusetts 01773, USA.
Ross, P. F. (1993). Improving Decisions in Science. The Ross Company, Todd Pond, Lincoln, Massachusetts 01773, USA.
Ryan, M. (1982). Evaluating Scholarly Manuscripts in Journalism and Communications. Journalism Quarterly, 59, 273-285.
Sabine, J. R. (1985). The Error Rate in Biological Publication: A Preliminary Survey. Science, Technology, & Human Values, 10 (1), 62-69.
Sachs, L. (1978). Angewandte Statistik. Berlin: Springer.
Sachs, L. (1990). Statistische Methoden 2: Planung und Auswertung. Berlin: Springer.
Sahner, H. (1982). Zur Selektivität von Herausgebern: Eine Input-output-Analyse der „Zeitschrift für Soziologie". Zeitschrift für Soziologie, 11 (1), 82-98.
Saracevic, T. (1986). The Refereeing Process at 'Information Processing & Management'. Information Processing & Management, 22 (1), 1-3.
Scheffé, H. (1953). A Method for Judging all Contrasts in the Analysis of Variance. Biometrika, 40, 87-104.
Schoenberger, M. (1989). GEOPHYSICS Guidelines for Reviewers. Geophysics, 54 (4), 424.
Schubert, A., Glänzel, W. & Braun, T. (1989). World Flash on Basic Research: Scientometric Datafiles. A Comprehensive Set of Indicators on 2649 Journals and 96 Countries in all Major Science Fields and Subfields. Scientometrics, 16 (1-6), 3-478.
Scott, W. A. (1974). Interreferee Agreement on Some Characteristics of Manuscripts Submitted to the "Journal of Personality and Social Psychology". American Psychologist, 29, 698-702.
Shadish, W. R. (1989). The Perception and Evaluation of Quality in Science. In B. Gholson, W. R. Shadish, Jr., R. A. Neimeyer & A. C. Houts (Eds.), Psychology of Science - Contributions to Metascience (pp. 383-426). Cambridge: Cambridge University Press.
Shils, E. (1990). The University World Turned Upside Down: Does Confidentiality of Assessment by Peers Guarantee the Quality of Academic Appointment? Minerva, 28 (3), 324-334.
Siegelman, S. S. (1991). Assassins and Zealots: Variations in Peer Review (Editor's Page). Radiology, 178, 631-642.


Small, H. (1974). Characteristics of Frequently Cited Papers in Chemistry. Final Report on Contract Number NSF-C795.
Smigel, E. O. & Ross, H. L. (1970). Factors in the Editorial Decision. American Sociologist, 5, 19-21.
Snizek, W. E., Dudley, C. J. & Hughes, J. E. (1982). The Second Process of Peer Review: Some Correlates of Comments Published in the ASR (1947-1979). Scientometrics, 4 (6), 417-430.
Spencer, N. J., Hartnett, J. & Mahoney, J. (1986). Problems with Reviews in the Standard Editorial Practice. Journal of Social Behavior and Personality, 1 (1), 21-36.
Squires, B. P. (1989). Biomedical Manuscripts: What Editors Want From Authors and Peer Reviewers. Canadian Medical Association Journal, 141, 17-19.
Stelzl, I. (1982). Fehler und Fallen der Statistik. Bern: Huber.
Sterling, T. D. (1959). Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance - or Vice Versa. American Statistical Association Journal, 54, 30-34.
Stout, J. W. (1986). THE JOURNAL OF CHEMICAL PHYSICS: The First 50 Years. Annual Review of Physical Chemistry, 37, 1-23.
Strauss, S. (1969). Guidelines for Analysis of Research Reports. Journal of Educational Research, 63, 165-169.
Stull, G. R. (1989). Peer-Review Process Is Key to Quality Publication. Ceramic Bulletin, 68 (4), 850-852.
Tainer, J. A. (1991). Science, Citation, and Funding (Letter). Science, 251 (5000), 1408.
Teevan, J. J. (1980). Journal Prestige and Quality of Sociological Articles. American Sociologist, 15, 109-112.
Thurmair, M. & Kretzenbacher, H. L. (1991). Kurzvorstellung des Arbeitskomplexes „Das Peer-Review als Textsorte der Wissenschaftssprache" im Rahmen der Arbeitsgruppe „Wissenschaftssprache" der Akademie der Wissenschaften zu Berlin, anläßlich des 3. Kolloquiums im Schwerpunktprogramm „Wissenschaftsforschung" der Deutschen Forschungsgemeinschaft vom 25. bis 27. September 1991 an der Universität Bielefeld.
Todorov, R. & Glänzel, W. (1988). Journal citation measures: a concise review. Journal of Information Science, 14, 47-56.
Tolman, A., Farrier, N. & Farrier, K. (1988). DynaStat's Kappa Program. Eugene, OR: DynaStat.
Twentyman, P. & Selby, P. (1991). A Guide to Editorial Policies and Procedures. Cambridge: Macmillan.
Ulrich's International Periodicals Directory 1992-93. (1992). New Providence, NJ: Bowker.
Virgo, J. A. (1977). A Statistical Procedure for Evaluating the Importance of Scientific Papers. Library Quarterly, 47 (4), 415-430.
Walling, C. (n.d.). The Refereeing of Scientific Manuscripts - Does the Peer System Work? (Unpublished manuscript, reproduced from the collections of the archives of The National Academy of Sciences, Washington, D. C.).
Watkins, M. W. (1979). Chance and Interrater Agreement on Manuscripts. American Psychologist, 34, 796-797.
Weisheit, R. A. & Regoli, R. M. (1984). Ranking Journals. Scholarly Publishing, (July), 313-325.
Wenzel, R. P., Maki, D. G., Crow, S., Schaffner, W. & McGowan, J. E. Jr. (1990). Duplicate Publication of a Manuscript. Infection Control and Hospital Epidemiology, 11, 341-342.
Weymann, A. (1991). Orientierung durch sozialwissenschaftliches Rezensionswesen? (Editorial). Soziologische Revue, 14 (3), 275-279.
Whitehurst, G. J. (1982). The Quandary of Manuscript Reviewing. Behavioral and Brain Sciences, 5 (2), 241-242.
Whitehurst, G. J. (1984). Interrater Agreement for Journal Manuscript Reviews. American Psychologist, 39, 22-28.


Whitehurst, G. J. (1985). On Lies, Damned Lies, and Statistics: Measuring Interrater Agreement. American Psychologist, 40, 568-569.
Wilkinson, G. (1974). Die lange Suche nach stabilen Alkyl-Übergangsmetall-Verbindungen (Nobel-Vortrag). Angewandte Chemie, 86 (18), 664-667.
Wilson, J. D. (1978). Peer Review and Publication. Journal of Clinical Investigation, 61 (4), 1697-1701.
Wolff, W. M. (1970). A Study of Criteria for Journal Manuscripts. American Psychologist, 25, 636-639.
Wright, R. D. (1970). Truth and its Keepers. New Scientist, 45, 402-404.
Yalow, R. S. (1978). Radioimmunoassay: A Probe for the Fine Structure of Biologic Systems (Nobel Lecture). Science, 200 (4347), 1236-1245.
Yotopoulos, P. A. (1961). Institutional Affiliation of the Contributors to Three Professional Journals. American Economic Review, 51, 655-670.
Zentall, T. R. (1991). What to Do About Peer Review: Is the Cure Worse Than The Disease? Behavioral and Brain Sciences, 14 (1), 166-167.
Ziegler, K. (1964). Folgen und Werdegang einer Erfindung (Nobel-Vortrag). Angewandte Chemie, 76 (13), 545-553.
Ziman, J. M. (1968). Public Knowledge: An Essay Concerning the Social Dimension of Science. Cambridge: Cambridge University Press.
Ziman, J. M. (1976). Journal Guidelines (Correspondence). Nature, 259, 264.
Zuckerman, H. & Merton, R. K. (1971a). Patterns of Evaluation in Science: Institutionalisation, Structure and Functions of the Referee System. Minerva, 9, 66-100.
Zuckerman, H. & Merton, R. K. (1971b). Sociology of Refereeing. Physics Today, 24 (July), 28-33.

Index

A

absence from post 17
academic appointment 1, 71
academic language 75 f., 91 (note 15)
academic status of the author see bias
accepted manuscripts see manuscript
agreement see reliability
agreement matrix see reliability
American Chemical Society
-"Ethical Guidelines to Publication of Chemical Research" 67 ff.
American Economic Review see journals
American Journal of Cardiology see journals
American Political Science Review see journals
American Psychologist see journals
American Sociological Review see journals
analysis of covariance 54, 57 ff.
Analytical Chemistry see journals
Angewandte Chemie see journals
Angewandte Chemie Advisory Board 11
Angewandte Chemie International Edition in English see journals
anonymity of the reviewers 11, 64 ff.
applied chemistry and chemical engineering 35 ff., 95 (note 58)
Archives of Surgery see journals
"assassins" see referee
Astrophysical Journal see journals
author involvement in the selection of reviewers see suggestions for reform

B

Behavioral & Brain Sciences see journals
Berlin Academy of Sciences 91 (note 15)
bias see also referee
-academic status of the author 4, 32 ff., 73
-bias: definition 32
-"established researchers" 35
-institutional affiliation of the author 4, 14, 66, 93 (note 37)
-Matthew effect 33, 73, 93 (note 39)
-national publication bias: definition 42
-nationality of the author 4, 32, 42 ff., 73
-preconceived opinions of the reviewer 4
-publication outcome 34, 39, 42 ff.
-replication studies 4, 93 (note 37)
-sex of author 4, 66, 93 (note 37)
-statistically not significant findings 4, 93 (note 37)
-subject area 4, 32, 35 ff., 94 (notes 43, 46)
Binnig, G. 96 f. (note 69)
biochemistry 32, 35 ff., 42, 95 (note 58)
Biochemistry see journals
Biometrica see journals
Bonferroni alpha adjustment 93 (note 40)
book reviews 92 (note 31)
British Journal of Cancer see journals
British Journal of Surgery see journals
British Medical Journal see journals
Brown, H. C. 40


C

CA sections see subject area
Canadian Medical Association Journal see journals
cancer research 95 (note 56)
category-specific agreement see reliability
Ceramic Bulletin see journals
chance agreement see reliability
chance-corrected agreement see reliability
Chemical Abstracts 32, 35 ff., 48
Chemical Abstracts Service 32, 35 ff., 48, 94 (notes 43, 45, 50)
Chemical Communications see journals
chemotherapy 94 (note 51)
chi-square "goodness of fit" test statistic 37, 52, 95 (note 59)
chi-square test 93 (note 41)
circulation 48 ff.
citation analysis: search strategy see validity
citations of the work in other scientific papers see validity
comment see referee
comment sheet 11, 19, 91 (note 16), 91 (note 20)
communication: definition 9, 90 (note 13)
-rejected communications 48 ff.
-submitted communications 13 f.
competing obligations 17
concordant judgments see reliability
configuration frequency analysis 18
conflict of interest 17
consensus see reliability
content of reviews see quality criteria
corrections 96 (note 65)
corresponding author 11, 13, 33 ff., 42 ff.
-foreign corresponding author 42 ff.
-German corresponding author 42 ff.
course of action 15
criteria for scientific quality see quality criteria
critical selection 1
Current Anthropology see journals

D

Deisenhofer, J. 96 f. (note 69)
"demoters" see referee
Deutsche Forschungsgemeinschaft 64, 89 (note 1)
developing countries 46
development of science 89 (note 2)
Developmental Review see journals
DIMDI 51 f.
discordance see reliability
dislocation see reliability
double-blind procedure see suggestions for reform
duplicate publication 74

E

editor 3
editorial processing 11
editor-in-chief 9 ff., 15, 17
editor's computer-based information system 64
editor's expectations 67
editor's final decision 15, 32 ff., 42 ff.
editor's initial comment 15, 57
Ehrlich, Paul 94 (note 51)
Environment, Science & Technology see journals
epithets 75
"established researchers" see bias
"Ethical Guidelines to Publication of Chemical Research" see American Chemical Society
ethical principles for research with human subjects 93 f. (note 42)
evaluation form 11 f., 19 f., 22
evolutionary theory of epistemology 1
external review 9, 11, 16, 91 (note 14)


F

fairness see refereeing
false statements of fact see refereeing
fate of rejected manuscripts see validity
Federal Republic of Germany 40 f.
Fischer, Ernst Otto 40
formalization of the reviewing instrument see suggestions for reform
frame of reference see referee
fraud 89 (note 4), 96 (note 67)
full paper 50, 54, 64, 90 f. (note 13), 92 (note 26), 95 f. (note 61)

G

"Gatekeepers of Science" 1, 71
generalist see referee
Geophysics see journals
German share of world research in chemistry 40 f.
Gesellschaft Deutscher Chemiker 9, 11, 90 (note 11)
grant application 1
"Guarding the Guardians" 3
guidelines for manuscript review see suggestions for reform

H

"Hanging Committee" 56
Helvetica Chimica Acta see journals
Herrmann, W. A. 52
"high-impact" journals 51
Huber, R. 96 f. (note 69)
Human Pathology see journals

I

impression formation see referee
Information Processing & Management see journals
Information Science and Scientometrics Research Unit 51 f.
"in-house periodicals" 13
innovative research 3, 96 f. (note 69)
inorganic chemicals and reactions 25, 35 ff., 94 (note 46), 95 (note 58)
Institute for Scientific Information 32, 48, 51 ff., 94 (note 43)
institutional affiliation of the author see bias
"Instructions to Authors for Angewandte Chemie" 9
interdisciplinary research projects 92 (note 31)
internal evaluation 15 f.
interreferee agreement see reliability
intraclass-correlation see reliability
IRWIN 95 (note 59)
"ISI Journal Impact Factor" see validity

J

JMP 93 (note 36)
Journal of Abnormal Psychology see journals
Journal of Chemical Physics see journals
Journal of Clinical Investigation see journals
Journal of Educational Psychology see journals
Journal of Forecasting see journals
Journal of General Internal Medicine see journals
Journal of Laboratory and Clinical Medicine see journals
Journal of Molecular and Cellular Immunology see journals
Journal of Molecular Biology see journals
Journal of Personality and Social Psychology see journals
Journal of Politics see journals
Journal of Range Management see journals
Journal of The American Chemical Society see journals
Journal of The American Dietetic Association see journals
Journal of The American Society for Information Science see journals


journal ranking 90 (note 10), 95 (note 55)
journals
-American Economic Review 66 f.
-American Journal of Cardiology 42
-American Political Science Review 34, 91 (note 14)
-American Psychologist 3, 24
-American Sociological Review 4, 66, 90 (note 12)
-Analytical Chemistry 91 (note 18)
-Angewandte Chemie 7, 9 ff., 22 ff., 29 ff., 32 ff., 42 ff., 47 ff., 75, 90 (notes 10, 11), 91 (notes 18, 20), 95 (note 54), 95 (note 60)
-Angewandte Chemie International Edition in English 9, 51 ff.
-Archives of Surgery 67
-Astrophysical Journal 94 f. (note 53)
-Behavioral & Brain Sciences 65
-Biochemistry 64
-Biometrica 91 (note 14)
-British Journal of Cancer 67
-British Journal of Surgery 67
-British Medical Journal 5, 47 ff., 55 f., 91 (note 14)
-Canadian Medical Association Journal 67
-Ceramic Bulletin 66 f.
-Chemical Communications 90 f. (note 13)
-Current Anthropology 65
-Developmental Review 3
-Environment, Science & Technology 67
-Geophysics 67
-Helvetica Chimica Acta 42
-Human Pathology 67
-Information Processing & Management 67
-Journal of Abnormal Psychology 3, 64
-Journal of Chemical Physics 90 (note 13)
-Journal of Clinical Investigation 5, 47 ff., 54, 56, 90 (note 9), 94 f. (note 53)
-Journal of Educational Psychology 3
-Journal of Forecasting 67
-Journal of General Internal Medicine 65
-Journal of Laboratory and Clinical Medicine 46, 65
-Journal of Molecular and Cellular Immunology 65
-Journal of Molecular Biology 96 (note 69)
-Journal of Personality and Social Psychology 3, 66 f.
-Journal of Politics 64
-Journal of Range Management 67
-Journal of The American Chemical Society 4, 9, 42, 48 ff., 92 (note 26), 95 (note 58), 96 (note 65)
-Journal of The American Dietetic Association 67
-Journal of The American Society for Information Science 67
-Law & Society Review 4
-Nature 91 (note 14), 95 (note 58), 96 f. (note 69)
-New England Journal of Medicine 4, 24, 65, 94 f. (note 53)
-New York Times 96 (note 67)
-Personality and Individual Differences 67
-Personality and Social Psychology Bulletin 3, 89 (note 6)
-Physical Review Letters 66
-Psychologische Rundschau 66
-Publications of The Astronomical Society of the Pacific 94 f. (note 53)
-Radiology 29
-Science 75, 90 (note 9), 95 (note 58), 96 (note 67)
-Social Problems 24, 93 (note 35)
-Sociometry 3
-Tetrahedron Letters 90 f. (note 13)
-Zeitschrift für Soziologie 34
journals that published communications rejected by Angewandte Chemie see validity


judgmentalism see subjective judgmental tendencies

K

k × k table 22
kappa statistic see reliability
"Kappa with scores computed as agreement if within one point" see reliability
KOSTAS 92 (note 24)

L

lack of perceived competence 17
Law & Society Review see journals
letter see communication
letters of recommendation 92 (note 31)
linear weights see reliability
linguistic analysis 75 f., 91 (note 15)
logarithmic transformation of raw citation counts see validity
"luck of the reviewer draw" 4

M

macromolecular chemistry 35 ff.
manuscript 10, 18, 32
-accepted manuscripts 32, 42 ff.
-manuscript attribute rating form 64
-manuscript submission 10, 18, 38, 42 ff.
-published manuscripts 32
-rejected manuscripts 32, 95 f. (note 61)
-revised manuscripts 75
-withdrawn manuscripts 15 f., 93 (note 38)
Matthew effect see bias
mean citation rate see validity
Metschnikow, I. 94 (note 51)
Michel, H. 96 f. (note 69)
multiple pairwise comparison of means 93 (note 40)


N

National Science Foundation 95 (note 56)
nationality of the author see bias
Natta, G. 40
Nature see journals
New England Journal of Medicine see journals
New York Times see journals
Nobel prize 40, 90 (note 9), 94 (note 51), 96 f. (note 69)
note see communication
note of confirmation 10
number of reviewers see suggestions for reform

O

one-sided anonymity see refereeing
organic chemistry 25, 35 ff., 94 (note 46), 95 (note 58)
organometallic and organometalloidal compounds 24 f., 35 ff., 94 (notes 46, 48, 51)
-industrial significance 40 f.
-scientific significance 40 f.
"other" category 38

P

paradox of poor reliability despite high percentages of agreement see reliability
peer: definition 1, 89 (note 3)
peer review 1, 2
-abolishment of 3
-and innovative research 3
-process vs. outcome 7, 47
-target for criticism 3
percentage of agreement see reliability
Personality and Individual Differences see journals
Personality and Social Psychology Bulletin see journals
physical, inorganic, and analytical chemistry 35


physical organic chemistry 25, 35 ff., 94 (note 46)
Physical Review Letters see journals
plagiarism 65, 96 (note 67)
Popper, Sir Karl 1
preconceived opinions of the reviewer see bias
predictive validity see validity
prescriptive norms of science 5
principle of complementarity see referee
priority 64, 90 (note 13)
privatdozent 33 ff.
product-moment correlation 89 (note 6), 91 (note 21)
progress of science 6
Psychologische Rundschau see journals
psychology of science 4, 32 ff.
publication bias 4 f., 6, 32 ff., 73 f.
publication outcome see bias
publication profile 35 ff.
Publications of The Astronomical Society of the Pacific see journals
published manuscripts see manuscript
"pushovers" see referee

Q

quality criteria 5, 95 (note 56)
-consensus 5
-content of reviews 28, 92 (note 31)
-for professional evaluation 3
quality filter 6, 90 (note 12)

R

Radiology see journals
rating form see evaluation form
rebuttal 65
recommendation to publish see referee
referee 3, 17 f., 29 ff.
-"assassins" 29
-bias 4 f.
-choice of referees 4
-"demoters" 29
-false statements of fact 67
-first referee 17, 19 ff., 33 ff., 42 ff.
-foreign referee 17 f., 42 ff.
-frame of reference 6, 31
-generalist vs. specialist 6
-German referee 17 f., 42 ff.
-impression formation 29
-lenient vs. strict referees 4, 29 ff., 72 f., 93 (notes 34, 35)
-pairs of referees 18, 21, 23, 26, 32 ff.
-"pushovers" 29
-recommendation to publish 3, 6
-reviewer's anonymity see suggestions for reform
-reviewer's comment 11, 75 f., 91 (note 15), 96 (note 68)
-reviewer's suggestions for revision of manuscript 75
-second referee 17, 19 ff., 33 ff., 42 ff.
-selection of referees on the principle of complementarity 28
-subjective judgmental tendencies 4, 6, 32 ff.
-third referee ("reviewer-in-chief") 11, 17
-typology of referees 29 ff.
-"zealots" 29
refereeing 3
-double-blind procedure see suggestions for reform
-fairness 3, 4 f., 6, 29 ff.
-guidelines for manuscript review see suggestions for reform
-one-sided anonymity: definition 11
-reliability 3, 6, 21 ff.
-two-sided anonymity: definition 66
-validity 3, 5 f., 47 ff.
rejected manuscripts see manuscript
rejection of groundbreaking papers see validity
rejection rate 15 f., 92 f. (note 33), 94 f. (note 53)
reliability 3 f., 6, 21 ff., 71 f.
-"a form of epistemic warrant" 6
-agreement matrix 26 f.
-category-specific agreement 25


-category-specific agreement: definition 22
-chance agreement 21 f.
-chance-corrected agreement 21 ff., 26, 28, 92 (note 29)
-cognitive consensus 28, 92 f. (note 33)
-concordant judgments 22, 27
-discordance vs. dislocation 6
-inter-referee agreement 21 ff.
-intraclass correlation coefficient 3, 4, 21, 23 f., 89 (note 6), 92 (note 32)
-intraclass correlation: definition 22 f., 91 (note 23)
-kappa coefficient 23 f., 92 (note 24)
-kappa statistic: definition 21 f.
-"Kappa with scores computed as agreement if within one point" 22, 26
-lack of agreement 28
-linear weights 22
-"moderately better than a chance result" 4
-of recommendations for rejected manuscripts 26
-of recommendations with respect to acceptance 25 f.
-of recommendations with respect to rejection 24 f.
-paradox of poor reliability despite high percentages of agreement 28
-percentage of agreement 21, 24, 26, 92 (note 28)
-reliability of a group of reviewers 91 (note 22)
-reliability of an "average" reviewer 22, 91 (note 22)
-sample size 92 (note 27)
-section-specific concordance coefficients 24 f.
-"the correlation within pairs (of referees) isn't very good" 4
-two-category nominal scales 21
-weighted kappa coefficient 23 f.
-weighted kappa statistic 21
-weighted kappa statistic: definition 22, 91 (note 21)


reminders 11
replication studies see bias
response categories see comment sheet
review see refereeing
review article 9, 64
reviewer see referee
reviewer agreement see reliability
"reviewer-in-chief" see referee
reviewer's suggestions for revision of manuscript see referee
reviews 17
-positive vs. negative 11, 16, 75 f.
right of appeal see suggestions for reform
Rohrer, H. 96 f. (note 69)

S

sample size see reliability
SAS Institute 93 (note 36)
scholarly quality 4
scientific standards 1
Science see journals
Science Citation Index 32, 48, 51 ff., 94 (note 45)
"Science of Science" 89 (note 1)
SCISEARCH 51
section-specific concordance coefficients see reliability
selection mechanisms 1
selectivity 38
self-citation see validity
self-regulation of science 1, 6, 71
sequential communication number 10 f.
sex of author see bias
signed reviews 66
"single initial referee" system 89 (note 5)
social judgment 6
Social Problems see journals
sociology of science 7, 32 ff.
Sociometry see journals
specialist see referee
statistically not significant findings see bias
subject area see bias


subjective judgmental tendencies see referee
submitted manuscripts see manuscript submission
suggestions for reform 63 ff., 74
-author involvement in the selection of reviewers 63 f.
-development of guidelines for manuscript review 63, 67 ff.
-elimination of reviewer anonymity 63, 64 ff.
-formalization of the reviewing instrument 63 f.
-increasing the number of reviewers 63 f.
-review by a double-blind procedure 63, 66 f., 73
-right of appeal by authors 63, 67

T

Tetrahedron Letters see journals
threshold countries 46
time window for citation see validity
top-ranked chemistry journals 10
t-test 93 (note 40)
two-category nominal scales see reliability
"two initial referee" system 89 (note 5)
two-sided anonymity see refereeing

U

"uncensored" publication 71 uncited publications see validity

validity see also refereeing 47 ff., 74 -citation analysis: search strategy 51 ff. -citations of the work in other scientific papers 6, 13, 47, 51 ff., 74, 95 (note 56) -fate of rejected manuscripts 5, 48 ff. -"ISI Journal Impact Factor" 5, 9 f., 42, 47 ff., 51, 74, 90 (note 10), 94 (note 52), 95 (note 54) -journals that published communications rejected by Angewandte Chemie 48 ff. -logarithmic transformation of raw citation counts 54 ff., 95 f. (note 61), 96 (note 64) -mean citation rate 52 ff. -of editor's decision 54 ff. -of editor's initial judgment 57 ff. -of first referees' recommendations 58 ff. -of first and second referees' recommendations combined 60 f. -of second referees' recommendations 59 f. -"papers that became highly cited received generally lower referee evaluations than papers which were cited less frequently" 5 -predictive validity 5 f., 51 ff., 74 -rejection of groundbreaking papers 96 f. (note 69) -self-citation 96 (note 64) -time window for citation 55 ff., 95 (note 60) -uncited publications 52 f., 95 (note 58) VCH Verlagsgesellschaft 9

W

weighted kappa statistic see reliability Werner, H. 39 Wilkinson, G. 40 Wittig, Georg 40

Yalow, R. S. 90 (note 9)

Z

"zealots" see referee Zeitschrift für Soziologie see journals Ziegler, Karl Waldemar 40
